Prof. Peter Golubtsov
Lomonosov Moscow State University;
National Research University Higher School of Economics
In big data problems, data are usually collected at many sites, are huge in volume, and are constantly supplemented by new datasets. It is often impossible to gather all the data required for a research project on one computer. Hence, many approaches aim to adapt classical data processing algorithms to a distributed computing environment. Ideally, such a modified algorithm should, working in parallel on many computers, extract some compact intermediate “information” from each set of raw data, gradually combine and update it, and finally use the accumulated information to obtain a result. When new data appear, it must extract information from them, add it to what has already been accumulated, and update the result accordingly. We will consider several examples of suitably transformed processing algorithms; discuss specific features of the emerging forms of information representation, in particular their algebraic properties; and see how the resulting algorithms fit the MapReduce framework for parallel processing of huge amounts of data on large clusters. In addition, we will see how a certain formalization of the very notion of information and its algebraic properties can arise simply from adapting processing methods to the demands of big data.
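As a minimal illustration of the scheme described above (not one of the examples from the talk itself), consider computing a mean and variance over data held at several sites. The sketch below assumes the compact "information" is the triple (count, sum, sum of squares); the names `Info`, `extract`, `combine`, and `result` are hypothetical and chosen only for this illustration. Because `combine` is associative and commutative and has a neutral element, the extraction step maps naturally onto a MapReduce-style "map" and the combination step onto a "reduce".

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Info:
    """Compact information extracted from raw data (assumed form)."""
    n: int = 0        # number of observations
    s: float = 0.0    # sum of observations
    ss: float = 0.0   # sum of squared observations


def extract(data):
    """'Map' step: turn one raw dataset into compact information."""
    return Info(len(data), sum(data), sum(x * x for x in data))


def combine(a, b):
    """'Reduce' step: associative, commutative; Info() is the neutral element."""
    return Info(a.n + b.n, a.s + b.s, a.ss + b.ss)


def result(info):
    """Final step: mean and population variance from accumulated information."""
    mean = info.s / info.n
    variance = info.ss / info.n - mean ** 2
    return mean, variance


# Three sites, each holding its own raw data.
sites = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
accumulated = reduce(combine, map(extract, sites), Info())
mean, variance = result(accumulated)

# New data arrive: only their information is added; old raw data are not revisited.
updated = combine(accumulated, extract([7.0]))
new_mean, new_variance = result(updated)
```

The raw data never leave their sites; only the small `Info` triples are moved and merged. The sum-of-squares formula is used here for brevity and can lose precision on data with a large mean, so a numerically safer representation would be preferable in practice.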
Peter Golubtsov received the M.Sc. and Cand.Sc. degrees from Lomonosov Moscow State University, Russia, in 1983 and 1988, respectively, and the Doctor of Science degree from the Institute for Information Transmission Problems of the Russian Academy of Sciences in 1999. He is currently a Professor at Lomonosov Moscow State University and a Professor at the National Research University Higher School of Economics, Moscow, Russia. Since 2004, Dr. Golubtsov has served as a member of several dissertation councils. He is currently a member of the Dissertation Council MSU.05.01 of Lomonosov Moscow State University, Faculty of Mechanics and Mathematics, where his specialty is theoretical foundations of informatics. His current research interests include decision-making under uncertainty, image processing, and information effects in games. They mostly focus on various aspects of the concept of information and, in particular, on the algebraic properties and informativeness of various data sources. In his recent studies, he shows that the need to parallelize data processing in Big Data problems leads to similar algebraic structures, which reflect basic properties of information and provide theoretical foundations for studying information processes in Big Data systems.