Processing Massive Datasets via Efficient Data Representations (Peilin Zhong)


The computation on massive data has been the foundation of many important breakthroughs in machine learning and data mining. In this talk, I will show how to process massive data via efficient data representations.

As a concrete example, I will present and dive into a recent result (STOC'20) that develops a (1+\varepsilon)-approximate parallel algorithm for computing shortest paths in undirected graphs, using \poly(\log n) parallel time and m\poly(\log n) work/total space for n-nodes m-edges graphs. For (1+\varepsilon)-approximation, all prior algorithms with \poly(\log n) parallel time require at least \Omega(mn^{c}) work/total space for some constant c>0. Improving this long-standing upper bound obtained by Cohen (STOC'94) has been open for more than 25 years. We developed several new tools of independent interest. One of them is an efficient representation of the input graph --- low hop emulator --- a \poly(\log n)-approximate emulator graph in which every shortest path has at most O(\log\log n) hops (edges).

In the talk, I will also give a brief overview of how to use efficient data representations to develop algorithms for other modern machine learning tasks such as training generative adversarial networks.


2021-3-1  10:00 ~ 11:00   


Peilin Zhong, Columbia University


Zoom ID:65928464685,Password:123456