Distributed matrix sketching (Zengfeng Huang)


A sketch or synopsis of a large dataset captures vital properties of the original data while typically occupying much less space. In this paper, we consider the problem of computing a sketch of a massive data matrix A which is distributed across a large number of servers. Our goal is to output a matrix B which is significantly smaller than but still approximates A well in terms of covariance error. We are mainly focused on minimizing the communication cost, which is arguably the most valuable resource in distributed computations. We show a gap between deterministic and randomized communication complexity for computing a covariance sketch. In PCA, the goal is to find a low-dimensional subspace that captures as much of the variance of a dataset as possible. Based on a well-known connection between covariance sketch and PCA, we give a new algorithm for distributed PCA with improved communication cost. Moreover, in our algorithms, each server only needs to make one pass over the data with limited working space.


2017-01-05   10:00 ~ 11:00   


Zengfeng Huang,University of New South Wales


Room 308,School of Information Management & Engineering, Shanghai University of Finance & Economics