Clustering High Dimensional Dynamic Data Streams (Lin Yang)


We present a data streaming algorithms for the k-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1,2,…Δ}^d. Our algorithms use k/eps^2 * poly(d log Δ) space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of k centers the k-median cost of the coreset (1+eps)-approximates the cost of the streamed point set. We can use this coreset to compute a (1+eps)-approximation for the k-median problem by any efficient offline k-median algorithm. This is the first algorithm achieving space polynomial in d in this setting.


2018-08-09   14:00 ~ 15:00   


Lin Yang, Princeton university


Room 602,School of Information Management & Engineering, Shanghai University of Finance & Economics