Monte Carlo approximation certificates for k-means clustering

Mixon, Dustin G.; Villar, Soledad

arXiv:1710.00956 (stat)

[Submitted on 3 Oct 2017]

Abstract: Efficient algorithms for $k$-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect $k$-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for $k$-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of $k$-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the $k$-means objective, and being sub-linear, our algorithm is faster than $k$-means++ when the number of data points is large. We illustrate the performance of our algorithm with both numerical experiments and a performance guarantee: If the data points are drawn independently from any mixture of two Gaussians over $\mathbb{R}^m$ with identity covariance, then with probability $1-O(1/m)$, our $\operatorname{poly}(m)$-time algorithm produces a 3-approximation certificate with 99% confidence.
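To make the notion of an approximation certificate concrete, the following is a minimal sketch and not the authors' certifier. It checks the standard identity that the $k$-means value of a partition can be written using pairwise squared distances only, which is the form that semidefinite relaxations of $k$-means relax, and it shows how a lower bound on the optimal value turns into an approximation ratio. All function names, the toy data, and the lower bound value are hypothetical; in the paper the lower bound would come from solving the relaxation on random samples, which is not implemented here.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def group_by_label(points, labels):
    """Collect the points of each cluster into a list keyed by its label."""
    clusters = {}
    for p, t in zip(points, labels):
        clusters.setdefault(t, []).append(p)
    return clusters


def kmeans_value_centroid(points, labels):
    """k-means value: squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def kmeans_value_pairwise(points, labels):
    """Same value from pairwise distances: per cluster, (1/|C|) * sum over unordered pairs."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        pair_sum = sum(sq_dist(p, q) for p, q in combinations(members, 2))
        total += pair_sum / len(members)
    return total


def certified_ratio(value, lower_bound):
    """If lower_bound <= optimal value, the partition is within this factor of optimal."""
    return value / lower_bound


# Toy data (hypothetical): two well-separated pairs in the plane.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]

value = kmeans_value_centroid(points, labels)
value_pairwise = kmeans_value_pairwise(points, labels)

# Hypothetical lower bound, standing in for the output of a relaxation.
assumed_lower_bound = 5.0
ratio = certified_ratio(value, assumed_lower_bound)
```

In this toy case both formulas give the same value, and an assumed lower bound of half that value would certify the partition as a 2-approximation. A randomized certifier of the kind described in the abstract would additionally attach a confidence level to the lower bound.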

Comments:	8 pages
Subjects:	Machine Learning (stat.ML); Optimization and Control (math.OC)
Cite as:	arXiv:1710.00956 [stat.ML]
	(or arXiv:1710.00956v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1710.00956

Submission history

From: Soledad Villar
[v1] Tue, 3 Oct 2017 02:02:17 UTC (30 KB)
