doi 10.1145/3637528.3671978

Expander Hierarchies for Normalized Cuts on Graphs

Authors: Kathrin Hanauer, Monika Henzinger, Robin Münk, Harald Räcke, Maximilian Vötsch

Abstract: Expander decompositions of graphs have significantly advanced the understanding of many classical graph problems and led to numerous fundamental theoretical results. However, their adoption in practice has been hindered due to their inherent intricacies and large hidden factors in their asymptotic running times. Here, we introduce the first practically efficient algorithm for computing expander decompositions and their hierarchies and demonstrate its effectiveness and utility by incorporating it as the core component in a novel solver for the normalized cut graph clustering objective. Our extensive experiments on a variety of large graphs show that our expander-based algorithm outperforms state-of-the-art solvers for normalized cut with respect to solution quality by a large margin on a variety of graph classes such as citation, e-mail, and social networks or web graphs while remaining competitive in running time.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted to KDD'24, August 25-29, 2024, Barcelona, Spain
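The abstract above names the normalized cut objective without spelling it out. As a reading aid, here is a minimal sketch that evaluates the standard definition (the sum over clusters of cut weight divided by cluster volume) for a given partition. It only scores a partition; it is not the paper's expander-based solver, and the function name `normalized_cut` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict

def normalized_cut(edges, clusters):
    """Return the sum over clusters of cut(C) / vol(C).

    edges: iterable of (u, v, weight) for an undirected graph.
    clusters: dict mapping each vertex to a cluster id.
    """
    cut = defaultdict(float)  # weight of edges leaving each cluster
    vol = defaultdict(float)  # sum of weighted degrees in each cluster
    for u, v, w in edges:
        cu, cv = clusters[u], clusters[v]
        vol[cu] += w  # the edge contributes to the degree of u
        vol[cv] += w  # and to the degree of v
        if cu != cv:
            cut[cu] += w
            cut[cv] += w
    return sum(cut[c] / vol[c] for c in vol if vol[c] > 0)

# Toy graph (assumption): two triangles joined by one bridge edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # split at the bridge
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # alternating split

good_score = normalized_cut(edges, good)
bad_score = normalized_cut(edges, bad)
```

Splitting at the bridge cuts one edge against a volume of 7 on each side, so it scores lower than the alternating split; lower values indicate a better clustering under this objective.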

arXiv:2310.16752 [pdf, other]

Simple, Scalable and Effective Clustering via One-Dimensional Projections

Authors: Moses Charikar, Monika Henzinger, Lunjia Hu, Maximilian Vötsch, Erik Waingarten

Abstract: Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$. Here $\mathrm{nnz}(X)$ is the total number of non-zero entries in the input dataset $X$, which is upper bounded by $nd$ and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio $\smash{\widetilde{O}(k^4)}$ on any input dataset for the $k$-means objective. We also believe that our theoretical analysis is of independent interest, as we show that the approximation ratio of a $k$-means algorithm is approximately preserved under a class of projections and that $k$-means++ seeding can be implemented in expected $O(n \log n)$ time in one dimension. Finally, we show experimentally that our clustering algorithm gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 41 pages, 6 figures, to appear in NeurIPS 2023
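To make the idea in the abstract above concrete, the following is a rough sketch of projecting points onto a single random direction and then running $k$-means++ ($D^2$) seeding on the resulting scalars. It is a naive $O(nk)$ illustration under assumed details (Gaussian projection, helper names `project_1d`, `kmeanspp_seed_1d`, `assign`); it does not reproduce the paper's $O(n \log n)$ seeding procedure or its guarantees.

```python
import random

def project_1d(points, rng):
    """Project each d-dimensional point onto one random Gaussian direction."""
    d = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [sum(x * g for x, g in zip(p, direction)) for p in points]

def kmeanspp_seed_1d(values, k, rng):
    """Naive k-means++ seeding on scalars: O(n*k), for illustration only."""
    centers = [rng.choice(values)]
    dist2 = [(v - centers[0]) ** 2 for v in values]
    while len(centers) < k:
        if sum(dist2) == 0.0:
            break  # fewer than k distinct values
        # Sample index i with probability proportional to dist2[i].
        i = rng.choices(range(len(values)), weights=dist2, k=1)[0]
        centers.append(values[i])
        dist2 = [min(d2, (v - values[i]) ** 2) for d2, v in zip(dist2, values)]
    return centers

def assign(values, centers):
    """Label each scalar with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: (v - centers[j]) ** 2)
            for v in values]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1),
          (5.2, 4.9), (10.0, 0.0), (10.1, 0.2)]
values = project_1d(points, rng)
centers = kmeanspp_seed_1d(values, 3, rng)
labels = assign(values, centers)
```

Because sampling weights are squared distances to the nearest chosen center, an already-chosen value has weight zero and is never picked twice.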

arXiv:2306.01869 [pdf, other]

Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

Authors: Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

Abstract: We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was the approximation algorithm of Kumar et al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm using running time singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the setting where matrix operations are performed in $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming or distributed models.

Submitted 2 June, 2023; originally announced June 2023.

Comments: ICML 2023
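As an illustration of the objective stated in the abstract above, this small sketch evaluates the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$ for given binary factors, with an optional flag for arithmetic in $\mathbb{F}_2$. It only scores candidate factors and is not the paper's approximation algorithm; the function names and the `field2` parameter are assumptions made for this example.

```python
def matmul(U, V, field2=False):
    """Multiply U (n x k) by V (k x d); reduce entries mod 2 if field2 is set."""
    k, d = len(V), len(V[0])
    product = [[sum(row[t] * V[t][j] for t in range(k)) for j in range(d)]
               for row in U]
    if field2:
        product = [[entry % 2 for entry in row] for row in product]
    return product

def frobenius_loss(U, V, A, field2=False):
    """Squared Frobenius norm of U V - A."""
    P = matmul(U, V, field2)
    return sum((p - a) ** 2
               for p_row, a_row in zip(P, A)
               for p, a in zip(p_row, a_row))

# Toy input (assumption): a 3 x 3 binary matrix with two blocks.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]

# Rank-2 binary factors that reproduce A exactly.
U2 = [[1, 0], [1, 0], [0, 1]]
V2 = [[1, 1, 0], [0, 0, 1]]

# Rank-1 binary factors that miss the lone 1 in the bottom-right corner.
U1 = [[1], [1], [0]]
V1 = [[1, 1, 0]]

loss_rank2 = frobenius_loss(U2, V2, A)
loss_rank1 = frobenius_loss(U1, V1, A)

# Over the integers 1*1 + 1*1 = 2, but over F_2 the same product is 0.
loss_int = frobenius_loss([[1, 1]], [[1], [1]], [[0]])
loss_f2 = frobenius_loss([[1, 1]], [[1], [1]], [[0]], field2=True)
```

The last two lines show why the $\mathbb{F}_2$ setting is a genuinely different variant: the same factors can have different losses depending on the arithmetic used.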

arXiv:2212.03016 [pdf, other]

Online Min-Max Paging

Authors: Ashish Chiplunkar, Monika Henzinger, Sagar Sudhir Kale, Maximilian Vötsch

Abstract: Motivated by fairness requirements in communication networks, we introduce a natural variant of the online paging problem, called \textit{min-max} paging, where the objective is to minimize the maximum number of faults on any page. While the classical paging problem, whose objective is to minimize the total number of faults, admits $k$-competitive deterministic and $O(\log k)$-competitive randomized algorithms, we show that min-max paging does not admit a $c(k)$-competitive algorithm for any function $c$. Specifically, we prove that the randomized competitive ratio of min-max paging is $\Omega(\log(n))$ and its deterministic competitive ratio is $\Omega(k\log(n)/\log(k))$, where $n$ is the total number of pages ever requested. We design a fractional algorithm for paging with a more general objective -- minimize the value of an $n$-variate differentiable convex function applied to the vector of the number of faults on each page. This gives an $O(\log(n)\log(k))$-competitive fractional algorithm for min-max paging. We show how to round such a fractional algorithm with at most a $k$ factor loss in the competitive ratio, resulting in a deterministic $O(k\log(n)\log(k))$-competitive algorithm for min-max paging. This matches our lower bound modulo a $\mathrm{poly}(\log(k))$ factor. We also give a randomized rounding algorithm that results in an $O(\log^2 n \log k)$-competitive algorithm.

Submitted 6 December, 2022; originally announced December 2022.

Comments: 25 pages, 1 figure, to appear in SODA 2023
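The difference between the classical and the min-max paging objective in the abstract above can be seen on a tiny trace. The sketch below counts faults per page under LRU eviction, then reports both the total (classical objective) and the maximum over pages (min-max objective). LRU is used here only as a familiar stand-in policy, not as the algorithm from the paper; the function name `lru_faults` and the request sequence are illustrative assumptions.

```python
from collections import Counter, OrderedDict

def lru_faults(requests, k):
    """Simulate an LRU cache with k slots; return a Counter of faults per page."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    faults = Counter()
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            faults[page] += 1        # miss: charge the fault to this page
            if len(cache) >= k:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return faults

# Toy request sequence (assumption) with a cache of size 2.
requests = ["a", "b", "c", "a", "b", "c", "a", "d"]
faults = lru_faults(requests, 2)

total_faults = sum(faults.values())  # classical paging objective
max_faults = max(faults.values())    # min-max paging objective
```

On this trace every request misses, so the total is 8 while the maximum on any single page is 3 (page "a"); the two objectives can rank eviction policies differently.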

Showing 1–4 of 4 results for author: Vötsch, M