-
Expander Hierarchies for Normalized Cuts on Graphs
Authors:
Kathrin Hanauer,
Monika Henzinger,
Robin Münk,
Harald Räcke,
Maximilian Vötsch
Abstract:
Expander decompositions of graphs have significantly advanced the understanding of many classical graph problems and led to numerous fundamental theoretical results. However, their adoption in practice has been hindered due to their inherent intricacies and large hidden factors in their asymptotic running times. Here, we introduce the first practically efficient algorithm for computing expander de…
▽ More
Expander decompositions of graphs have significantly advanced the understanding of many classical graph problems and led to numerous fundamental theoretical results. However, their adoption in practice has been hindered due to their inherent intricacies and large hidden factors in their asymptotic running times. Here, we introduce the first practically efficient algorithm for computing expander decompositions and their hierarchies and demonstrate its effectiveness and utility by incorporating it as the core component in a novel solver for the normalized cut graph clustering objective.
Our extensive experiments on a variety of large graphs show that our expander-based algorithm outperforms state-of-the-art solvers for normalized cut with respect to solution quality by a large margin on a variety of graph classes such as citation, e-mail, and social networks or web graphs while remaining competitive in running time.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Simple, Scalable and Effective Clustering via One-Dimensional Projections
Authors:
Moses Charikar,
Monika Henzinger,
Lunjia Hu,
Maxmilian Vötsch,
Erik Waingarten
Abstract:
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $Ω(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can b…
▽ More
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $Ω(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$. Here $\mathrm{nnz}(X)$ is the total number of non-zero entries in the input dataset $X$, which is upper bounded by $nd$ and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio $\smash{\widetilde{O}(k^4)}$ on any input dataset for the $k$-means objective. We also believe that our theoretical analysis is of independent interest, as we show that the approximation ratio of a $k$-means algorithm is approximately preserved under a class of projections and that $k$-means++ seeding can be implemented in expected $O(n \log n)$ time in one dimension. Finally, we show experimentally that our clustering algorithm gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks.
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
Authors:
Ameya Velingker,
Maximilian Vötsch,
David P. Woodruff,
Samson Zhou
Abstract:
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and…
▽ More
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was the approximation algorithm of Kumar et. al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm using running time singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the setting where matrix operations are performed in $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming or distributed models.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
Online Min-Max Paging
Authors:
Ashish Chiplunkar,
Monika Henzinger,
Sagar Sudhir Kale,
Maximilian Vötsch
Abstract:
Motivated by fairness requirements in communication networks, we introduce a natural variant of the online paging problem, called \textit{min-max} paging, where the objective is to minimize the maximum number of faults on any page. While the classical paging problem, whose objective is to minimize the total number of faults, admits $k$-competitive deterministic and $O(\log k)$-competitive randomiz…
▽ More
Motivated by fairness requirements in communication networks, we introduce a natural variant of the online paging problem, called \textit{min-max} paging, where the objective is to minimize the maximum number of faults on any page. While the classical paging problem, whose objective is to minimize the total number of faults, admits $k$-competitive deterministic and $O(\log k)$-competitive randomized algorithms, we show that min-max paging does not admit a $c(k)$-competitive algorithm for any function $c$. Specifically, we prove that the randomized competitive ratio of min-max paging is $Ω(\log(n))$ and its deterministic competitive ratio is $Ω(k\log(n)/\log(k))$, where $n$ is the total number of pages ever requested.
We design a fractional algorithm for paging with a more general objective -- minimize the value of an $n$-variate differentiable convex function applied to the vector of the number of faults on each page. This gives an $O(\log(n)\log(k))$-competitive fractional algorithm for min-max paging. We show how to round such a fractional algorithm with at most a $k$ factor loss in the competitive ratio, resulting in a deterministic $O(k\log(n)\log(k))$-competitive algorithm for min-max paging. This matches our lower bound modulo a $\mathrm{poly}(\log(k))$ factor. We also give a randomized rounding algorithm that results in a $O(\log^2 n \log k)$-competitive algorithm.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.