Showing 1–50 of 74 results for author: Cohen-Addad, V

  1. arXiv:2406.09137  [pdf, other]

    cs.DS cs.LG

    Dynamic Correlation Clustering in Sublinear Update Time

    Authors: Vincent Cohen-Addad, Silvio Lattanzi, Andreas Maggiori, Nikos Parotsidis

    Abstract: We study the classic problem of correlation clustering in dynamic node streams. In this setting, nodes are either added or randomly deleted over time, and each node pair is connected by a positive or negative edge. The objective is to continuously find a partition which minimizes the sum of positive edges crossing clusters and negative edges within clusters. We present an algorithm that maintains…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML'24 (spotlight)

  2. arXiv:2406.04868  [pdf, ps, other]

    cs.LG cs.CR cs.DS

    Perturb-and-Project: Differentially Private Similarities and Marginals

    Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Alessandro Epasto, Vahab Mirrokni, Peilin Zhong

    Abstract: We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first design novel efficient algorithms to privately release pair-wise cosine similarities. Second, we derive a novel algorithm to compute $k$-way marginal queri…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 21 pages, ICML 2024

    ACM Class: F.2; G.3

  3. arXiv:2406.04860  [pdf, other]

    cs.LG cs.DS stat.ML

    Multi-View Stochastic Block Models

    Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Silvio Lattanzi, Rajai Nasser

    Abstract: Graph clustering is a central topic in unsupervised learning with a multitude of practical applications. In recent years, multi-view graph clustering has gained a lot of attention for its applicability to real-world instances where one has access to multiple data sources. In this paper we formalize a new family of models, called multi-view stochastic block models, that captures this settin…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 31 pages, ICML 2024

    ACM Class: F.2; G.3

  4. arXiv:2406.04857  [pdf, ps, other]

    cs.DS cs.LG

    A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

    Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Aida Mousavifar

    Abstract: We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC'12], where, given a random bipartite graph with $α$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e. all adversarial changes are monotone with respect to the bipartition). F…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 24 pages, ICML 2024

    ACM Class: F.2; G.3

  5. arXiv:2405.01339  [pdf, other]

    cs.DS

    Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

    Authors: Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn

    Abstract: Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1\pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space the cost for solution $S$ is $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$. A ver…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 57 pages
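
The $k$-means cost quoted in the abstract above, $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$, can be evaluated directly. The following is a minimal sketch in plain Python, not code from the paper: the names `squared_distance` and `kmeans_cost`, the optional `weights` argument (included because a coreset is a weighted summary), and the toy points are all assumptions made for illustration.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def squared_distance(p: Point, s: Point) -> float:
    """Squared Euclidean distance ||p - s||^2."""
    return sum((a - b) ** 2 for a, b in zip(p, s))


def kmeans_cost(points: Sequence[Point],
                centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Sum over points of the (weighted) squared distance to the closest center.

    With weights=None every point has weight 1, which gives the cost
    formula from the abstract. The weighted form is an assumption about
    how a weighted summary would be evaluated.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(squared_distance(p, s) for s in centers)
               for p, w in zip(points, weights))


# Hypothetical toy instance: three points, two candidate centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
centers = [(0.0, 1.0), (10.0, 0.0)]
print(kmeans_cost(points, centers))
```

In this sense a coreset would be a small weighted point set whose `kmeans_cost` stays within a $(1\pm \varepsilon)$ factor of the cost of the full set for every candidate `centers`.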

  6. Understanding the Cluster LP for Correlation Clustering

    Authors: Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman, Lukas Vogl

    Abstract: In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph where edges are labeled either $+$ or $-$, and the goal is to find a partition of the vertices that minimizes the sum of the +edges across parts plus the sum of the -edges within parts. In recent years, Chawla, Makarychev, Schramm and Yaroslavtsev (STOC 2015) gave a 2.06-…

    Submitted 26 April, 2024; originally announced April 2024.
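
The Correlation Clustering objective described in the abstract above (the number of $+$ edges across parts plus the number of $-$ edges within parts) can be computed by brute force over all vertex pairs. This is a minimal illustrative sketch, not an algorithm from any of the listed papers: the function name `disagreements`, the representation of the labeling as a set of positive pairs (every other pair is treated as $-$), and the toy instance are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set


def disagreements(positive: Set[FrozenSet[Hashable]],
                  cluster_of: Dict[Hashable, int]) -> int:
    """Count disagreements of a clustering on a complete signed graph.

    positive   -- set of unordered pairs labeled '+'; all other pairs are '-'
    cluster_of -- maps each vertex to the id of its cluster
    """
    cost = 0
    for u, v in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A '+' edge across clusters or a '-' edge inside a cluster is a disagreement.
        if is_positive != same_cluster:
            cost += 1
    return cost


# Hypothetical instance on vertices a, b, c, d with '+' edges ab and bc.
positive = {frozenset("ab"), frozenset("bc")}
one_big_cluster = {"a": 0, "b": 0, "c": 0, "d": 1}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}
print(disagreements(positive, one_big_cluster), disagreements(positive, singletons))
```

The loop takes quadratic time in the number of vertices, which is fine for checking small examples but is not how the approximation algorithms in these papers operate.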

  7. arXiv:2404.06797  [pdf, other]

    cs.DS

    Fully Dynamic Correlation Clustering: Breaking 3-Approximation

    Authors: Soheil Behnezhad, Moses Charikar, Vincent Cohen-Addad, Alma Ghafari, Weiyun Ma

    Abstract: We study the classic correlation clustering in the dynamic setting. Given $n$ objects and a complete labeling of the object-pairs as either similar or dissimilar, the goal is to partition the objects into arbitrarily many clusters while minimizing disagreements with the labels. In the dynamic setting, an update consists of a flip of a label of an edge. In a breakthrough result, [BDHSS, FOCS'19] sh…

    Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  8. arXiv:2404.05433  [pdf, ps, other]

    cs.DS

    Combinatorial Correlation Clustering

    Authors: Vincent Cohen-Addad, David Rasmussen Lolck, Marcin Pilipczuk, Mikkel Thorup, Shuyi Yan, Hanwen Zhang

    Abstract: Correlation Clustering is a classic clustering objective arising in numerous machine learning and data mining applications. Given a graph $G=(V,E)$, the goal is to partition the vertex set into clusters so as to minimize the number of edges between clusters plus the number of edges missing within clusters. The problem is APX-hard and the best known polynomial time approximation factor is 1.73 by…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at STOC 2024

  9. arXiv:2402.18263  [pdf, ps, other]

    cs.DS cs.CC

    Max-Cut with $ε$-Accurate Predictions

    Authors: Vincent Cohen-Addad, Tommaso d'Orsi, Anupam Gupta, Euiwoong Lee, Debmalya Panigrahi

    Abstract: We study the approximability of the MaxCut problem in the presence of predictions. Specifically, we consider two models: in the noisy predictions model, for each vertex we are given its correct label in $\{-1,+1\}$ with some unknown probability $1/2 + ε$, and the other (incorrect) label otherwise. In the more-informative partial predictions model, for each vertex we are given its correct label wit…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 18 pages

    ACM Class: F.0

  10. arXiv:2402.17327  [pdf, other]

    cs.LG cs.DS

    Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

    Authors: Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder

    Abstract: We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and sensitivity sampling. Assuming access to an embedding representation of the data with respect to which the model loss is Hölder continuous, our approach provably a…

    Submitted 27 February, 2024; originally announced February 2024.

  11. arXiv:2402.06730  [pdf, other]

    cs.DS cs.CY cs.LG

    A Scalable Algorithm for Individually Fair K-means Clustering

    Authors: MohammadHossein Bateni, Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi

    Abstract: We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $δ(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $δ(x)$ of $x$ for each $x\in P$. While g…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 32 pages, 2 figures, to appear at the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
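
The individual-fairness condition stated in the abstract above can be checked by brute force. The sketch below is only an illustration under stated assumptions, not the scalable algorithm of the paper: it uses Euclidean distance, counts $x$ itself among the points inside its ball, rounds $n/k$ up to an integer, and the names `fair_radius` and `is_individually_fair` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def fair_radius(x: Point, points: Sequence[Point], k: int) -> float:
    """delta(x): radius of the smallest ball around x holding at least n/k points.

    Assumption: x counts as one of the points in its own ball, and
    n/k is rounded up to the next integer.
    """
    need = math.ceil(len(points) / k)
    dists = sorted(math.dist(x, p) for p in points)
    return dists[need - 1]


def is_individually_fair(points: Sequence[Point],
                         centers: Sequence[Point],
                         k: int) -> bool:
    """True if every point x has some center within distance delta(x)."""
    return all(
        min(math.dist(x, c) for c in centers) <= fair_radius(x, points, k)
        for x in points
    )


# Hypothetical one-dimensional instance with n = 4 points and k = 2.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(is_individually_fair(points, [(1.0,), (10.0,)], k=2))
print(is_individually_fair(points, [(10.0,)], k=2))
```

Sorting all distances for every point costs quadratic time overall, so this is suitable only for verifying a clustering on small inputs.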

  12. arXiv:2311.17840  [pdf, other]

    cs.DS cs.LG stat.ML

    A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies

    Authors: Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, Rajesh Jayaram, Silvio Lattanzi

    Abstract: Multi-dimensional Scaling (MDS) is a family of methods for embedding an $n$-point metric into low-dimensional Euclidean space. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i , j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \in \mathbb{R}^k$ that minimizes \[\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \l…

    Submitted 11 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: Extended exposition

  13. arXiv:2311.00892  [pdf, other]

    cs.DS

    A PTAS for $\ell_0$-Low Rank Approximation: Solving Dense CSPs over Reals

    Authors: Vincent Cohen-Addad, Chenglin Fan, Suprovat Ghoshal, Euiwoong Lee, Arnaud de Mesmay, Alantha Newman, Tony Chang Wang

    Abstract: We consider the Low Rank Approximation problem, where the input consists of a matrix $A \in \mathbb{R}^{n_R \times n_C}$ and an integer $k$, and the goal is to find a matrix $B$ of rank at most $k$ that minimizes $\| A - B \|_0$, which is the number of entries where $A$ and $B$ differ. For any constant $k$ and $\varepsilon > 0$, we present a polynomial time $(1 + \varepsilon)$-approximation time f…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: To appear in SODA 24

  14. arXiv:2310.04076  [pdf, other]

    cs.DS

    Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

    Authors: Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

    Abstract: In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension $d$, the preci…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: FOCS 2023. Abstract reduced for arxiv requirements

  15. arXiv:2310.02882  [pdf, other]

    cs.DS

    Streaming Euclidean $k$-median and $k$-means with $o(\log n)$ Space

    Authors: Vincent Cohen-Addad, David P. Woodruff, Samson Zhou

    Abstract: We consider the classic Euclidean $k$-median and $k$-means objective on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal $k$-median or $k$-means solution, while using as little memory as possible. Over the last 20 years, clustering in data streams has received a tremendous amount of attention and has been the test-bed for a large variety of new techniques…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: To appear at FOCS 2023

  16. arXiv:2309.17243  [pdf, other]

    cs.DS

    Handling Correlated Rounding Error via Preclustering: A 1.73-approximation for Correlation Clustering

    Authors: Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman

    Abstract: We consider the classic Correlation Clustering problem: Given a complete graph where edges are labelled either $+$ or $-$, the goal is to find a partition of the vertices that minimizes the number of the $+$edges across parts plus the number of the $-$edges within parts. Recently, Cohen-Addad, Lee and Newman [CLN22] presented a 1.994-approximation algorithm for the problem using the Sherali-Adams hi…

    Submitted 29 September, 2023; originally announced September 2023.

  17. arXiv:2309.16384  [pdf, other]

    cs.CG cs.LG

    Multi-Swap $k$-Means++

    Authors: Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis

    Abstract: The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023

  18. arXiv:2304.07268  [pdf, ps, other]

    cs.DS cs.DM

    Planar and Minor-Free Metrics Embed into Metrics of Polylogarithmic Treewidth with Expected Multiplicative Distortion Arbitrarily Close to 1

    Authors: Vincent Cohen-Addad, Hung Le, Marcin Pilipczuk, Michał Pilipczuk

    Abstract: We prove that there is a randomized polynomial-time algorithm that given an edge-weighted graph $G$ excluding a fixed-minor $Q$ on $n$ vertices and an accuracy parameter $\varepsilon>0$, constructs an edge-weighted graph $H$ and an embedding $η\colon V(G)\to V(H)$ with the following properties: * For any constant size $Q$, the treewidth of $H$ is polynomial in $\varepsilon^{-1}$, $\log n$, and the…

    Submitted 14 April, 2023; originally announced April 2023.

  19. arXiv:2302.00037  [pdf, other]

    cs.LG cs.CR cs.DS

    Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

    Authors: Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni

    Abstract: Hierarchical Clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by (Dasgupta, 2016). We show strong lower bounds for the problem: that any $ε$-DP algorithm must exhibit $O(|V|^2/ ε)$-additive error…

    Submitted 23 May, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: 28 pages, 1 figure

  20. arXiv:2301.04822  [pdf, ps, other]

    cs.DS cs.CR cs.LG stat.ML

    Private estimation algorithms for stochastic block models and mixture models

    Authors: Hongjie Chen, Vincent Cohen-Addad, Tommaso d'Orsi, Alessandro Epasto, Jacob Imola, David Steurer, Stefan Tiegel

    Abstract: We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. To illustrate our techniques, we consider two problems: recovery of stochastic block models and learning mixtures of spherical Gaussians. For the former, we present the first efficient $(ε, δ)$-…

    Submitted 15 November, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  21. arXiv:2212.14220  [pdf, ps, other]

    cs.DS

    Graph Searching with Predictions

    Authors: Siddhartha Banerjee, Vincent Cohen-Addad, Anupam Gupta, Zhouzi Li

    Abstract: Consider an agent exploring an unknown graph in search of some goal state. As it walks around the graph, it learns the nodes and their neighbors. The agent only knows where the goal state is when it reaches it. How do we reach this goal while moving only a small distance? This problem seems hopeless, even on trees of bounded degree, unless we give the agent some help. This setting with "help" of…

    Submitted 29 December, 2022; originally announced December 2022.

  22. arXiv:2212.06546  [pdf, other]

    cs.DS

    Streaming Euclidean MST to a Constant Factor

    Authors: Vincent Cohen-Addad, Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten

    Abstract: We study streaming algorithms for the fundamental geometric problem of computing the cost of the Euclidean Minimum Spanning Tree (MST) on an $n$-point set $X \subset \mathbb{R}^d$. In the streaming model, the points in $X$ can be added and removed arbitrarily, and the goal is to maintain an approximation in small space. In low dimensions, $(1+ε)$ approximations are possible in sublinear space [Fra…

    Submitted 13 December, 2022; originally announced December 2022.

  23. arXiv:2211.08184  [pdf, other]

    cs.CG cs.LG

    Improved Coresets for Euclidean $k$-Means

    Authors: Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

    Abstract: Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weigh…

    Submitted 16 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

  24. arXiv:2209.01901  [pdf, ps, other]

    cs.DS

    The Power of Uniform Sampling for Coresets

    Authors: Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H. -C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

    Abstract: Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive er…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

  25. arXiv:2208.14129  [pdf, other]

    cs.DS

    On the Fixed-Parameter Tractability of Capacitated Clustering

    Authors: Vincent Cohen-Addad, Jason Li

    Abstract: We study the complexity of the classic capacitated k-median and k-means problems parameterized by the number of centers, k. These problems are notoriously difficult since the best known approximation bound for high dimensional Euclidean space and general metric space is $Θ(\log k)$ and it remains a major open problem whether a constant factor exists. We show that there exists a $(3+ε)$-approximati…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Full version of the ICALP'19 paper (w/ same title, same authors)

  26. arXiv:2208.13920  [pdf, other]

    cs.DS

    Fitting Metrics and Ultrametrics with Minimum Disagreements

    Authors: Vincent Cohen-Addad, Chenglin Fan, Euiwoong Lee, Arnaud de Mesmay

    Abstract: Given $x \in (\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the METRIC VIOLATION DISTANCE (MVD) problem asks to compute the $\ell_0$ distance between $x$ and the metric cone; i.e., modify the minimum number of entries of $x$ to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied rec…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: To appear at FOCS 2022 (Full version)

  27. arXiv:2207.10889  [pdf, ps, other]

    cs.DS

    Correlation Clustering with Sherali-Adams

    Authors: Vincent Cohen-Addad, Euiwoong Lee, Alantha Newman

    Abstract: Given a complete graph $G = (V, E)$ where each edge is labeled $+$ or $-$, the Correlation Clustering problem asks to partition $V$ into clusters to minimize the number of $+$edges between different clusters plus the number of $-$edges within the same cluster. Correlation Clustering has been used to model a large number of clustering problems in practice, making it one of the most widely studied c…

    Submitted 3 May, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

  28. arXiv:2207.05150  [pdf, ps, other]

    cs.DS

    Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median

    Authors: Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn

    Abstract: The Uncapacitated Facility Location (UFL) problem is one of the most fundamental clustering problems: Given a set of clients $C$ and a set of facilities $F$ in a metric space $(C \cup F, dist)$ with facility costs $open : F \to \mathbb{R}^+$, the goal is to find a set of facilities $S \subseteq F$ to minimize the sum of the opening cost $open(S)$ and the connection cost…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 55 pages

  29. arXiv:2206.08646  [pdf, other]

    cs.DS cs.CR cs.LG

    Scalable Differentially Private Clustering via Hierarchically Separated Trees

    Authors: Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andres Munoz, David Saulpic, Chris Schwiegelshohn, Sergei Vassilvitskii

    Abstract: We study the private $k$-median and $k$-means clustering problem in $d$ dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy to implement algorithm, that is empirically competitive with state of the art non private methods. We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / ε^2)$, where $ε$ is the priv…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: To appear at KDD'22

  30. arXiv:2205.12327  [pdf, other]

    cs.LG cs.CY

    Beyond Impossibility: Balancing Sufficiency, Separation and Accuracy

    Authors: Limor Gultchin, Vincent Cohen-Addad, Sophie Giffard-Roisin, Varun Kanade, Frederik Mallmann-Trenn

    Abstract: Among the various aspects of algorithmic fairness studied in recent years, the tension between satisfying both sufficiency and separation -- e.g. the ratios of positive or negative predictive values, and false positive or false negative rates across groups -- has received much attention. Following a debate sparked by COMPAS, a criminal justice predictive system, the academic comm…

    Submitted 24 May, 2022; originally announced May 2022.

  31. arXiv:2204.04828  [pdf, ps, other]

    cs.DS cs.CG cs.LG

    Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

    Authors: Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, Shyam Narayanan

    Abstract: Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward. Our algorithm achieves an approximation ratio of $2.406$ and $5.912$ for Euclidean…

    Submitted 11 April, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: 74 pages. To appear in Symposium on Theory of Computing (STOC), 2022

  32. arXiv:2203.01857  [pdf, ps, other]

    cs.DS

    Improved Approximation Algorithms and Lower Bounds for Search-Diversification Problems

    Authors: Amir Abboud, Vincent Cohen-Addad, Euiwoong Lee, Pasin Manurangsi

    Abstract: We study several questions related to diversifying search results. We give improved approximation algorithms in each of the following problems, together with some lower bounds. - We give a polynomial-time approximation scheme (PTAS) for a diversified search ranking problem [Bansal et al., ICALP 2010] whose objective is to minimize the discounted cumulative gain. Our PTAS runs in time…

    Submitted 3 March, 2022; originally announced March 2022.

  33. arXiv:2203.01440  [pdf, ps, other]

    cs.LG cs.CR cs.DS

    Near-Optimal Correlation Clustering with Privacy

    Authors: Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

    Abstract: Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes'…

    Submitted 2 March, 2022; originally announced March 2022.

  34. arXiv:2202.12793  [pdf, other]

    cs.DS cs.CG cs.LG

    Towards Optimal Lower Bounds for k-median and k-means Coresets

    Authors: Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

    Abstract: Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized. Special cases include the famous k-median problem ($z = 1$) and k-means problem ($z = 2$). The $k$-median and $k$-means problems are at the heart of modern da…

    Submitted 25 February, 2022; originally announced February 2022.

  35. arXiv:2112.03222  [pdf, ps, other]

    cs.CC cs.CG cs.DS cs.LG

    On Complexity of 1-Center in Various Metrics

    Authors: Amir Abboud, Mohammad Hossein Bateni, Vincent Cohen-Addad, Karthik C. S., Saeed Seddighin

    Abstract: We consider the classic 1-center problem: Given a set $P$ of $n$ points in a metric space find the point in $P$ that minimizes the maximum distance to the other points of $P$. We study the complexity of this problem in $d$-dimensional $\ell_p$-metrics and in edit and Ulam metrics over strings of length $d$. Our results for the 1-center problem may be classified based on $d$ as follows.…

    Submitted 9 July, 2023; v1 submitted 6 December, 2021; originally announced December 2021.
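
The 1-center problem as defined in the abstract above has an obvious quadratic-time baseline: try every point of $P$ as the center and keep the one with the smallest maximum distance. The sketch below shows that baseline for the Euclidean metric only; it is not taken from the paper, and the name `one_center` and the sample points are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def one_center(points: Sequence[Point]) -> Point:
    """Return the point of P minimizing the maximum distance to the points of P.

    Brute force: for each candidate center, scan all points for the
    farthest one. The candidate's zero distance to itself never affects
    the maximum when P has at least two points.
    """
    return min(points, key=lambda c: max(math.dist(c, p) for p in points))


# Hypothetical instance: three collinear points in the plane.
points = [(0.0, 0.0), (4.0, 0.0), (1.0, 0.0)]
print(one_center(points))
```

Swapping `math.dist` for another distance function (for example an edit-distance routine on strings) would give the same baseline in the other metrics the abstract mentions.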

  36. arXiv:2111.10912  [pdf, ps, other]

    cs.CC cs.CG cs.DS cs.LG

    Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in L_p metrics

    Authors: Vincent Cohen-Addad, Karthik C. S., Euiwoong Lee

    Abstract: K-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in $\ell_p$-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in literature for these objectives in $\ell_p$-metrics. We introduce a…

    Submitted 21 November, 2021; originally announced November 2021.

    Comments: Abstract in metadata shortened to meet arxiv requirements

  37. arXiv:2111.06163  [pdf, other]

    cs.DS cs.DM

    A 2-Approximation for the Bounded Treewidth Sparsest Cut Problem in FPT Time

    Authors: Vincent Cohen-Addad, Tobias Mömke, Victor Verdugo

    Abstract: In the non-uniform sparsest cut problem, we are given a supply graph G and a demand graph D, both with the same set of nodes V. The goal is to find a cut of V that minimizes the ratio of the total capacity on the edges of G crossing the cut over the total demand of the crossing edges of D. In this work, we study the non-uniform sparsest cut problem for supply graphs with bounded treewidth k. For t…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: 14 pages, 2 figures

  38. arXiv:2111.04589  [pdf, other]

    cs.DS

    An Improved Local Search Algorithm for k-Median

    Authors: Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, David Saulpic

    Abstract: We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+ε)$-approximation; our result improves upon the $(3+ε)$-approximate local-search algorithm of Arya et al. [STOC 01]. Moreover, a computer-aided analysis of a natural extension suggests that this approach may lead to an improvement over the best-known approximat…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: To appear at SODA 22

    ACM Class: F.2.2

  39. Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

    Authors: Vincent Cohen-Addad, Debarati Das, Evangelos Kipouridis, Nikos Parotsidis, Mikkel Thorup

    Abstract: We consider the numerical taxonomy problem of fitting a positive distance function ${D:{S\choose 2}\rightarrow \mathbb R_{>0}}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match those in $D$. A nice application is in evolutionary biology where the tree $T$ aims to approximate the branching process leading to th…

    Submitted 11 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: 46 pages, Accepted to FOCS 2021 (Full version)

  40. arXiv:2106.08448  [pdf, other]

    cs.DS cs.DC cs.LG

    Correlation Clustering in Constant Many Parallel Rounds

    Authors: Vincent Cohen-Addad, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

    Abstract: Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular,…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ICML 2021 (long talk)

  41. arXiv:2106.08195  [pdf, ps, other]

    cs.DS

    A Linear-Time $n^{0.4}$-Approximation for Longest Common Subsequence

    Authors: Karl Bringmann, Vincent Cohen-Addad, Debarati Das

    Abstract: We consider the classic problem of computing the Longest Common Subsequence (LCS) of two strings of length $n$. While a simple quadratic algorithm has been known for the problem for more than 40 years, no faster algorithm has been found despite an extensive effort. The lack of progress on the problem has recently been explained by Abboud, Backurs, and Vassilevska Williams [FOCS'15] and Bringmann a…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: full version of ICALP'21 paper, abstract shortened to fit Arxiv requirements

    MSC Class: 68W32; 68W25

    ACM Class: F.2.2

  42. arXiv:2105.15187  [pdf, other]

    cs.DS

    A Quasipolynomial $(2+\varepsilon)$-Approximation for Planar Sparsest Cut

    Authors: Vincent Cohen-Addad, Anupam Gupta, Philip N. Klein, Jason Li

    Abstract: The (non-uniform) sparsest cut problem is the following graph-partitioning problem: given a "supply" graph, and demands on pairs of vertices, delete some subset of supply edges to minimize the ratio of the supply edges cut to the total demand of the pairs separated by this deletion. Despite much effort, there are only a handful of nontrivial classes of supply graphs for which constant-factor appro…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: To appear at STOC 2021

  43. A New Coreset Framework for Clustering

    Authors: Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

    Abstract: Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances raised to the power $z$ of every point to its closest center is minimized. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known a…

    Submitted 29 July, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Improved presentation. Adds a simpler suboptimal proof for interesting points, and an improved analysis for planar graphs. Corrects errors in the construction of centroid sets

  44. arXiv:2012.11891  [pdf, ps, other]

    cs.LG cs.DS

    Fast and Accurate $k$-means++ via Rejection Sampling

    Authors: Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

    Abstract: $k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k…

    Submitted 22 December, 2020; originally announced December 2020.

  45. arXiv:2010.00087  [pdf, ps, other]

    cs.CC cs.DS cs.LG

    On Approximability of Clustering Problems Without Candidate Centers

    Authors: Vincent Cohen-Addad, Karthik C. S., Euiwoong Lee

    Abstract: The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding…

    Submitted 2 October, 2020; v1 submitted 30 September, 2020; originally announced October 2020.

  46. arXiv:2009.05039  [pdf, other]

    cs.DS

    On Light Spanners, Low-treewidth Embeddings and Efficient Traversing in Minor-free Graphs

    Authors: Vincent Cohen-Addad, Arnold Filtser, Philip N. Klein, Hung Le

    Abstract: Understanding the structure of minor-free metrics, namely shortest path metrics obtained over a weighted graph excluding a fixed minor, has been an important research direction since the fundamental work of Robertson and Seymour. A fundamental idea that helps both to understand the structural properties of these metrics and lead to strong algorithmic results is to construct a "small-complexity" gr…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: 65 pages, 6 figures. Abstract shortened due to limited characters

    ACM Class: F.2.2

  47. arXiv:2009.00188  [pdf, other]

    cs.DS

    On the computational tractability of a geographic clustering problem arising in redistricting

    Authors: Vincent Cohen-Addad, Philip N. Klein, Dániel Marx

    Abstract: Redistricting is the problem of dividing a state into a number $k$ of regions, called districts. Voters in each district elect a representative. The primary criteria are: each district is connected, district populations are equal (or nearly equal), and districts are "compact". There are multiple competing definitions of compactness, usually minimizing some quantity. One measure that has been rec…

    Submitted 31 August, 2020; originally announced September 2020.

  48. arXiv:2008.06700  [pdf, other]

    cs.DS cs.CC cs.CG cs.LG math.MG

    On Efficient Low Distortion Ultrametric Embedding

    Authors: Vincent Cohen-Addad, Karthik C. S., Guillaume Lagarde

    Abstract: A classic problem in unsupervised learning and data analysis is to find simpler and easy-to-visualize representations of the data that preserve its essential properties. A widely-used method to preserve the underlying hierarchical structure of the data while reducing its complexity is to find an embedding of the data into a tree or an ultrametric. The most popular algorithms for this task are the…

    Submitted 15 August, 2020; originally announced August 2020.

  49. arXiv:2007.02377  [pdf, other]

    cs.DS

    New Hardness Results for Planar Graph Problems in P and an Algorithm for Sparsest Cut

    Authors: Amir Abboud, Vincent Cohen-Addad, Philip N. Klein

    Abstract: The Sparsest Cut is a fundamental optimization problem that has been extensively studied. For planar inputs the problem is in $P$ and can be solved in $\tilde{O}(n^3)$ time if all vertex weights are $1$. Despite a significant amount of effort, the best algorithms date back to the early 90's and can only achieve $O(\log n)$-approximation in $\tilde{O}(n)$ time or a constant factor approximation in…

    Submitted 5 July, 2020; originally announced July 2020.

  50. arXiv:1909.06861  [pdf, other]

    cs.LG stat.ML

    Online k-means Clustering

    Authors: Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

    Abstract: We study the problem of online clustering where a clustering algorithm has to assign a new point that arrives to one of $k$ clusters. The specific formulation we use is the $k$-means objective: At each time step the algorithm has to maintain a set of $k$ candidate centers and the loss incurred is the squared distance between the new point and the closest center. The goal is to minimize regret with r…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: 11 pages, 1 figure

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 130:1126-1134, 2021