
Showing 1–29 of 29 results for author: Schwiegelshohn, C

  1. arXiv:2406.05254  [pdf, other]

    cs.DS

    A Simple and Optimal Sublinear Algorithm for Mean Estimation

    Authors: Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn

    Abstract: We study the sublinear mean estimation problem. Specifically, we aim to output a point minimizing the sum of squared Euclidean distances. We show that a multiplicative $(1+\varepsilon)$ approximation can be found with probability $1-δ$ using $O(\varepsilon^{-1}\log δ^{-1})$ many independent random samples. We also provide a matching lower bound.

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2405.01339  [pdf, other]

    cs.DS

    Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds

    Authors: Nikhil Bansal, Vincent Cohen-Addad, Milind Prabhu, David Saulpic, Chris Schwiegelshohn

    Abstract: Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $Ω$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1\pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space the cost for solution $S$ is $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$. A ver…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 57 pages
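
As a rough illustration of the definitions in the abstract above (and not the paper's sensitivity-sampling construction), the sketch below evaluates the $k$-means cost $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$ for a point set and for a weighted summary, then reports their ratio for a single candidate solution. The function names and the toy data are assumptions made for illustration. A real coreset must keep this ratio within $(1\pm \varepsilon)$ for all candidate solutions, not just one.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def sq_dist(p: Point, s: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, s))


def kmeans_cost(points: Sequence[Point],
                centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted k-means cost: each point pays its weight times the
    squared distance to its closest center (weights default to 1)."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, s) for s in centers)
               for p, w in zip(points, weights))


def cost_ratio(points: Sequence[Point],
               summary: Sequence[Point],
               summary_weights: Sequence[float],
               centers: Sequence[Point]) -> float:
    """Cost of the weighted summary divided by the cost of the full set,
    for one candidate solution (assumes the full cost is non-zero)."""
    return kmeans_cost(summary, centers, summary_weights) / kmeans_cost(points, centers)


# Hypothetical toy data: duplicate points merged into weighted representatives.
P = [(0.0, 0.0), (0.0, 0.0), (2.0, 0.0), (2.0, 0.0)]
summary = [(0.0, 0.0), (2.0, 0.0)]
summary_weights = [2.0, 2.0]
S = [(1.0, 0.0)]
ratio = cost_ratio(P, summary, summary_weights, S)
```

In this toy case the summary only merges identical points, so the two costs agree exactly for every choice of centers.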

  3. arXiv:2404.01936  [pdf, other]

    cs.LG cs.DS

    Settling Time vs. Accuracy Tradeoffs for Clustering Big Data

    Authors: Andrew Draganov, David Saulpic, Chris Schwiegelshohn

    Abstract: We study the theoretical and practical runtime limits of k-means and k-median clustering on large datasets. Since effectively all clustering methods are slower than the time it takes to read the dataset, the fastest approach is to quickly compress the data and perform the clustering on the compressed representation. Unfortunately, there is no universal best choice for compressing the number of poi…

    Submitted 2 April, 2024; originally announced April 2024.

  4. arXiv:2402.04035  [pdf, ps, other]

    cs.GT

    Low-Distortion Clustering with Ordinal and Limited Cardinal Information

    Authors: Jakob Burkhardt, Ioannis Caragiannis, Karl Fehrs, Matteo Russo, Chris Schwiegelshohn, Sudarshan Shyam

    Abstract: Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: to appear in AAAI 2024

  5. arXiv:2310.18146  [pdf, other]

    cs.DS

    Adaptive Out-Orientations with Applications

    Authors: Chandra Chekuri, Aleksander Bjørn Christiansen, Jacob Holm, Ivor van der Hoog, Kent Quanrud, Eva Rotenberg, Chris Schwiegelshohn

    Abstract: We give improved algorithms for maintaining edge-orientations of a fully-dynamic graph, such that the out-degree of each vertex is bounded. On one hand, we show how to orient the edges such that the out-degree of each vertex is proportional to the arboricity $α$ of the graph, in either an amortised update time of $O(\log^2 n \log α)$ or a worst-case update time of $O(\log^3 n \log α)$. On the o…

    Submitted 4 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: To appear at SODA24

  6. arXiv:2310.09127  [pdf, other]

    cs.LG

    On Generalization Bounds for Projective Clustering

    Authors: Maria Sofia Bucarelli, Matilde Fjeldsø Larsen, Chris Schwiegelshohn, Mads Bech Toftrup

    Abstract: Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$ dimensional subspaces, which gives rise to subspace clustering. In this paper,…

    Submitted 13 October, 2023; originally announced October 2023.

  7. arXiv:2310.04076  [pdf, other]

    cs.DS

    Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation

    Authors: Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

    Abstract: In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic $k$-median and $k$-means problems, there is no known deterministic dimensionality reduction procedure or coreset construction that avoids an exponential dependency on the input dimension $d$, the preci…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: FOCS 2023. Abstract reduced for arxiv requirements

  8. arXiv:2304.02261  [pdf, other]

    cs.DS cs.LG stat.ML

    Optimal Sketching Bounds for Sparse Linear Regression

    Authors: Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff

    Abstract: We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $Θ(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This ex…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: AISTATS 2023

  9. arXiv:2302.06165  [pdf, ps, other]

    cs.DS cs.LG

    Sparse Dimensionality Reduction Revisited

    Authors: Mikael Møller Høgsgaard, Lion Kamma, Kasper Green Larsen, Jelani Nelson, Chris Schwiegelshohn

    Abstract: The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m=O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise distances to within $1 \pm \varepsilon$. Each input point $x$ is embedded to $Ax$, where $A$ is an $m \times d$ matrix having $s$ non-zeros per column, allowin…

    Submitted 13 February, 2023; originally announced February 2023.

  10. arXiv:2302.03071  [pdf, ps, other]

    cs.GT cs.DS

    Optimally Interpolating between Ex-Ante Fairness and Welfare

    Authors: Mikael Høgsgaard, Panagiotis Karras, Wenyue Ma, Nidhi Rathi, Chris Schwiegelshohn

    Abstract: For the fundamental problem of allocating a set of resources among individuals with varied preferences, the quality of an allocation relates to the degree of fairness and the collective welfare achieved. Unfortunately, in many resource-allocation settings, it is computationally hard to maximize welfare while achieving fairness goals. In this work, we consider ex-ante notions of fairness; popular…

    Submitted 6 February, 2023; originally announced February 2023.

  11. arXiv:2211.08184  [pdf, other]

    cs.CG cs.LG

    Improved Coresets for Euclidean $k$-Means

    Authors: Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, Omar Ali Sheikh-Omar

    Abstract: Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. Arguably the most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weigh…

    Submitted 16 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

  12. arXiv:2209.14087  [pdf, ps, other]

    cs.DS

    Adaptive Out-Orientations with Applications

    Authors: Aleksander B. G. Christiansen, Jacob Holm, Ivor van der Hoog, Eva Rotenberg, Chris Schwiegelshohn

    Abstract: We give improved algorithms for maintaining edge-orientations of a fully-dynamic graph, such that the out-degree of each vertex is bounded. On one hand, we show how to orient the edges such that the out-degree of each vertex is proportional to the arboricity $α$ of the graph, in a worst-case update time of $O(\log^3 n \log α)$. On the other hand, motivated by applications including dynamic maximal…

    Submitted 15 February, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

  13. arXiv:2209.01901  [pdf, ps, other]

    cs.DS

    The Power of Uniform Sampling for Coresets

    Authors: Vladimir Braverman, Vincent Cohen-Addad, Shaofeng H. -C. Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, Xuan Wu

    Abstract: Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive er…

    Submitted 17 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.

  14. arXiv:2207.05150  [pdf, ps, other]

    cs.DS

    Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median

    Authors: Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn

    Abstract: The Uncapacitated Facility Location (UFL) problem is one of the most fundamental clustering problems: Given a set of clients $C$ and a set of facilities $F$ in a metric space $(C \cup F, dist)$ with facility costs $open : F \to \mathbb{R}^+$, the goal is to find a set of facilities $S \subseteq F$ to minimize the sum of the opening cost $open(S)$ and the connection cost…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 55 pages

  15. arXiv:2207.00966  [pdf, other]

    cs.DS cs.LG

    An Empirical Evaluation of $k$-Means Coresets

    Authors: Chris Schwiegelshohn, Omar Ali Sheikh-Omar

    Abstract: Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as $k$-means in both theory and practice. Curiously, there exists no work on comparing the quality of available $k$-means coresets. In this paper we perform such an evaluation. There currently is no algorithm known to measure the distortion of…

    Submitted 3 July, 2022; originally announced July 2022.

  16. arXiv:2206.08646  [pdf, other]

    cs.DS cs.CR cs.LG

    Scalable Differentially Private Clustering via Hierarchically Separated Trees

    Authors: Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab Mirrokni, Andres Munoz, David Saulpic, Chris Schwiegelshohn, Sergei Vassilvitskii

    Abstract: We study the private $k$-median and $k$-means clustering problem in $d$-dimensional Euclidean space. By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods. We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / ε^2)$, where $ε$ is the priv…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: To appear at KDD'22

  17. arXiv:2202.12793  [pdf, other]

    cs.DS cs.CG cs.LG

    Towards Optimal Lower Bounds for k-median and k-means Coresets

    Authors: Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn

    Abstract: Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized. Special cases include the famous k-median problem ($z = 1$) and k-means problem ($z = 2$). The $k$-median and $k$-means problems are at the heart of modern da…

    Submitted 25 February, 2022; originally announced February 2022.

  18. arXiv:2108.08825  [pdf, other]

    cs.DS

    Maintaining an EDCS in General Graphs: Simpler, Density-Sensitive and with Worst-Case Time Bounds

    Authors: Fabrizio Grandoni, Chris Schwiegelshohn, Shay Solomon, Amitai Uzrad

    Abstract: In their breakthrough ICALP'15 paper, Bernstein and Stein presented an algorithm for maintaining a $(3/2+ε)$-approximate maximum matching in fully dynamic {\em bipartite} graphs with a {\em worst-case} update time of $O_ε(m^{1/4})$; we use the $O_ε$ notation to suppress the $ε$-dependence. Their main technical contribution was in presenting a new type of bounded-degree subgraph, which they named a…

    Submitted 19 August, 2021; originally announced August 2021.

  19. A New Coreset Framework for Clustering

    Authors: Vincent Cohen-Addad, David Saulpic, Chris Schwiegelshohn

    Abstract: Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers such that the sum of the distances raised to the power $z$ of every point to its closest center is minimized. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known a…

    Submitted 29 July, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Improved presentation. Adds a simpler suboptimal proof for interesting points, and an improved analysis for planar graphs. Corrects errors in the construction of centroid sets

  20. arXiv:2002.11621  [pdf, ps, other]

    cs.CY cs.LG cs.SI stat.ML

    Algorithms for Fair Team Formation in Online Labour Marketplaces

    Authors: Giorgio Barnabò, Adriano Fazzone, Stefano Leonardi, Chris Schwiegelshohn

    Abstract: As freelancing work keeps on growing almost everywhere due to a sharp decrease in communication costs and to the spread of Internet-based labour marketplaces (e.g., guru.com, feelancer.com, mturk.com, upwork.com), many researchers and practitioners have started exploring the benefits of outsourcing and crowdsourcing. Since employers often use these platforms to find a group of workers to compl…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted at "FATES 2019 : 1st Workshop on Fairness, Accountability, Transparency, Ethics, and Society on the Web" (http://fates19.isti.cnr.it)

    Journal ref: "Companion Proceedings of The 2019 World Wide Web Conference", 2019, pages 484-490

  21. arXiv:2002.07892  [pdf, other]

    cs.DS

    Fair Clustering with Multiple Colors

    Authors: Matteo Böhm, Adriano Fazzone, Stefano Leonardi, Chris Schwiegelshohn

    Abstract: A fair clustering instance is given a data set $A$ in which every point is assigned some color. Colors correspond to various protected attributes such as sex, ethnicity, or age. A fair clustering is an instance where membership of points in a cluster is uncorrelated with the coloring of the points. Of particular interest is the case where all colors are equally represented. If we have exactly tw…

    Submitted 5 March, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets"

  22. arXiv:1905.13651  [pdf, other]

    cs.DS cs.LG

    Principal Fairness: Removing Bias via Projections

    Authors: Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn

    Abstract: Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. We complement several recent papers in this line of research by introducing a general method to reduce bias in the data through random projections in a "fair" subspace. We apply this method to the densest subgraph problem. For densest subgraph, our approach based on fair p…

    Submitted 5 March, 2021; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets"

  23. arXiv:1904.06150  [pdf, ps, other]

    cs.DS

    Maximizing Online Utilization with Commitment

    Authors: Chris Schwiegelshohn, Uwe Schwiegelshohn

    Abstract: We investigate online scheduling with commitment for parallel identical machines. Our objective is to maximize the total processing time of accepted jobs. As soon as a job has been submitted, the commitment constraint forces us to decide immediately whether we accept or reject the job. Upon acceptance of a job, we must complete it before its deadline $d$ that satisfies $d \geq (1+ε)\cdot p + r$, w…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 13 pages

    MSC Class: 68W27; 68W40

  24. arXiv:1812.10854  [pdf, ps, other]

    cs.DS

    Fair Coresets and Streaming Algorithms for Fair k-Means Clustering

    Authors: Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler

    Abstract: We study fair clustering problems as proposed by Chierichetti et al. (NIPS 2017). Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which c…

    Submitted 9 March, 2021; v1 submitted 27 December, 2018; originally announced December 2018.

  25. arXiv:1805.08571  [pdf, other]

    cs.DS cs.LG stat.ML

    On Coresets for Logistic Regression

    Authors: Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

    Abstract: Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $μ(X)$, which quantifies…

    Submitted 8 March, 2021; v1 submitted 22 May, 2018; originally announced May 2018.

  26. arXiv:1701.08423  [pdf, other]

    cs.DS cs.CG cs.LG

    On the Local Structure of Stable Clustering Instances

    Authors: Vincent Cohen-Addad, Chris Schwiegelshohn

    Abstract: We study the classic $k$-median and $k$-means clustering objectives in the beyond-worst-case scenario. We consider three well-studied notions of structured data that aim at characterizing real-world inputs: Distribution Stability (introduced by Awasthi, Blum, and Sheffet, FOCS 2010), Spectral Separability (introduced by Kumar and Kannan, FOCS 2010), Perturbation Resilience (introduced by Bilu and…

    Submitted 10 August, 2017; v1 submitted 29 January, 2017; originally announced January 2017.

  27. arXiv:1605.03949  [pdf, other]

    cs.DS

    Efficient Similarity Search in Dynamic Data Streams

    Authors: Marc Bury, Chris Schwiegelshohn, Mara Sorella

    Abstract: The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster approximate methods. The algorithm of choice used to quickly compute the Jaccard index $\frac{\vert A \cap B \vert}{\vert A\cup B\vert}$ of two item sets $A$ and…

    Submitted 8 March, 2021; v1 submitted 12 May, 2016; originally announced May 2016.
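
For reference, the Jaccard index $\frac{\vert A \cap B \vert}{\vert A\cup B\vert}$ named in the abstract above can be computed exactly for small item sets as sketched below. This is only the plain definition, not the streaming algorithm the paper studies. The function name and the convention of returning 1.0 for two empty sets are assumptions made for illustration.

```python
from typing import Hashable, Iterable


def jaccard_index(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two item collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical example: the sets share 2 of 4 distinct items.
similarity = jaccard_index({1, 2, 3}, {2, 3, 4})
```

The exact computation touches every item of both sets, which is the cost that approximate methods for large datasets aim to avoid.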

  28. arXiv:1505.02019  [pdf, other]

    cs.DS

    Sublinear Estimation of Weighted Matchings in Dynamic Data Streams

    Authors: Marc Bury, Chris Schwiegelshohn

    Abstract: This paper presents an algorithm for estimating the weight of a maximum weighted matching by augmenting any estimation routine for the size of an unweighted matching. The algorithm is implementable in any streaming model including dynamic graph streams. We also give the first constant estimation for the maximum matching size in a dynamic graph stream for planar graphs (or any graph with bounded ar…

    Submitted 9 July, 2015; v1 submitted 8 May, 2015; originally announced May 2015.

  29. arXiv:1504.01584   

    cs.DS

    Random Projections for k-Means: Maintaining Coresets Beyond Merge & Reduce

    Authors: Marc Bury, Chris Schwiegelshohn

    Abstract: We give a new construction for a small space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k ε^{-2}\min(d,kε^{-2}))$ which is minimal among all available constructions. Aside from two constructions with exponential dependence on the dimension, all known coresets ar…

    Submitted 18 February, 2020; v1 submitted 7 April, 2015; originally announced April 2015.

    Comments: This paper has been withdrawn due to an error in Theorem 1