
Showing 1–26 of 26 results for author: Sohler, C

  1. arXiv:2402.09707  [pdf, other]

    cs.DS

    On the adversarial robustness of Locality-Sensitive Hashing in Hamming space

    Authors: Michael Kapralov, Mikhail Makarov, Christian Sohler

    Abstract: Locality-sensitive hashing [Indyk, Motwani '98] is a classical data structure for approximate nearest neighbor search. It allows, after a close to linear time preprocessing of the input dataset, to find an approximately nearest neighbor of any fixed query in sublinear time in the dataset size. The resulting data structure is randomized and succeeds with high probability for every fixed query. In m…

    Submitted 17 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  2. arXiv:2212.14334  [pdf, ps, other]

    cs.DS cs.LG cs.SI

    Constant Approximation for Normalized Modularity and Associations Clustering

    Authors: Jakub Łącki, Vahab Mirrokni, Christian Sohler

    Abstract: We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined based on the ratio between the number of edges in the cluster, and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and norm…

    Submitted 29 December, 2022; originally announced December 2022.

    MSC Class: 68W25 ACM Class: F.2.2

  3. arXiv:2204.09951  [pdf, other]

    cs.DS

    Motif Cut Sparsifiers

    Authors: Michael Kapralov, Mikhail Makarov, Sandeep Silwal, Christian Sohler, Jakab Tardos

    Abstract: A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher order organizational structure of $G$ beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few. In these applications, the cut structure of motifs plays a crucial…

    Submitted 12 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: 48 pages, 3 figures

  4. arXiv:2101.05549  [pdf, ps, other]

    cs.DS

    Spectral Clustering Oracles in Sublinear Time

    Authors: Grzegorz Gluch, Michael Kapralov, Silvio Lattanzi, Aida Mousavifar, Christian Sohler

    Abstract: Given a graph $G$ that can be partitioned into $k$ disjoint expanders with outer conductance upper bounded by $ε\ll 1$, can we efficiently construct a small space data structure that allows quickly classifying vertices of $G$ according to the expander (cluster) they belong to? Formally, we would like an efficient local computation algorithm that misclassifies at most an $O(ε)$ fraction of vertices…

    Submitted 19 October, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

    Comments: Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, 2021

  5. arXiv:2012.11891  [pdf, ps, other]

    cs.LG cs.DS

    Fast and Accurate $k$-means++ via Rejection Sampling

    Authors: Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

    Abstract: $k$-means++ [Arthur and Vassilvitskii, 2007] is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k…

    Submitted 22 December, 2020; originally announced December 2020.

  6. arXiv:1909.10647  [pdf, other]

    cs.DS

    A characterization of graph properties testable for general planar graphs with one-sided error (It is all about forbidden subgraphs)

    Authors: Artur Czumaj, Christian Sohler

    Abstract: The problem of characterizing testable graph properties (properties that can be tested with a number of queries independent of the input size) is a fundamental problem in the area of property testing. While there has been some extensive prior research characterizing testable graph properties in the dense graphs model and we have good understanding of the bounded degree graphs model, no similar cha…

    Submitted 23 September, 2019; originally announced September 2019.

  7. arXiv:1908.02645  [pdf, other]

    cs.DS

    Fully dynamic hierarchical diameter k-clustering and k-center

    Authors: Melanie Schmidt, Christian Sohler

    Abstract: We develop dynamic data structures for maintaining a hierarchical k-center clustering when the points come from a discrete space $\{1,\ldots,Δ\}^d$. Our first data structure is for the low dimensional setting, i.e., d is a constant, and processes insertions, deletions and cluster representative queries in $\log^{O(1)} (Δn)$ time, where $n$ is the current size of the point set. For the high dimensi…

    Submitted 7 August, 2019; originally announced August 2019.

  8. arXiv:1905.01644  [pdf, other]

    cs.DS

    Testable Properties in General Graphs and Random Order Streaming

    Authors: Artur Czumaj, Hendrik Fichtenberger, Pan Peng, Christian Sohler

    Abstract: We present a novel framework closely linking the areas of property testing and data streaming algorithms in the setting of general graphs. It has been recently shown (Monemizadeh et al. 2017) that for bounded-degree graphs, any constant-query tester can be emulated in the random order streaming model by a streaming algorithm that uses only space required to store a constant number of words. Howeve…

    Submitted 5 May, 2019; originally announced May 2019.

  9. arXiv:1812.10854  [pdf, ps, other]

    cs.DS

    Fair Coresets and Streaming Algorithms for Fair k-Means Clustering

    Authors: Melanie Schmidt, Chris Schwiegelshohn, Christian Sohler

    Abstract: We study fair clustering problems as proposed by Chierichetti et al. (NIPS 2017). Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which c…

    Submitted 9 March, 2021; v1 submitted 27 December, 2018; originally announced December 2018.

  10. arXiv:1811.02937  [pdf, ps, other]

    cs.DS

    Every Testable (Infinite) Property of Bounded-Degree Graphs Contains an Infinite Hyperfinite Subproperty

    Authors: Hendrik Fichtenberger, Pan Peng, Christian Sohler

    Abstract: One of the most fundamental questions in graph property testing is to characterize the combinatorial structure of properties that are testable with a constant number of queries. We work towards an answer to this question for the bounded-degree graph model introduced in [Goldreich, Ron, 2002], where the input graphs have maximum degree bounded by a constant $d$. In this model, it is known (among ot…

    Submitted 7 November, 2018; originally announced November 2018.

  11. arXiv:1809.02961  [pdf, ps, other]

    cs.DS

    Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension

    Authors: Christian Sohler, David P. Woodruff

    Abstract: We obtain the first strong coresets for the $k$-median and subspace approximation problems with sum of distances objective function, on $n$ points in $d$ dimensions, with a number of weighted points that is independent of both $n$ and $d$; namely, our coresets have size $\text{poly}(k/ε)$. A strong coreset $(1+ε)$-approximates the cost function for all possible sets of centers simultaneously. We a…

    Submitted 14 April, 2022; v1 submitted 9 September, 2018; originally announced September 2018.

  12. arXiv:1807.04518  [pdf, other]

    cs.DS

    Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering

    Authors: Dan Feldman, Melanie Schmidt, Christian Sohler

    Abstract: We develop and analyze a method to reduce the size of a very large set of data points in a high dimensional Euclidean space $\mathbb{R}^d$ to a small set of weighted points such that the result of a predetermined data analysis task on the reduced set is approximately the same as that for the original point set. For example, computing the first k principal components of the reduced set will return approximate…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: The conference version of this work appeared at SODA 2013

  13. arXiv:1805.08571  [pdf, other]

    cs.DS cs.LG stat.ML

    On Coresets for Logistic Regression

    Authors: Alexander Munteanu, Chris Schwiegelshohn, Christian Sohler, David P. Woodruff

    Abstract: Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $μ(X)$, which quantifies…

    Submitted 8 March, 2021; v1 submitted 22 May, 2018; originally announced May 2018.

  14. arXiv:1712.01725  [pdf, other]

    cs.DS

    Approximating the Spectrum of a Graph

    Authors: David Cohen-Steiner, Weihao Kong, Christian Sohler, Gregory Valiant

    Abstract: The spectrum of a network or graph $G=(V,E)$ with adjacency matrix $A$ consists of the eigenvalues of the normalized Laplacian $L= I - D^{-1/2} A D^{-1/2}$. This set of eigenvalues encapsulates many aspects of the structure of the graph, including the extent to which the graph possesses community structures at multiple scales. We study the problem of approximating the spectrum…

    Submitted 5 December, 2017; originally announced December 2017.

  15. arXiv:1711.04881  [pdf, ps, other]

    cs.DS

    Estimating Graph Parameters from Random Order Streams

    Authors: Pan Peng, Christian Sohler

    Abstract: We develop a new algorithmic technique that allows one to transfer some constant time approximation algorithms for general graphs into random order streaming algorithms. We illustrate our technique by proving that in random order streams with probability at least $2/3$, • the number of connected components of $G$ can be approximated up to an additive error of $\varepsilon n$ using…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: SODA 2018

  16. arXiv:1707.07334  [pdf, ps, other]

    cs.DS

    Testable Bounded Degree Graph Properties Are Random Order Streamable

    Authors: Morteza Monemizadeh, S. Muthukrishnan, Pan Peng, Christian Sohler

    Abstract: We study which property testing and sublinear time algorithms can be transformed into graph streaming algorithms for random order streams. Our main result is that for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single-pass in random order streams. Our result is obtained by estimating the distribution of loca…

    Submitted 23 July, 2017; originally announced July 2017.

    Comments: A preliminary version was presented at the 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)

  17. arXiv:1706.03887  [pdf, other]

    cs.DS

    Clustering High Dimensional Dynamic Data Streams

    Authors: Vladimir Braverman, Gereon Frahling, Harry Lang, Christian Sohler, Lin F. Yang

    Abstract: We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i.e., streams allowing both insertions and deletions of points from a discrete Euclidean space $\{1, 2, \ldots, Δ\}^d$. Our algorithms use $k ε^{-2} \mathrm{poly}(d \log Δ)$ space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of $k$ c…

    Submitted 12 June, 2017; originally announced June 2017.

    Comments: 33 pages; a preliminary version of this paper was presented at ICML 2017

  18. arXiv:1602.08254  [pdf, ps, other]

    cs.DS cs.LG

    Theoretical Analysis of the $k$-Means Algorithm - A Survey

    Authors: Johannes Blömer, Christiane Lammersen, Melanie Schmidt, Christian Sohler

    Abstract: The $k$-means algorithm is one of the most widely used clustering heuristics. Despite its simplicity, analyzing its running time and quality of approximation is surprisingly difficult and can lead to deep insights that can be used to improve the algorithm. In this paper we survey the recent results in this direction as well as several extensions of the basic $k$-means method.

    Submitted 26 February, 2016; originally announced February 2016.

  19. arXiv:1512.04349  [pdf, other]

    cs.CG

    Clustering time series under the Fréchet distance

    Authors: Anne Driemel, Amer Krivošija, Christian Sohler

    Abstract: The Fréchet distance is a popular distance measure for curves. We study the problem of clustering time series under the Fréchet distance. In particular, we give $(1+\varepsilon)$-approximation algorithms for variations of the following problem with parameters $k$ and $\ell$. Given $n$ univariate time series $P$, each of complexity at most $m$, we find $k$ time series, not necessarily from $P$, whi…

    Submitted 14 December, 2015; originally announced December 2015.

  20. Random projections for Bayesian regression

    Authors: Leo N. Geppert, Katja Ickstadt, Alexander Munteanu, Jens Quedenfeld, Christian Sohler

    Abstract: This article deals with random projections applied as a data reduction technique for Bayesian regression analysis. We show sufficient conditions under which the entire $d$-dimensional distribution is approximately preserved under random projections by reducing the number of data points from $n$ to $k\in O(\operatorname{poly}(d/\varepsilon))$ in the case $n\gg d$. Under mild assumptions, we prove t…

    Submitted 30 November, 2015; v1 submitted 23 April, 2015; originally announced April 2015.

  21. arXiv:1504.03294  [pdf, ps, other]

    cs.DS

    Testing Cluster Structure of Graphs

    Authors: Artur Czumaj, Pan Peng, Christian Sohler

    Abstract: We study the problem of recognizing the cluster structure of a graph in the framework of property testing in the bounded degree model. Given a parameter $\varepsilon$, a $d$-bounded degree graph is defined to be $(k, φ)$-clusterable, if it can be partitioned into no more than $k$ parts, such that the (inner) conductance of the induced subgraph on each part is at least $φ$ and the (outer) conductan…

    Submitted 13 April, 2015; originally announced April 2015.

    Comments: Full version of STOC 2015

  22. arXiv:1408.1847  [pdf, ps, other]

    cs.DS

    Asymptotically exact streaming algorithms

    Authors: Marc Heinrich, Alexander Munteanu, Christian Sohler

    Abstract: We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the algorithm is restricted to polylog(n) size. Thus, the output of the algorithm is optimal in the limit. We show positive results in our model for a series of import…

    Submitted 8 August, 2014; originally announced August 2014.

  23. arXiv:1407.2109  [pdf, ps, other]

    cs.DS

    Planar Graphs: Random Walks and Bipartiteness Testing

    Authors: Artur Czumaj, Morteza Monemizadeh, Krzysztof Onak, Christian Sohler

    Abstract: We initiate the study of property testing in arbitrary planar graphs. We prove that bipartiteness can be tested in constant time, improving on the previous bound of $\tilde{O}(\sqrt{n})$ for graphs on $n$ vertices. The constant-time testability was only known for planar graphs with bounded degree. Our algorithm is based on random walks. Since planar graphs have good separators, i.e., bad expansi…

    Submitted 21 December, 2018; v1 submitted 8 July, 2014; originally announced July 2014.

  24. arXiv:1312.0497  [pdf, other]

    cs.DS

    Property-Testing in Sparse Directed Graphs: 3-Star-Freeness and Connectivity

    Authors: Frank Hellweg, Christian Sohler

    Abstract: We study property testing in directed graphs in the bounded degree model, where we assume that an algorithm may only query the outgoing edges of a vertex, a model proposed by Bender and Ron in 2002. As our first main result, we present a property testing algorithm for strong connectivity in this model, having a query complexity of $\mathcal{O}(n^{1-ε/(3+α)})$ for arbitrary $α>0$; it is based on…

    Submitted 2 December, 2013; originally announced December 2013.

    Comments: Results partly published at ESA 2012

  25. arXiv:1012.3697  [pdf, ps, other]

    cs.DS cs.CG cs.LG

    Analysis of Agglomerative Clustering

    Authors: Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze, Christian Sohler

    Abstract: The diameter $k$-clustering problem is the problem of partitioning a finite subset of $\mathbb{R}^d$ into $k$ subsets called clusters such that the maximum diameter of the clusters is minimized. One early clustering algorithm that computes a hierarchy of approximate solutions to this problem (for all values of $k$) is the agglomerative clustering algorithm with the complete linkage strategy. For d…

    Submitted 7 March, 2014; v1 submitted 16 December, 2010; originally announced December 2010.

    Comments: A preliminary version of this article appeared in Proceedings of the 28th International Symposium on Theoretical Aspects of Computer Science (STACS '11), March 2011, pp. 308-319. This article also appeared in Algorithmica. The final publication is available at http://link.springer.com/article/10.1007/s00453-012-9717-4

    ACM Class: F.2.2; H.3.3; I.5.3

    Journal ref: Ackermann, M. R., Blömer, J., Kuntze, D., and Sohler, C. (2014). Analysis of Agglomerative Clustering. Algorithmica, 69(1):184-215

  26. arXiv:1007.4230  [pdf, ps, other]

    cs.DS cs.DM

    Finding Cycles and Trees in Sublinear Time

    Authors: Artur Czumaj, Oded Goldreich, Dana Ron, C. Seshadhri, Asaf Shapira, Christian Sohler

    Abstract: We present sublinear-time (randomized) algorithms for finding simple cycles of length at least $k\geq 3$ and tree-minors in bounded-degree graphs. The complexity of these algorithms is related to the distance of the graph from being $C_k$-minor-free (resp., free from having the corresponding tree-minor). In particular, if the graph is far (i.e., $Ω(1)$-far) from being cycle-free, i.e. if one has…

    Submitted 3 April, 2012; v1 submitted 23 July, 2010; originally announced July 2010.

    Comments: Keywords: Sublinear-Time Algorithms, Property Testing, Bounded-Degree Graphs, One-Sided vs Two-Sided Error Probability Updated version
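Entry 1 above builds on the classical locality-sensitive hashing scheme for Hamming space. As background, the following is a minimal sketch of the standard bit-sampling family, in which each hash function reads the input on a random subset of coordinates. It is not the adversarially robust construction analyzed in that paper; the class name `BitSamplingLSH` and the parameters `k` (sampled bits per table) and `num_tables` are illustrative assumptions, and no parameter tuning from the literature is attempted.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

BitVector = Tuple[int, ...]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

class BitSamplingLSH:
    """Hypothetical sketch of bit-sampling LSH: each of the tables keys a point
    by its values on k coordinates sampled uniformly at random."""

    def __init__(self, dim: int, k: int, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # One random coordinate tuple per table (sampled with replacement).
        self.projections = [tuple(rng.randrange(dim) for _ in range(k))
                            for _ in range(num_tables)]
        self.tables: List[Dict[BitVector, List[BitVector]]] = [
            {} for _ in range(num_tables)]

    def _key(self, t: int, p: Sequence[int]) -> BitVector:
        # Hash of p in table t: p restricted to the sampled coordinates.
        return tuple(p[i] for i in self.projections[t])

    def insert(self, p: Sequence[int]) -> None:
        for t, table in enumerate(self.tables):
            table.setdefault(self._key(t, p), []).append(tuple(p))

    def query(self, q: Sequence[int]) -> Optional[BitVector]:
        """Return the closest point among all bucket collisions, or None."""
        best, best_dist = None, None
        for t, table in enumerate(self.tables):
            for p in table.get(self._key(t, q), []):
                d = hamming(p, q)
                if best_dist is None or d < best_dist:
                    best, best_dist = p, d
        return best
```

A query only inspects points that collide with it in at least one table, so the success guarantee is probabilistic and stated per fixed query, which is the assumption the abstract of entry 1 singles out.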
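Entry 5 above accelerates $k$-means++ seeding. For reference, here is a minimal sketch of the standard $D^2$ seeding rule it speeds up: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. This is the direct $O(nkd)$-time version, not the rejection-sampling algorithm of the paper; `kmeans_pp_seeding` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeding(points: List[Point], k: int, seed: int = 0) -> List[Point]:
    """Standard k-means++ (D^2) seeding; returns k initial centers."""
    if not 0 < k <= len(points):
        raise ValueError("need 1 <= k <= len(points)")
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i]: squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            r = rng.random() * total
            acc, idx = 0.0, None
            for i, w in enumerate(d2):
                acc += w
                if acc > r:  # point i is chosen with probability w / total
                    idx = i
                    break
            if idx is None:  # guard against floating-point round-off
                idx = max(i for i, w in enumerate(d2) if w > 0.0)
            centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers
```

Points that already coincide with a chosen center have weight zero and are never picked again, which is why the seeding tends to spread centers across well-separated groups.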
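Entry 14 above defines the spectrum of a graph through the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$. As a small worked illustration of that definition (not the sublinear-time approximation studied in the paper), the sketch below builds $L$ from an adjacency matrix and computes its exact eigenvalues with the classical cyclic Jacobi eigenvalue method. Pure Python is used only to stay dependency-free; the function names are hypothetical, and isolated vertices are rejected because $D^{-1/2}$ is undefined for them.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def normalized_laplacian(adj: Sequence[Sequence[float]]) -> Matrix:
    """Build L = I - D^{-1/2} A D^{-1/2} from a symmetric adjacency matrix A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    if any(d == 0 for d in deg):
        # D^{-1/2} is undefined for isolated vertices; this sketch rejects them.
        raise ValueError("graph has an isolated vertex")
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

def symmetric_eigenvalues(m: Matrix, tol: float = 1e-18,
                          max_sweeps: int = 50) -> List[float]:
    """Sorted eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # matrix is numerically diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a * P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # a <- P^T * a (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

For a graph without isolated vertices all eigenvalues of $L$ lie in $[0, 2]$, and the multiplicity of the eigenvalue $0$ equals the number of connected components; for example, the triangle $K_3$ has spectrum $\{0, 1.5, 1.5\}$.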
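Entry 19 above clusters univariate time series under the Fréchet distance. A common, easily computed stand-in is the discrete Fréchet distance of Eiter and Mannila, which couples only the vertices of the two sequences; the dynamic program below computes it in time proportional to the product of the two sequence lengths. Using the discrete variant here is an assumption for illustration: the paper works with the (continuous) Fréchet distance, and `discrete_frechet` is a hypothetical helper name.

```python
from typing import List, Sequence

def discrete_frechet(p: Sequence[float], q: Sequence[float]) -> float:
    """Discrete Fréchet distance between two univariate sequences (Eiter-Mannila DP)."""
    if not p or not q:
        raise ValueError("sequences must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j]: discrete Fréchet distance between prefixes p[:i+1] and q[:j+1]
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])  # ground distance between univariate values
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best coupling extends one of the three predecessor cells.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

The discrete distance is always at least the continuous Fréchet distance of the corresponding polygonal curves, so it serves as an upper bound when the continuous value is what matters.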
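Entry 25 above analyzes agglomerative clustering with the complete linkage strategy as a heuristic for diameter $k$-clustering. The sketch below is a naive version that assumes the common textbook rule of merging the two clusters whose farthest cross pair of points is closest; the paper's formal merge rule may differ in detail. Starting from singletons, it records the clustering at every level, so a single run yields a solution for every value of $k$. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]

def complete_link(a: Cluster, b: Cluster) -> float:
    """Largest distance between a point of a and a point of b."""
    return max(math.dist(p, q) for p in a for q in b)

def diameter(c: Cluster) -> float:
    """Largest pairwise distance inside one cluster (0 for a singleton)."""
    return max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)

def complete_linkage_levels(points: Sequence[Point]) -> List[List[Cluster]]:
    """levels[i] is the clustering with len(points) - i clusters."""
    clusters: List[Cluster] = [[p] for p in points]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest complete-link distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: complete_link(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels
```

The largest `diameter` among the clusters at the level with $k$ clusters is the diameter $k$-clustering objective from the abstract. This brute-force version recomputes all pairwise distances at every merge and is only suitable for small inputs.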