Showing 1–23 of 23 results for author: Wirth, A

Searching in archive cs.
  1. arXiv:2404.12701  [pdf, other]

    cs.DS

    Exploiting New Properties of String Net Frequency for Efficient Computation

    Authors: Peaker Guo, Patrick Eades, Anthony Wirth, Justin Zobel

    Abstract: Knowing which strings in a massive text are significant -- that is, which strings are common and distinct from other strings -- is valuable for several applications, including text compression and tokenization. Frequency in itself is not helpful for significance, because the commonest strings are the shortest strings. A compelling alternative is net frequency, which has the property that strings w…

    Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Full version of a paper to be published at the 35th Annual Symposium on Combinatorial Pattern Matching (CPM 2024)

  2. arXiv:2403.14087  [pdf, ps, other]

    cs.DS

    Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams

    Authors: Amit Chakrabarti, Andrew McGregor, Anthony Wirth

    Abstract: The maximum coverage problem is to select $k$ sets from a collection of sets such that the cardinality of the union of the selected sets is maximized. We consider $(1-1/e-ε)$-approximation algorithms for this NP-hard problem in three standard data stream models. 1. {\em Dynamic Model.} The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses…

    Submitted 20 March, 2024; originally announced March 2024.

    ACM Class: F.2.2

  3. Fast Parallel Algorithms for Submodular $p$-Superseparable Maximization

    Authors: Philip Cervenjak, Junhao Gan, Anthony Wirth

    Abstract: Maximizing a non-negative, monotone, submodular function $f$ over $n$ elements under a cardinality constraint $k$ (SMCC) is a well-studied NP-hard problem. It has important applications in, e.g., machine learning and influence maximization. Though the theoretical problem admits polynomial-time approximation algorithms, solving it in practice often involves frequently querying submodular functions…

    Submitted 2 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 36 pages. To be published in Approximation and Online Algorithms (Proceedings of the 21st International Workshop, WAOA 2023)

    Journal ref: Approximation and Online Algorithms, vol. 14297, p. 219. Springer Nature, 2023

  4. arXiv:2305.16815  [pdf, ps, other]

    cs.DS cs.DB

    Sublinear-Space Streaming Algorithms for Estimating Graph Parameters on Sparse Graphs

    Authors: Xiuge Chen, Rajesh Chitnis, Patrick Eades, Anthony Wirth

    Abstract: In this paper, we design sub-linear space streaming algorithms for estimating three fundamental parameters -- maximum independent set, minimum dominating set and maximum matching -- on sparse graph classes, i.e., graphs which satisfy $m=O(n)$, where $m$ and $n$ are the numbers of edges and vertices, respectively. Each of the three graph parameters we consider can have size $Ω(n)$ even on sparse graph classes,…

    Submitted 26 May, 2023; originally announced May 2023.

  5. Maximum Coverage in Sublinear Space, Faster

    Authors: Stephen Jaud, Anthony Wirth, Farhana Choudhury

    Abstract: Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial time algorithm up to a factor $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions…

    Submitted 12 December, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 12 pages, 7 figures

    ACM Class: F.2.3

  6. arXiv:2301.13347  [pdf, other]

    cs.CR cs.DB

    Tight Data Access Bounds for Private Top-$k$ Selection

    Authors: Hao Wu, Olga Ohrimenko, Anthony Wirth

    Abstract: We study the top-$k$ selection problem under the differential privacy model: $m$ items are rated according to votes of a set of clients. We consider a setting in which algorithms can retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected accesses: to our…

    Submitted 30 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  7. arXiv:2208.07489  [pdf, other]

    cs.CR

    Single Round-trip Hierarchical ORAM via Succinct Indices

    Authors: William Holland, Olga Ohrimenko, Anthony Wirth

    Abstract: Access patterns to data stored remotely create a side channel that is known to leak information even if the content of the data is encrypted. To protect against access pattern leakage, Oblivious RAM is a cryptographic primitive that obscures the (actual) access trace at the expense of additional access and periodic shuffling of the server's contents. A class of ORAM solutions, known as Hierarchica…

    Submitted 12 June, 2024; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: 22 pages, 3 Figures, 5 Tables

  8. arXiv:2206.09519  [pdf, other]

    cs.CR

    Walking to Hide: Privacy Amplification via Random Message Exchanges in Network

    Authors: Hao Wu, Olga Ohrimenko, Anthony Wirth

    Abstract: The *shuffle model* is a powerful tool to amplify the privacy guarantees of the *local model* of differential privacy. In contrast to the fully decentralized manner of guaranteeing privacy in the local model, the shuffle model requires a central, trusted shuffler. To avoid this central shuffler, recent work of Liew et al. (2022) proposes shuffling locally randomized data in a decentralized manner,…

    Submitted 19 June, 2022; originally announced June 2022.

  9. arXiv:2112.12279  [pdf, other]

    cs.CR

    Randomize the Future: Asymptotically Optimal Locally Private Frequency Estimation Protocol for Longitudinal Data

    Authors: Olga Ohrimenko, Anthony Wirth, Hao Wu

    Abstract: Longitudinal data tracking under Local Differential Privacy (LDP) is a challenging task. Baseline solutions that repeatedly invoke a protocol designed for one-time computation lead to linear decay in the privacy or utility guarantee with respect to the number of computations. To avoid this, the recent approach of Erlingsson et al. (2020) exploits the potential sparsity of user data that changes on…

    Submitted 11 April, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  10. arXiv:2108.11549  [pdf, other]

    cs.DS

    Dynamic Structural Clustering on Graphs

    Authors: Boyu Ruan, Junhao Gan, Hao Wu, Anthony Wirth

    Abstract: Structural Clustering ($StrClu$) is one of the most popular graph clustering paradigms. In this paper, we consider $StrClu$ under two commonly adopted similarities, namely Jaccard similarity and cosine similarity on a dynamic graph, $G = \langle V, E\rangle$, subject to edge insertions and deletions (updates). The goal is to maintain certain information under updates, so that the $StrClu$ clusteri…

    Submitted 25 August, 2021; originally announced August 2021.

  11. arXiv:2106.07815  [pdf, ps, other]

    cs.DS cs.CR

    Asymptotically Optimal Locally Private Heavy Hitters via Parameterized Sketches

    Authors: Hao Wu, Anthony Wirth

    Abstract: We present two new local differentially private algorithms for frequency estimation. One solves the fundamental frequency oracle problem; the other solves the well-known heavy hitters identification problem. Consistent with prior art, these are randomized algorithms. As a function of failure probability $β$, the former achieves optimal worst-case estimation error for every $β$, while the latter is…

    Submitted 16 February, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

  12. arXiv:2002.09460  [pdf, other]

    cs.DS cs.DM cs.LG cs.SI

    Parameterized Correlation Clustering in Hypergraphs and Bipartite Graphs

    Authors: Nate Veldt, Anthony Wirth, David F. Gleich

    Abstract: Motivated by applications in community detection and dense subgraph discovery, we consider new clustering objectives in hypergraphs and bipartite graphs. These objectives are parameterized by one or more resolution parameters in order to enable diverse knowledge discovery in complex data. For both hypergraph and bipartite objectives, we identify parameter regimes that are equivalent to existing…

    Submitted 19 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  13. arXiv:1910.06435  [pdf, other]

    cs.CC cs.DM

    Graph Clustering in All Parameter Regimes

    Authors: Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, Xin Zhang

    Abstract: Resolution parameters in graph clustering represent a size and quality trade-off. We address the task of efficiently solving a parameterized graph clustering objective for all values of a resolution parameter. Specifically, we consider an objective we call LambdaPrime, involving a parameter $λ\in (0,1)$. This objective is related to other parameterized clustering problems, such as parametric gener…

    Submitted 14 October, 2019; originally announced October 2019.

  14. arXiv:1903.05246  [pdf, other]

    cs.SI cs.LG

    Learning Resolution Parameters for Graph Clustering

    Authors: Nate Veldt, David F. Gleich, Anthony Wirth

    Abstract: Finding clusters of well-connected nodes in a graph is an extensively studied problem in graph-based data analysis. Because of its many applications, a large number of distinct graph clustering objective functions and algorithms have already been proposed and analyzed. To aid practitioners in determining the best clustering approach to use in different applications, we present new techniques for a…

    Submitted 12 March, 2019; originally announced March 2019.

  15. arXiv:1812.02023  [pdf, ps, other]

    cs.DS

    Correlation Clustering in Data Streams

    Authors: Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth

    Abstract: Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms need to be both time and space efficient. In this paper, we address the problem of correlation clustering in the dynamic data stream model. The stream consis…

    Submitted 5 December, 2018; originally announced December 2018.

  16. arXiv:1809.09493  [pdf, ps, other]

    cs.CC cs.DS

    Correlation Clustering Generalized

    Authors: David F. Gleich, Nate Veldt, Anthony Wirth

    Abstract: We present new results for LambdaCC and MotifCC, two recently introduced variants of the well-studied correlation clustering problem. Both variants are motivated by applications to network analysis and community detection, and have non-trivial approximation algorithms. We first show that the standard linear programming relaxation of LambdaCC has a $Θ(\log n)$ integrality gap for a certain choice o…

    Submitted 25 September, 2018; originally announced September 2018.

  17. arXiv:1806.01678  [pdf, ps, other]

    math.NA cs.LG stat.ML

    A Projection Method for Metric-Constrained Optimization

    Authors: Nate Veldt, David Gleich, Anthony Wirth, James Saunderson

    Abstract: We outline a new approach for solving optimization problems which enforce triangle inequalities on output variables. We refer to this as metric-constrained optimization, and give several examples where problems of this form arise in machine learning applications and theoretical approximation algorithms for graph clustering. Although these problems are interesting from a theoretical perspective, the…

    Submitted 5 June, 2018; originally announced June 2018.

  18. Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Authors: Nate Veldt, David Gleich, Anthony Wirth

    Abstract: Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $λ$, which implicitly controls the size and structur…

    Submitted 13 July, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

  19. arXiv:1611.07305  [pdf, other]

    cs.LG cs.DS math.NA

    Correlation Clustering with Low-Rank Matrices

    Authors: Nate Veldt, Anthony Wirth, David F. Gleich

    Abstract: Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled 'similar' or 'dissimilar.' Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore how to solve the correlation clustering objective exactly when the data to be clustered can b…

    Submitted 17 March, 2017; v1 submitted 21 November, 2016; originally announced November 2016.

  20. arXiv:1604.03228  [pdf, other]

    cs.DC cs.DS

    Efficient Parallel Algorithms for k-Center Clustering

    Authors: Jessica McClintock, Anthony Wirth

    Abstract: The k-center problem is one of several classic NP-hard clustering questions. For contemporary massive data sets, RAM-based algorithms become impractical. And although there exist good sequential algorithms for k-center, they are not easily parallelizable. In this paper, we design and implement parallel approximation algorithms for this problem. We observe that Gonzalez's greedy algorithm can be…

    Submitted 11 April, 2016; originally announced April 2016.

  21. Access Time Tradeoffs in Archive Compression

    Authors: Matthias Petri, Alistair Moffat, P. C. Nagesh, Anthony Wirth

    Abstract: Web archives, query and proxy logs, and so on, can all be very large and highly repetitive; and are accessed only sporadically and partially, rather than continually and holistically. This type of data is ideal for compression-based archiving, provided that random-access to small fragments of the original data can be achieved without needing to decompress everything. The recent RLZ (relative Lempe…

    Submitted 29 February, 2016; originally announced February 2016.

    Comments: Note that the final published version of this paper prepared by Springer/LNCS introduced errors in the publication process in Figures 1, 2, and 3 that are not present in this preprint. In all other regards the preprint and the published version are identical in their content

    Journal ref: Asia Information Retrieval Societies Conference (AIRS), LNCS vol. 9460, pages 15-28, 2015

  22. arXiv:1507.04645  [pdf, other]

    cs.CC cs.DS

    Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover

    Authors: Amit Chakrabarti, Anthony Wirth

    Abstract: Set cover, over a universe of size $n$, may be modelled as a data-streaming problem, where the $m$ sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only $O(n\, \mathrm{poly}\{\log n, \log m\})$ space to process this stream. For each $p \ge 1$, we give a very simple deterministic algorithm that makes $p$ passes over the input stream and returns an app…

    Submitted 16 July, 2015; originally announced July 2015.

    Comments: 20 pages

    MSC Class: 68Q17; 05B25; 51E30; 68W25

    ACM Class: F.2.2; F.2.3; G.2.1

  23. arXiv:1303.6481  [pdf, other]

    cs.DS

    Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays

    Authors: Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth

    Abstract: The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix pointers. In this paper we describe a new two-level suffix array-based index structure that requires significantly less disk space than previous approaches. Key to…

    Submitted 26 March, 2013; originally announced March 2013.

    ACM Class: H.3.1