
Showing 1–8 of 8 results for author: Cenzato, D

Searching in archive cs.
  1. arXiv:2404.14235

    cs.DS

    Computing the LCP Array of a Labeled Graph

    Authors: Jarno Alanko, Davide Cenzato, Nicola Cotumaccio, Sung-Hwan Kim, Giovanni Manzini, Nicola Prezza

    Abstract: The LCP array is an important tool in stringology, allowing to speed up pattern matching algorithms and enabling compact representations of the suffix tree. Recently, Conte et al. [DCC 2023] and Cotumaccio et al. [SPIRE 2023] extended the definition of this array to Wheeler DFAs and, ultimately, to arbitrary labeled graphs, proving that it can be used to efficiently solve matching statistics queri…

    Submitted 22 April, 2024; originally announced April 2024.

  2. arXiv:2310.17980

    cs.DS

    Sketching and Streaming for Dictionary Compression

    Authors: Ruben Becker, Matteo Canton, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Nicola Prezza

    Abstract: We initiate the study of sub-linear sketching and streaming techniques for estimating the output size of common dictionary compressors such as Lempel-Ziv '77, the run-length Burrows-Wheeler transform, and grammar compression. To this end, we focus on a measure that has recently gained much attention in the information-theoretic community and which approximates up to a polylogarithmic multiplicativ…

    Submitted 9 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  3. arXiv:2307.07267

    cs.DS

    Random Wheeler Automata

    Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Riccardo Maso, Nicola Prezza

    Abstract: Wheeler automata were introduced in 2017 as a tool to generalize existing indexing and compression techniques based on the Burrows-Wheeler transform. Intuitively, an automaton is said to be Wheeler if there exists a total order on its states reflecting the co-lexicographic order of the strings labeling the automaton's paths; this property makes it possible to represent the automaton's topology in…

    Submitted 7 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: 17 pages, 3 figures

  4. arXiv:2306.04737

    cs.FL

    Optimal Wheeler Language Recognition

    Authors: Ruben Becker, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Alberto Policriti, Nicola Prezza

    Abstract: A Wheeler automaton is a finite state automaton whose states admit a total Wheeler order, reflecting the co-lexicographic order of the strings labeling source-to-node paths. A Wheeler language is a regular language admitting an accepting Wheeler automaton. Wheeler languages admit efficient and elegant solutions to hard problems such as automata compression and regular expression matching, therefor…

    Submitted 18 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  5. arXiv:2305.05129

    cs.DS

    Sorting Finite Automata via Partition Refinement

    Authors: Ruben Becker, Manuel Cáceres, Davide Cenzato, Sung-Hwan Kim, Bojana Kodric, Francisco Olivares, Nicola Prezza

    Abstract: Wheeler nondeterministic finite automata (WNFAs) were introduced as a generalization of prefix sorting from strings to labeled graphs. WNFAs admit optimal solutions to classic hard problems on labeled graphs and languages. The problem of deciding whether a given NFA is Wheeler is known to be NP-complete. Recently, however, Alanko et al. showed how to side-step this complexity by switching to preor…

    Submitted 18 December, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  6. arXiv:2212.01156

    cs.DS

    Computing the optimal BWT of very large string collections

    Authors: Davide Cenzato, Veronica Guerrini, Zsuzsanna Lipták, Giovanna Rosone

    Abstract: It is known that the exact form of the Burrows-Wheeler-Transform (BWT) of a string collection depends, in most implementations, on the input order of the strings in the collection. Reordering strings of an input collection affects the number of equal-letter runs $r$, arguably the most important parameter of BWT-based data structures, such as the FM-index or the $r$-index. Bentley, Gibney, and Than…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 11 pages, 2 figures, 4 tables
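
The dependence of $r$ on the input order, as stated in the abstract above, can be illustrated with a small sketch. This is not the paper's algorithm: it is a naive suffix-sorting construction under an assumed convention in which string $i$ ends with its own end marker, markers are ordered by input position and sort before every letter, and all markers are printed as `$`. The names `collection_bwt` and `count_runs` are hypothetical.

```python
def collection_bwt(strings):
    """Naive BWT of a string collection (illustrative assumption:
    string i ends with its own marker $_i, with $_0 < $_1 < ... < letters)."""
    suffixes = []
    for i, s in enumerate(strings):
        # Letters become (1, ch) and the end marker (0, i), so markers sort first
        # and ties between markers are broken by input position.
        encoded = [(1, ch) for ch in s] + [(0, i)]
        for start in range(len(encoded)):
            # BWT symbol: the character cyclically preceding this suffix.
            prev = "$" if start == 0 else s[start - 1]
            suffixes.append((encoded[start:], prev))
    suffixes.sort(key=lambda pair: pair[0])
    return "".join(prev for _, prev in suffixes)


def count_runs(text):
    """Number of maximal equal-letter runs (the parameter r)."""
    return sum(1 for k, ch in enumerate(text) if k == 0 or ch != text[k - 1])


if __name__ == "__main__":
    for order in (["ab", "cb", "ab"], ["ab", "ab", "cb"]):
        bwt = collection_bwt(order)
        print(order, bwt, count_runs(bwt))
```

Under this convention the same three strings give 6 runs in the first order and 5 in the second, because placing the two copies of `ab` next to each other makes their end markers adjacent in the sorted order.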

  7. arXiv:2202.13235

    cs.DS

    A survey of BWT variants for string collections

    Authors: Davide Cenzato, Zsuzsanna Lipták

    Abstract: In recent years, the focus of bioinformatics research has moved from individual sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform (BWT) in string processing, a number of dedicated tools have been developed for computing the BWT of string collections. While the focus has been on improving efficiency, both in space and time, the exact definition of th…

    Submitted 16 November, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: 34 pages, 4 figures

  8. arXiv:2106.11191

    cs.DS

    Computing the original eBWT faster, simpler, and with less memory

    Authors: Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, Marinella Sciortino

    Abstract: Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings, however, since this introduction, it has been used more generally to describe any BWT of a collection of strings and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algo…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 20 pages, 5 figures, 1 table