Showing 1–23 of 23 results for author: Lipták, Z

Searching in archive cs.
  1. arXiv:2404.10426  [pdf, ps, other]

    cs.DS cs.DM

    Bit catastrophes for the Burrows-Wheeler Transform

    Authors: Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

    Abstract: A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-lette…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: This work is an extended version of our conference article with the same title, published in the proceedings of DLT 2023

  2. arXiv:2403.13162  [pdf, other]

    cs.DS

    A Textbook Solution for Dynamic Strings

    Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

    Abstract: We consider the problem of maintaining a collection of strings while efficiently supporting splits and concatenations on them, as well as comparing two substrings, and computing the longest common prefix between two suffixes. This problem can be solved in optimal time $\mathcal{O}(\log N)$ whp for the updates and $\mathcal{O}(1)$ worst-case time for the queries, where $N$ is the total collection s…

    Submitted 6 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted at ESA 2024 - Track S

  3. arXiv:2403.09893  [pdf, other]

    cs.DS

    BAT-LZ Out of Hell

    Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

    Abstract: Despite consistently yielding the best compression on repetitive text collections, the Lempel-Ziv parsing has resisted all attempts at offering relevant guarantees on the cost to access an arbitrary symbol. This makes it less attractive for use on compressed self-indexes and other compressed data structures. In this paper we introduce a variant we call BAT-LZ (for Bounded Access Time Lempel-Ziv) w…

    Submitted 23 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted at CPM2024

  4. arXiv:2306.04470  [pdf, other]

    cs.DS

    Maintaining the cycle structure of dynamic permutations

    Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

    Abstract: We present a new data structure for maintaining dynamic permutations, which we call a $\textit{forest of splay trees (FST)}$. The FST allows one to efficiently maintain the cycle structure of a permutation $π$ when the allowed updates are transpositions. The structure stores one conceptual splay tree for each cycle of $π$, using the position within the cycle as the key. Updating $π$ to $τ\cdotπ$,…

    Submitted 7 June, 2023; originally announced June 2023.

  5. arXiv:2212.01156  [pdf, other]

    cs.DS

    Computing the optimal BWT of very large string collections

    Authors: Davide Cenzato, Veronica Guerrini, Zsuzsanna Lipták, Giovanna Rosone

    Abstract: It is known that the exact form of the Burrows-Wheeler-Transform (BWT) of a string collection depends, in most implementations, on the input order of the strings in the collection. Reordering strings of an input collection affects the number of equal-letter runs $r$, arguably the most important parameter of BWT-based data structures, such as the FM-index or the $r$-index. Bentley, Gibney, and Than…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 11 pages, 2 figures, 4 tables
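
To make the parameter $r$ from the abstract above concrete, here is a minimal sketch that computes the BWT of a single string and counts its equal-letter runs. It is not the method of the listed paper: it uses the textbook sorted-rotations construction on one string, assumes a `$` sentinel that is smaller than every other character, and the function names `bwt` and `count_runs` are chosen here purely for illustration.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """BWT of text + sentinel via sorted rotations (quadratic; illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number r of maximal equal-letter runs in s."""
    return sum(1 for _ in groupby(s))


transformed = bwt("banana")
print(transformed, count_runs(transformed))  # annb$aa 5
```

For string collections the result depends on how the strings are ordered and terminated, which is exactly the issue the abstract raises; this sketch covers only the single-string case.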

  6. Suffix sorting via matching statistics

    Authors: Zsuzsanna Lipták, Francesco Masillo, Simon J. Puglisi

    Abstract: We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the ge…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 16 pages, 4 figures; accepted at WABI 2022 (Workshop on Algorithms in Bioinformatics, Sept. 5-9, 2022, Potsdam, Germany)

    Journal ref: Lipták, Zs., Masillo, F. & Puglisi, S.J. Suffix sorting via matching statistics. Algorithms Mol Biol 19, 11 (2024)

  7. arXiv:2202.13235  [pdf, other]

    cs.DS

    A survey of BWT variants for string collections

    Authors: Davide Cenzato, Zsuzsanna Lipták

    Abstract: In recent years, the focus of bioinformatics research has moved from individual sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform (BWT) in string processing, a number of dedicated tools have been developed for computing the BWT of string collections. While the focus has been on improving efficiency, both in space and time, the exact definition of th…

    Submitted 16 November, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: 34 pages, 4 figures

  8. arXiv:2106.11191  [pdf, other]

    cs.DS

    Computing the original eBWT faster, simpler, and with less memory

    Authors: Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, Marinella Sciortino

    Abstract: Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings, however, since this introduction, it has been used more generally to describe any BWT of a collection of strings and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algo…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 20 pages, 5 figures, 1 table

  9. Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

    Authors: Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Nicola Prezza, Marinella Sciortino, Anna Toffanello

    Abstract: The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its nu…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: 14 pages, 2 figures

    Report number: 47th Int. Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2021), LNCS 12607: 249--262 (2021)

  10. Pattern Discovery in Colored Strings

    Authors: Zsuzsanna Lipták, Simon J. Puglisi, Massimiliano Rossi

    Abstract: In this paper, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, ass…

    Submitted 28 May, 2021; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: 22 pages, 5 figures, 2 tables, published in ACM Journal of Experimental Algorithmics. This is the journal version of the paper with the same title at SEA 2020 (18th Symposium on Experimental Algorithms, Catania, Italy, June 16-18, 2020)

    Journal ref: Zs. Lipták, Simon J. Puglisi, Massimiliano Rossi: Pattern Discovery in Colored Strings. ACM Journal of Experimental Algorithmics, Vol. 26, 1.1:1-1.1:26 (2021)

  11. Generating a Gray code for prefix normal words in amortized polylogarithmic time per word

    Authors: Péter Burcsi, Gabriele Fici, Zsuzsanna Lipták, Rajeev Raman, Joe Sawada

    Abstract: A prefix normal word is a binary word with the property that no substring has more $1$s than the prefix of the same length. By proving that the set of prefix normal words is a bubble language, we can exhaustively list all prefix normal words of length $n$ as a combinatorial Gray code, where successive strings differ by at most two swaps or bit flips. This Gray code can be generated in…

    Submitted 7 August, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: To appear in Theoretical Computer Science. arXiv admin note: text overlap with arXiv:1401.6346

    Journal ref: P. Burcsi, G. Fici, Zs. Lipták, R. Raman, J. Sawada: Generating a Gray code for prefix normal words in amortized polylogarithmic time per word. Theor. Comput. Sci. 842: 86-99 (2020)

  12. When a Dollar Makes a BWT

    Authors: Sara Giuliani, Zsuzsanna Lipták, Francesco Masillo, Romeo Rizzi

    Abstract: The Burrows-Wheeler-Transform (BWT) is a reversible string transformation which plays a central role in text compression and is fundamental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general better compressible and allows to answer several different query types more efficiently than the original string. It is easy to see that not every str…

    Submitted 12 March, 2021; v1 submitted 24 August, 2019; originally announced August 2019.

    Comments: This is the journal version of the paper at ICTCS 2019 (20th Italian Conference on Theoretical Computer Science, 9-11 Sept. 2019, Como, Italy). Journal version appeared in TCS 2021

    Journal ref: Theoretical Computer Science 857: 123-146 (2021)

  13. On Infinite Prefix Normal Words

    Authors: Ferdinando Cicalese, Zsuzsanna Lipták, Massimiliano Rossi

    Abstract: Prefix normal words are binary words that have no factor with more $1$s than the prefix of the same length. Finite prefix normal words were introduced in [Fici and Lipták, DLT 2011]. In this paper, we study infinite prefix normal words and explore their relationship to some known classes of infinite binary words. In particular, we establish a connection between prefix normal words and Sturmian wor…

    Submitted 28 May, 2021; v1 submitted 15 November, 2018; originally announced November 2018.

    Comments: 22 pages, 5 figures, 1 table, accepted in Theoret. Comp. Sc. This is the journal version of the paper with the same title accepted at SOFSEM 2019 (45th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 27-30, 2019)

  14. On Prefix Normal Words

    Authors: Gabriele Fici, Zsuzsanna Lipták

    Abstract: We present a new class of binary words: the prefix normal words. They are defined by the property that for any given length $k$, no factor of length $k$ has more $a$'s than the prefix of the same length. These words arise in the context of indexing for jumbled pattern matching (a.k.a. permutation matching or Parikh vector matching), where the aim is to decide whether a string has a factor with a g…

    Submitted 31 May, 2018; originally announced May 2018.

    Comments: Published in the Proceedings of DLT 2011

    Journal ref: G. Mauri and A. Leporati (Eds.): DLT 2011, LNCS 6795, pp. 228--238, 2011
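
The definition in the abstract above can be checked directly. The following is a minimal sketch, assuming words over the alphabet {a, b} as in that abstract; it is a brute-force test written for illustration, not an algorithm from the listed paper, and the name `is_prefix_normal` is hypothetical.

```python
from itertools import product


def is_prefix_normal(word: str, letter: str = "a") -> bool:
    """True if no factor of length k has more `letter`s than the prefix of length k."""
    n = len(word)
    # counts[i] = number of occurrences of `letter` in word[:i]
    counts = [0]
    for ch in word:
        counts.append(counts[-1] + (ch == letter))
    for k in range(1, n + 1):
        # Maximum number of `letter`s over all factors of length k.
        best = max(counts[i + k] - counts[i] for i in range(n - k + 1))
        if best > counts[k]:
            return False
    return True


print(is_prefix_normal("aabab"))  # True
print(is_prefix_normal("abaa"))   # False: the factor "aa" beats the prefix "ab"

# Brute-force count of prefix normal words of length 4.
print(sum(is_prefix_normal("".join(w)) for w in product("ab", repeat=4)))  # 8
```

The check takes quadratic time in the word length, which is fine for small experiments.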

  15. Bubble-Flip -- A New Generation Algorithm for Prefix Normal Words

    Authors: Ferdinando Cicalese, Zsuzsanna Lipták, Massimiliano Rossi

    Abstract: We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings, which exploit certain properties of prefix normal words in a smart way. We introduce infinite prefix normal words and show that one of the operations used by the…

    Submitted 26 July, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

    Comments: 30 pages, 3 figures, accepted in Theoret. Comp. Sc. This is the journal version of the paper with the same title at LATA 2018 (12th International Conference on Language and Automata Theory and Applications, Tel Aviv, April 9-11, 2018)

    Journal ref: Theor. Comput. Sci. 743: 38-52 (2018)

  16. arXiv:1711.06264  [pdf, other]

    cs.DM

    On the Parikh-de-Bruijn grid

    Authors: Péter Burcsi, Zsuzsanna Lipták, W. F. Smyth

    Abstract: We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors, and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors as well as that of the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de-Bruijn sequences.

    Submitted 16 November, 2017; originally announced November 2017.

    Comments: 18 pages, 3 figures, 1 table

    MSC Class: 68R15 (Primary); 68W32 (Secondary)

  17. arXiv:1611.09017  [pdf, other]

    cs.DM cs.FL math.CO

    On Prefix Normal Words and Prefix Normal Forms

    Authors: Péter Burcsi, Gabriele Fici, Zsuzsanna Lipták, Frank Ruskey, Joe Sawada

    Abstract: A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associat…

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: To appear in Theoretical Computer Science

    Journal ref: Theoretical Computer Science, 659: 1-13, 2017

  18. arXiv:1404.2824  [pdf, other]

    cs.FL cs.DM cs.DS math.CO

    Normal, Abby Normal, Prefix Normal

    Authors: Péter Burcsi, Gabriele Fici, Zsuzsanna Lipták, Frank Ruskey, Joe Sawada

    Abstract: A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number $pnw(n)$ of prefix normal words of length $n$, showing that $pnw(n) =Ω\left(2^{n - c\sqrt{n\ln n}}\right)$ for some $c$ and…

    Submitted 1 April, 2014; originally announced April 2014.

    Comments: Accepted at FUN '14

    Journal ref: LNCS 8496, pages 74-88 (2014)

  19. On Combinatorial Generation of Prefix Normal Words

    Authors: Péter Burcsi, Gabriele Fici, Zsuzsanna Lipták, Frank Ruskey, Joe Sawada

    Abstract: A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present an efficient algorithm for exhaustively listing the prefix normal words with a fixed length. The algorithm is based on the fact that the language of prefix normal words…

    Submitted 24 January, 2014; originally announced January 2014.

    Comments: 12 pages, 5 figures

    Journal ref: Combinatorial Pattern Matching 2014, LNCS 8464, 60-69

  20. arXiv:1305.6395  [pdf, ps, other]

    cs.FL math.CO

    On the Number of Closed Factors in a Word

    Authors: Golnaz Badkobeh, Gabriele Fici, Zsuzsanna Lipták

    Abstract: A closed word (a.k.a. periodic-like word or complete first return) is a word whose longest border does not have internal occurrences, or, equivalently, whose longest repeated prefix is not right special. We investigate the structure of closed factors of words. We show that a word of length $n$ contains at least $n+1$ distinct closed factors, and characterize those words having exactly $n+1$ closed…

    Submitted 1 December, 2014; v1 submitted 28 May, 2013; originally announced May 2013.

    Comments: Accepted to LATA 2015

    MSC Class: 68R15

  21. arXiv:1304.5560  [pdf, ps, other]

    cs.DS

    Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

    Authors: Ferdinando Cicalese, Travis Gagie, Emanuele Giaquinta, Eduardo Sany Laber, Zsuzsanna Lipták, Romeo Rizzi, Alexandru I. Tomescu

    Abstract: We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

    Submitted 19 April, 2013; originally announced April 2013.

  22. Binary Jumbled String Matching for Highly Run-Length Compressible Texts

    Authors: Golnaz Badkobeh, Gabriele Fici, Steve Kroon, Zsuzsanna Lipták

    Abstract: The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for cons…

    Submitted 31 May, 2013; v1 submitted 12 June, 2012; originally announced June 2012.

    Comments: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IPL

    MSC Class: 68W32; 68P05; 68P20 ACM Class: G.2.1

    Journal ref: Information Processing Letters, 113: 604-608 (2013)

  23. Algorithms for Jumbled Pattern Matching in Strings

    Authors: Péter Burcsi, Ferdinando Cicalese, Gabriele Fici, Zsuzsanna Lipták

    Abstract: The Parikh vector p(s) of a string s is defined as the vector of multiplicities of the characters. Parikh vector q occurs in s if s has a substring t with p(t)=q. We present two novel algorithms for searching for a query q in a text s. One solves the decision problem over a binary text in constant time, using a linear size index of the text. The second algorithm, for a general finite alphabet, fin…

    Submitted 8 February, 2011; originally announced February 2011.

    Comments: 18 pages, 9 figures; article accepted for publication in the International Journal of Foundations of Computer Science

    ACM Class: F.2.2; J.3

    Journal ref: Int. J. Found. Comput. Sci. 23(2): 357-374 (2012)
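
The notion of a Parikh vector occurring in a string, as defined in the last abstract, can be illustrated with a simple sliding window. This is a baseline sketch and not one of the two algorithms announced in that paper; it assumes the query is given as a mapping from characters to non-negative counts, and the names `parikh_vector` and `occurs` are chosen here for illustration.

```python
from collections import Counter
from typing import Mapping


def parikh_vector(s: str) -> Counter:
    """p(s): the multiplicity of each character of s."""
    return Counter(s)


def occurs(q: Mapping[str, int], s: str) -> bool:
    """True if s has a substring t with p(t) = q (linear scan with a fixed-size window)."""
    if any(k < 0 for k in q.values()):
        raise ValueError("multiplicities must be non-negative")
    target = Counter({c: k for c, k in q.items() if k > 0})
    m = sum(target.values())  # every match has exactly this length
    if m == 0:
        return True  # the empty substring matches the zero vector
    if m > len(s):
        return False
    window = Counter(s[:m])
    if window == target:
        return True
    for i in range(m, len(s)):
        window[s[i]] += 1
        left = s[i - m]
        window[left] -= 1
        if window[left] == 0:
            del window[left]  # keep the window free of zero entries
        if window == target:
            return True
    return False


print(occurs({"a": 2, "b": 1}, "abaab"))  # True, e.g. the substring "aba"
print(occurs({"a": 3}, "abaab"))          # False
```

Each query scans the whole text once; the index-based approaches mentioned in the abstracts aim to avoid that per-query cost.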