
Showing 1–13 of 13 results for author: Louza, F A

Searching in archive cs.
  1. Lossy Compressor preserving variant calling through Extended BWT

    Authors: Veronica Guerrini, Felipe A. Louza, Giovanna Rosone

    Abstract: A standard format used for storing the output of high-throughput sequencing experiments is the FASTQ format. It comprises three main components: (i) headers, (ii) bases (nucleotide sequences), and (iii) quality scores. FASTQ files are widely used for variant calling, where sequencing data are mapped to a reference genome to discover variants that may be used for further analysis. There are many…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies
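
    The three FASTQ components listed in the abstract can be seen in a minimal record parser. This is an illustrative sketch, not part of the paper's compressor: it assumes the common four-line record layout (header, bases, a `+` separator line, quality string) and Phred+33 quality encoding, and `parse_fastq` is a hypothetical helper name.

```python
def parse_fastq(text: str) -> list:
    """Split FASTQ text into (header, bases, quality scores) triples.

    Assumes four-line records and Phred+33 quality encoding.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, bases, plus, quals = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(bases) != len(quals):
            raise ValueError(f"bases/qualities length mismatch at line {i + 1}")
        # Phred+33: each quality character encodes the score ord(c) - 33.
        scores = [ord(c) - 33 for c in quals]
        records.append((header[1:], bases, scores))
    return records


print(parse_fastq("@read1\nACGT\n+\nII#5\n"))
# [('read1', 'ACGT', [40, 40, 2, 20])]
```

    A lossy compressor of the kind described would act mainly on the third component, the per-base quality scores.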

  2. arXiv:2108.04988

    cs.DS

    Practical evaluation of Lyndon factors via alphabet reordering

    Authors: Marcelo K. Albertini, Felipe A. Louza

    Abstract: We evaluate the influence of different alphabet orderings on the Lyndon factorization of a string. Experiments with Pizza & Chili datasets show that for most alphabet reorderings, the number of Lyndon factors is usually small, and the length of the longest Lyndon factor can be as large as the input string, which is unfavorable for algorithms and indexes that depend on the number of Lyndon factors.…

    Submitted 10 August, 2021; originally announced August 2021.
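
    The quantity studied here, the number of Lyndon factors under a given alphabet order, can be computed with Duval's classic linear-time algorithm. A sketch only (the authors' experimental code may differ); the `order` parameter, which lists the alphabet from smallest to largest symbol, is an illustrative convention.

```python
def lyndon_factors(s: str, order: str = "") -> list:
    """Lyndon factorization of s by Duval's algorithm.

    If `order` is non-empty it lists the alphabet from smallest to largest
    symbol; otherwise the usual character order is used.
    """
    rank = {c: r for r, c in enumerate(order)}
    key = [rank[c] for c in s] if order else [ord(c) for c in s]
    n, i, factors = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        # Extend while s[i:j] is a prefix of a power of a Lyndon word.
        while j < n and key[k] <= key[j]:
            k = i if key[k] < key[j] else k + 1
            j += 1
        # Emit the repeated Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factors("banana"))         # ['b', 'an', 'an', 'a']
print(lyndon_factors("banana", "nba"))  # ['ba', 'na', 'na']
```

    Reordering the alphabet from a < b < n to n < b < a changes both the number of factors (4 versus 3) and their lengths.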

  3. A New Approach to Regular & Indeterminate Strings

    Authors: Felipe A. Louza, Neerja Mhaskar, W. F. Smyth

    Abstract: In this paper we propose a new, more appropriate definition of regular and indeterminate strings. A regular string is one that is "isomorphic" to a string whose entries all consist of a single letter, but which nevertheless may itself include entries containing multiple letters. A string that is not regular is said to be indeterminate. We begin by proposing a new model for the representation of st…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to TCS

  4. arXiv:2011.12898

    cs.DS

    Grammar Compression By Induced Suffix Sorting

    Authors: Daniel S. N. Nunes, Felipe A. Louza, Simon Gog, Mauricio Ayala-Rincón, Gonzalo Navarro

    Abstract: A grammar compression algorithm, called GCIS, is introduced in this work. GCIS is based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. The proposed solution builds on the factorization performed by SAIS during suffix sorting. A context-free grammar is used to replace factors by non-terminals. The algorithm is then recursively applied on the shorter sequence of non-…

    Submitted 25 November, 2020; originally announced November 2020.
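
    The SAIS factorization that GCIS builds on splits the text at its LMS positions (S-type positions preceded by an L-type position). The sketch below shows one such level: classify positions, cut at LMS positions, and replace each distinct factor with a non-terminal. It is an illustration under stated assumptions, not GCIS itself: a `$` sentinel smaller than every symbol is assumed, non-terminals are plain integers named in order of first appearance, and GCIS's exact factor boundaries, rule encoding and recursion are not reproduced.

```python
def lms_factorize(text: str) -> list:
    """Split text at the LMS positions used by SAIS ('$' sentinel assumed smallest)."""
    s = text + "$"
    n = len(s)
    is_s = [False] * n
    is_s[-1] = True  # the sentinel is S-type
    for i in range(n - 2, -1, -1):
        is_s[i] = s[i] < s[i + 1] or (s[i] == s[i + 1] and is_s[i + 1])
    # LMS position: S-type with an L-type left neighbour.
    lms = [i for i in range(1, n) if is_s[i] and not is_s[i - 1]]
    bounds = [0] + lms  # for non-empty text the last LMS position is the sentinel
    return [s[a:b] for a, b in zip(bounds, bounds[1:])]


def name_factors(factors: list) -> tuple:
    """Replace each distinct factor with a non-terminal (an integer here)."""
    rules, names, reduced = {}, {}, []
    for f in factors:
        if f not in names:
            names[f] = len(names)
            rules[names[f]] = f
        reduced.append(names[f])
    return rules, reduced


factors = lms_factorize("mmiissiissiippii")
print(factors)                # ['mm', 'iiss', 'iiss', 'iippii']
print(name_factors(factors))  # ({0: 'mm', 1: 'iiss', 2: 'iippii'}, [0, 1, 1, 2])
```

    The reduced sequence `[0, 1, 1, 2]` is shorter than the input, and the repeated factor `iiss` is stored once as a rule; applying the same step to the reduced sequence gives the recursion the abstract mentions.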

  5. arXiv:2009.03675

    cs.DS

    Space efficient merging of de Bruijn graphs and Wheeler graphs

    Authors: Lavinia Egidi, Felipe A. Louza, Giovanni Manzini

    Abstract: The merging of succinct data structures is a well-established technique for the space-efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost as the state-of-the-art algorithm for the same problem but it uses less than half of its working space. A…

    Submitted 12 July, 2021; v1 submitted 5 September, 2020; originally announced September 2020.

    Comments: 24 pages, 10 figures. arXiv admin note: text overlap with arXiv:1902.02889

  6. Inducing the Lyndon Array

    Authors: Felipe A. Louza, Sabrina Mantaci, Giovanni Manzini, Marinella Sciortino, Guilherme P. Telles

    Abstract: In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where $n$ is the length of the text and $\sigma$ is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. I…

    Submitted 26 July, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Accepted to SPIRE'19
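
    To make the two outputs concrete, here is a brute-force reference that computes the suffix array and the Lyndon array straight from their definitions. It runs in quadratic time or worse and has nothing in common with the paper's linear-time induced-sorting algorithm; it is only a sketch for checking a faster implementation on small inputs, and all function names are hypothetical.

```python
def is_lyndon(w: str) -> bool:
    """A non-empty word is Lyndon iff it is strictly smaller than each proper suffix."""
    return all(w < w[k:] for k in range(1, len(w)))


def suffix_array(text: str) -> list:
    """Naive suffix sorting; a suffix that is a prefix of another sorts first."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lyndon_array(text: str) -> list:
    """lyndon[i] = length of the longest Lyndon word starting at position i."""
    n = len(text)
    return [max(length for length in range(1, n - i + 1)
                if is_lyndon(text[i:i + length]))
            for i in range(n)]


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
print(lyndon_array("banana"))  # [1, 2, 1, 2, 1, 1]
```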

  7. Algorithms to compute the Burrows-Wheeler Similarity Distribution

    Authors: Felipe A. Louza, Guilherme P. Telles, Simon Gog, Liang Zhao

    Abstract: The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more their symbols are intermixed in the transformation, the more the strings are similar. In this article we present two new algorithms to compute similarity measures…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: Accepted to TCS
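
    The intermixing intuition can be illustrated with a toy measure: sort the suffixes of both strings together, color each suffix by the string it comes from, and count maximal runs of one color. This is an assumption-laden sketch, not the Burrows-Wheeler similarity distribution computed in the article; the `#` and `$` separators (assumed smaller than every other symbol), the naive suffix sorting and the run count are illustrative choices.

```python
def bwt_colors(a: str, b: str) -> list:
    """Source string (0 = a, 1 = b) of each suffix, in sorted-suffix order.

    The text is a + '#' + b + '$'; suffixes starting at a separator are skipped.
    """
    text = a + "#" + b + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return [0 if i < len(a) else 1 for i in sa if text[i] not in "#$"]


def color_runs(colors: list) -> int:
    """Number of maximal blocks of equal colors: more runs, more intermixing."""
    return sum(1 for i, c in enumerate(colors) if i == 0 or c != colors[i - 1])


print(color_runs(bwt_colors("aaaa", "bbbb")))      # 2
print(color_runs(bwt_colors("banana", "banana")))  # 12
```

    Two strings over disjoint alphabets give the minimum of two runs, while two identical strings alternate colors throughout, matching the observation that more intermixing means more similar strings.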

  8. Space-efficient merging of succinct de Bruijn graphs

    Authors: Lavinia Egidi, Felipe A. Louza, Giovanni Manzini

    Abstract: We propose a new algorithm for merging succinct representations of de Bruijn graphs introduced in [Bowe et al. WABI 2012]. Our algorithm is based on the lightweight BWT merging approach by Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. Our algorithm has the same asymptotic cost as the state-of-the-art tool for the same problem presented by Muggli et al. [bioRxiv 2017, Bioinformatics 2019],…

    Submitted 26 July, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

    Comments: Accepted to SPIRE'19

  9. A Simple Algorithm for Computing the Document Array

    Authors: Felipe A. Louza

    Abstract: We present a simple algorithm for computing the document array given a string collection and its suffix array as input. Our algorithm runs in linear time using constant additional space for strings from constant alphabets.

    Submitted 2 November, 2019; v1 submitted 21 December, 2018; originally announced December 2018.
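
    For reference, the document array maps each suffix-array entry to the document that contains that suffix. The baseline below is a hedged sketch, not the paper's algorithm: it keeps the document start offsets in an extra array and binary-searches them, so it uses O(d) additional words for d documents and O(n log d) time rather than linear time and constant additional space. The `$` terminator and the helper names are assumptions.

```python
from bisect import bisect_right


def concat(docs: list) -> str:
    """Concatenate the collection, ending each document with '$'."""
    return "".join(d + "$" for d in docs)


def suffix_array(text: str) -> list:
    """Naive quadratic suffix sorting, for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def document_array(docs: list, sa: list) -> list:
    """da[r] = index of the document containing text position sa[r]."""
    starts, pos = [], 0
    for d in docs:
        starts.append(pos)
        pos += len(d) + 1  # +1 for the '$' terminator
    return [bisect_right(starts, p) - 1 for p in sa]


docs = ["ab", "b"]
sa = suffix_array(concat(docs))  # text "ab$b$"
print(sa)                        # [4, 2, 0, 3, 1]
print(document_array(docs, sa))  # [1, 0, 0, 1, 0]
```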

  10. External memory BWT and LCP computation for sequence collections with applications

    Authors: Lavinia Egidi, Felipe A. Louza, Giovanni Manzini, Guilherme P. Telles

    Abstract: We propose an external memory algorithm for the computation of the BWT and LCP array for a collection of sequences. Our algorithm takes the amount of available memory as an input parameter, and tries to make the best use of it by splitting the input collection into subcollections sufficiently small that it can compute their BWT in RAM using an optimal linear time algorithm. Next, it merges the par…

    Submitted 17 May, 2018; originally announced May 2018.

  11. arXiv:1711.03205

    cs.DS

    A Grammar Compression Algorithm based on Induced Suffix Sorting

    Authors: Daniel Saad Nogueira Nunes, Felipe A. Louza, Simon Gog, Mauricio Ayala-Rincón, Gonzalo Navarro

    Abstract: We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, introduced by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string which can be further reduced into a shorter string by substituting each substring by its corresponding factor. The resulting…

    Submitted 8 November, 2017; originally announced November 2017.

  12. Lyndon Array Construction during Burrows-Wheeler Inversion

    Authors: Felipe A. Louza, W. F. Smyth, Giovanni Manzini, Guilherme P. Telles

    Abstract: In this paper we present an algorithm to compute the Lyndon array of a string $T$ of length $n$ as a byproduct of the inversion of the Burrows-Wheeler transform of $T$. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that…

    Submitted 27 October, 2017; originally announced October 2017.

    Journal ref: Journal of Discrete Algorithms, 50 (2018), 2-9
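
    The idea can be sketched as follows. Inverting the BWT with the LF mapping visits the suffixes of $T$ from right to left and yields the rank of each one; the Lyndon array then follows from the known characterization that the longest Lyndon word starting at $i$ ends just before the first position $j > i$ whose suffix is lexicographically smaller (with $j = n$ if there is none), a next-smaller-value query that a stack answers in the same right-to-left pass. This is a sketch under assumptions, not claimed to match the paper step for step: a unique `$` terminator smaller than every symbol is assumed, and the input BWT is built naively by sorting rotations.

```python
def bwt(text: str) -> str:
    """Naive BWT of text + '$' by sorting rotations ('$' assumed unique and smallest)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def lf_mapping(last: str) -> list:
    """LF[r] = row of the rotation starting one position left of row r's rotation."""
    seen, occ = {}, []  # occ[r] = occurrences of last[r] in last[:r]
    for c in last:
        occ.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1
    first, total = {}, 0  # first[c] = first row whose rotation starts with c
    for c in sorted(seen):
        first[c] = total
        total += seen[c]
    return [first[c] + o for c, o in zip(last, occ)]


def lyndon_array_from_bwt(last: str) -> list:
    n = len(last) - 1  # text length without '$'
    lf = lf_mapping(last)
    lyndon = [0] * n
    stack = []  # (rank, position) pairs; ranks increase toward the top
    row = 0     # row 0 is the rotation starting with '$'
    for i in range(n - 1, -1, -1):
        row = lf[row]  # row is now the rank of the suffix starting at i
        # Larger suffixes to the right can never be the next smaller one again.
        while stack and stack[-1][0] > row:
            stack.pop()
        nxt = stack[-1][1] if stack else n  # next smaller suffix to the right
        lyndon[i] = nxt - i
        stack.append((row, i))
    return lyndon


print(bwt("banana"))                         # annb$aa
print(lyndon_array_from_bwt(bwt("banana")))  # [1, 2, 1, 2, 1, 1]
```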

  13. Burrows-Wheeler transform and LCP array construction in constant space

    Authors: Felipe A. Louza, Travis Gagie, Guilherme P. Telles

    Abstract: In this article we extend the elegant in-place Burrows-Wheeler transform (BWT) algorithm proposed by Crochemore et al. (Crochemore et al., 2015). Our extension is twofold: we first show how to compute simultaneously the longest common prefix (LCP) array as well as the BWT, using constant additional space; we then show how to build the LCP array directly in compressed representation using Elias cod…

    Submitted 24 November, 2016; originally announced November 2016.

    Comments: Accepted to JDA

    Journal ref: Journal of Discrete Algorithms, 42 (2017) 14-22
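
    A plain in-memory reference for the two outputs may help when checking an implementation. It derives the BWT from a suffix array, computes the LCP array with Kasai et al.'s linear-time algorithm, and writes the LCP values with Elias codes. This is a swapped-in technique, not the paper's in-place constant-space algorithm; because the abstract is cut off before naming the Elias code variant, the choice of Elias gamma is an assumption, as is shifting every value by one (gamma cannot encode 0).

```python
def suffix_array(text: str) -> list:
    """Naive suffix sorting; text should end with a unique smallest '$'."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: list) -> str:
    # BWT[r] = symbol preceding suffix sa[r]; text[-1] is '$' when sa[r] == 0.
    return "".join(text[p - 1] for p in sa)


def lcp_kasai(text: str, sa: list) -> list:
    """lcp[r] = longest common prefix of suffixes sa[r - 1] and sa[r]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp, h = [0] * n, 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        h = max(h - 1, 0)  # the next suffix keeps at least h - 1 matching symbols
    return lcp


def elias_gamma_encode(values: list) -> str:
    """Elias gamma code (as a bit string) of each value >= 1."""
    return "".join("0" * (v.bit_length() - 1) + bin(v)[2:] for v in values)


def elias_gamma_decode(bits: str) -> list:
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


text = "banana$"
sa = suffix_array(text)
lcp = lcp_kasai(text, sa)
print(bwt_from_sa(text, sa))  # annb$aa
print(lcp)                    # [0, 0, 1, 3, 0, 0, 2]
code = elias_gamma_encode([v + 1 for v in lcp])
print(code)                   # 110100010011011
print([v - 1 for v in elias_gamma_decode(code)] == lcp)  # True
```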