Showing 1–9 of 9 results for author: Claude, F

  1. arXiv:2004.01032

    cs.DS

    Grammar-Compressed Indexes with Logarithmic Search Time

    Authors: Francisco Claude, Gonzalo Navarro, Alejandro Pacheco

    Abstract: Let a text $T[1..n]$ be the only string generated by a context-free grammar with $g$ (terminal and nonterminal) symbols, and of size $G$ (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of $T$, can be encoded using essentially $G\lg g$ bits. We introduce the first grammar-compressed index that uses $O(G\lg n)$ bit…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1110.4493

  2. On the Reproducibility of Experiments of Indexing Repetitive Document Collections

    Authors: Antonio Fariña, Miguel A. Martínez-Prieto, Francisco Claude, Gonzalo Navarro, Juan J. Lastra-Díaz, Nicola Prezza, Diego Seco

    Abstract: This work introduces a companion reproducible paper with the aim of allowing the exact replication of the methods, experiments, and results discussed in a previous work [5]. In that parent paper, we proposed many and varied techniques for compressing indexes which exploit that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe…

    Submitted 26 December, 2019; originally announced December 2019.

    Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Replication framework available at: https://github.com/migumar2/uiHRDC/

    Journal ref: Information Systems; Volume 83, July 2019; pages 181-194

  3. Universal Indexes for Highly Repetitive Document Collections

    Authors: Francisco Claude, Antonio Fariña, Miguel A. Martínez-Prieto, Gonzalo Navarro

    Abstract: Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We intr…

    Submitted 23 May, 2016; v1 submitted 29 April, 2016; originally announced April 2016.

    Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941

    Journal ref: Information Systems, Volume 61, Pages 1-23, 2016

  4. arXiv:1405.1220

    cs.DS

    Efficient Compressed Wavelet Trees over Large Alphabets

    Authors: Francisco Claude, Gonzalo Navarro, Alberto Ordóñez

    Abstract: The wavelet tree is a flexible data structure that permits representing sequences $S[1,n]$ of symbols over an alphabet of size $σ$, within compressed space and supporting a wide range of operations on $S$. When $σ$ is significant compared to $n$, current wavelet tree representations incur noticeable space or time overheads. In this article we introduce the wavelet matrix, an alterna…

    Submitted 6 May, 2014; originally announced May 2014.

  5. arXiv:1201.3602

    cs.DS

    Compact Binary Relation Representations with Rich Functionality

    Authors: Jérémy Barbay, Francisco Claude, Gonzalo Navarro

    Abstract: Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify r…

    Submitted 17 January, 2012; originally announced January 2012.

    Comments: 32 pages

  6. arXiv:1110.4493

    cs.DS

    Improved Grammar-Based Compressed Indexes

    Authors: Francisco Claude, Gonzalo Navarro

    Abstract: We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right-hand sides of the rules), a basic grammar-based representation of…

    Submitted 20 October, 2011; originally announced October 2011.

  7. arXiv:0911.4981

    cs.DS

    Efficient Fully-Compressed Sequence Representations

    Authors: Jérémy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, Yakov Nekrich

    Abstract: We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..σ]$ in $nH_0(s) + o(n)(H_0(s)+1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg\lg σ)$ and average time $O(\lg H_0(s))$. The worst-cas…

    Submitted 1 April, 2012; v1 submitted 25 November, 2009; originally announced November 2009.

  8. arXiv:0911.3318

    cs.IR cs.DS

    Re-Pair Compression of Inverted Lists

    Authors: Francisco Claude, Antonio Fariña, Gonzalo Navarro

    Abstract: Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions with techniques that favor small numbers. In this paper we explore a completely different alternative: We use Re-Pair compression of those differences. While Re-Pair by itself offers fast decompressio…

    Submitted 17 November, 2009; originally announced November 2009.

  9. arXiv:0907.2089

    cs.DB cs.IR

    Fast In-Memory XPath Search over Compressed Text and Tree Indexes

    Authors: A. Arroyuelo, F. Claude, S. Maneth, V. Mäkinen, G. Navarro, K. Nguyen, J. Siren, N. Välimäki

    Abstract: A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can efficiently be implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts of querying the text of the document, plus some parts of querying the tree structure.…

    Submitted 5 October, 2011; v1 submitted 12 July, 2009; originally announced July 2009.