
Showing 1–38 of 38 results for author: Belazzougui, D

  1. arXiv:2404.09607  [pdf, other]

    cs.DS

    Better space-time-robustness trade-offs for set reconciliation

    Authors: Djamal Belazzougui, Gregory Kucherov, Stefan Walzer

    Abstract: We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) of size linear in the number of differences. Exact solutions to this problem are based on error-correcting coding techniques and suffer from a large decoding time. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient but offer…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2103.00462  [pdf, other]

    cs.DS

    Weighted Ancestors in Suffix Trees Revisited

    Authors: Djamal Belazzougui, Dmitry Kosolobov, Simon J. Puglisi, Rajeev Raman

    Abstract: The weighted ancestor problem is a well-known generalization of the predecessor problem to trees. It is known to require $Ω(\log\log n)$ time for queries provided $O(n\mathop{\mathrm{polylog}} n)$ space is available and weights are from $[0..n]$, where $n$ is the number of tree nodes. However, when applied to suffix trees, the problem, surprisingly, admits an $O(n)$-space solution with constant qu…

    Submitted 11 April, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: 15 pages, 5 figures

  3. arXiv:2006.01825  [pdf, ps, other]

    cs.DS

    Efficient tree-structured categorical retrieval

    Authors: Djamal Belazzougui, Gregory Kucherov

    Abstract: We study a document retrieval problem in the new framework where $D$ text documents are organized in a {\em category tree} with a pre-defined number $h$ of categories. This situation occurs e.g. with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern $p$ and a category (level in the category tree), we wish to efficiently retrieve the $t$…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Full version of a paper accepted for presentation at the 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

  4. arXiv:1901.10165  [pdf, other]

    cs.DS

    Fully-functional bidirectional Burrows-Wheeler indexes

    Authors: Fabio Cunial, Djamal Belazzougui

    Abstract: Given a string $T$ on an alphabet of size $σ$, we describe a bidirectional Burrows-Wheeler index that takes $O(|T|\logσ)$ bits of space, and that supports the addition \emph{and removal} of one character, on the left or right side of any substring of $T$, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of $T$, but they cou…

    Submitted 9 June, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

  5. arXiv:1805.05228  [pdf, other]

    cs.DS

    Assembling Omnitigs using Hidden-Order de Bruijn Graphs

    Authors: Diego Díaz-Domínguez, Djamal Belazzougui, Travis Gagie, Veli Mäkinen, Gonzalo Navarro, Simon J. Puglisi

    Abstract: De novo DNA assembly is a fundamental task in Bioinformatics, and finding Eulerian paths on de Bruijn graphs is one of the dominant approaches to it. In most of the cases, there may be no one order for the de Bruijn graph that works well for assembling all of the reads. For this reason, some de Bruijn-based assemblers try assembling on several graphs of increasing order, in turn. Boucher et al. (2…

    Submitted 14 May, 2018; originally announced May 2018.

  6. arXiv:1804.04720  [pdf, other]

    cs.DS

    Fast Prefix Search in Little Space, with Applications

    Authors: Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna

    Abstract: It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries---they actually contain the strings in $S$. For very large collections store…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: Presented at the 18th Annual European Symposium on Algorithms (ESA), Liverpool (UK), September 6-8, 2010

  7. arXiv:1707.08197  [pdf, ps, other]

    cs.DS

    Fast Label Extraction in the CDAWG

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the tim…

    Submitted 26 September, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.08640

  8. arXiv:1705.08640  [pdf, other]

    cs.DS

    Representing the suffix tree with the CDAWG

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: Given a string $T$, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with $e_T$ arcs, taking overall $O(e_T+e_{\overline{T}})$ words of space, where ${\overline{T}}$ is the reverse of $T$, and supporting some key operations in time between $O(1)$ and $O(\log{\log{n}})$ in the worst case. This representation is especially appealing for highly…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 16 pages, 1 figure. Presented at the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

  9. arXiv:1609.06378  [pdf, ps, other]

    cs.DS

    Linear-time string indexing and analysis in small space

    Authors: Djamal Belazzougui, Fabio Cunial, Juha Kärkkäinen, Veli Mäkinen

    Abstract: The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array (CSA) by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input s…

    Submitted 20 September, 2016; originally announced September 2016.

    Comments: Journal submission (52 pages, 2 figures)

  10. Indexing and querying color sets of images

    Authors: Djamal Belazzougui, Roman Kolpakov, Mathieu Raffinot

    Abstract: We aim to study the set of color sets of continuous regions of an image given as a matrix of $m$ rows over $n\geq m$ columns where each element in the matrix is an integer from $[1,σ]$ named a {\em color}. The set of distinct colors in a region is called a fingerprint. We aim to compute, index and query the fingerprints of all rectangular regions named rectangles. The set of all such fingerprints…

    Submitted 28 August, 2016; originally announced August 2016.

    Comments: 20 pages, 5 figures

  11. arXiv:1608.05699  [pdf, other]

    cs.NI

    Memory-efficient and Ultra-fast Network Lookup and Forwarding using Othello Hashing

    Authors: Ye Yu, Djamal Belazzougui, Chen Qian, Qin Zhang

    Abstract: Network algorithms always prefer low memory cost and fast packet processing speed. Forwarding information base (FIB), as a typical network processing component, requires a scalable and memory-efficient algorithm to support fast lookups. In this paper, we present a new network algorithm, Othello Hashing, and its application of a FIB design called Concise, which uses very little memory to support ul…

    Submitted 22 November, 2017; v1 submitted 19 August, 2016; originally announced August 2016.

  12. arXiv:1607.04909  [pdf, other]

    cs.DS

    Fully Dynamic de Bruijn Graphs

    Authors: Djamal Belazzougui, Travis Gagie, Veli Mäkinen, Marco Previtali

    Abstract: We present a space- and time-efficient fully dynamic implementation of de Bruijn graphs, which can also support fixed-length jumbled pattern matching.

    Submitted 19 July, 2016; v1 submitted 17 July, 2016; originally announced July 2016.

    Comments: Presented at the 23rd edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2016)

  13. arXiv:1607.04200  [pdf, other]

    cs.DS

    Edit Distance: Sketching, Streaming and Document Exchange

    Authors: Djamal Belazzougui, Qin Zhang

    Abstract: We show that in the document exchange problem, where Alice holds $x \in \{0,1\}^n$ and Bob holds $y \in \{0,1\}^n$, Alice can send Bob a message of size $O(K(\log^2 K+\log n))$ bits such that Bob can recover $x$ using the message and his input $y$ if the edit distance between $x$ and $y$ is no more than $K$, and output "error" otherwise. Both the encoding and decoding can be done in time…

    Submitted 14 July, 2016; originally announced July 2016.

    Comments: Full version of an article to be presented at the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016)

  14. arXiv:1606.04495  [pdf, ps, other]

    cs.DS

    Range Majorities and Minorities in Arrays

    Authors: Djamal Belazzougui, Travis Gagie, J. Ian Munro, Gonzalo Navarro, Yakov Nekrich

    Abstract: Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks us to preprocess a string of length $n$ such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold $τ$. Subsequent authors have reduced their time and space bounds such that, when $τ$ is fixed at preprocess…

    Submitted 14 June, 2016; originally announced June 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1210.1765

  15. arXiv:1604.06002  [pdf, other]

    cs.DS

    Practical combinations of repetition-aware data structures

    Authors: Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot

    Abstract: Highly-repetitive collections of strings are increasingly being amassed by genome sequencing and genetic variation experiments, as well as by storing all versions of human-generated files, like webpages and source code. Existing indexes for locating all the exact occurrences of a pattern in a highly-repetitive string take advantage of a single measure of repetition. However, multiple, distinct mea…

    Submitted 21 April, 2016; v1 submitted 20 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: text overlap with arXiv:1502.05937

  16. Lempel-Ziv Decoding in External Memory

    Authors: Djamal Belazzougui, Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi

    Abstract: Simple and fast decoding is one of the main advantages of LZ77-type text encoding used in many popular file compressors such as gzip and 7zip. With the recent introduction of external memory algorithms for Lempel-Ziv factorization there is a need for external memory LZ77 decoding but the standard algorithm makes random accesses to the text and cannot be trivially modified for external memory compu…

    Submitted 31 January, 2016; originally announced February 2016.

  17. arXiv:1512.05028  [pdf, ps, other]

    cs.DS

    Optimal Las Vegas reduction from one-way set reconciliation to error correction

    Authors: Djamal Belazzougui

    Abstract: Suppose we have two players $A$ and $C$, where player $A$ has a string $s[0..u-1]$ and player $C$ has a string $t[0..u-1]$ and neither player knows the other's string. Assume that $s$ and $t$ are both over an integer alphabet $[σ]$, where the first string contains $n$ non-zero entries. We wish to answer the following basic question. Assuming that $s$ and $t$ differ in at most…

    Submitted 15 December, 2015; originally announced December 2015.

    Comments: 14 pages. Under submission to a journal

  18. arXiv:1511.09229  [pdf, ps, other]

    cs.DS cs.CC

    Efficient Deterministic Single Round Document Exchange for Edit Distance

    Authors: Djamal Belazzougui

    Abstract: Suppose that we have two parties that each possess a binary string. Suppose that the length of the first string (document) is $n$ and that the two strings (documents) have edit distance (minimal number of deletes, inserts and substitutions needed to transform one string into the other) at most $k$. The problem we want to solve is to devise an efficient protocol in which the first party sends a sin…

    Submitted 3 December, 2015; v1 submitted 30 November, 2015; originally announced November 2015.

    Comments: 12 pages, under submission. This version has some minor corrections, clarifications and a simplification of the message size bound

  19. arXiv:1508.02968  [pdf, other]

    cs.DS

    Space-efficient detection of unusual words

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: Detecting all the strings that occur in a text more frequently or less frequently than expected according to an IID or a Markov model is a basic problem in string mining, yet current algorithms are based on data structures that are either space-inefficient or incur large slowdowns, and current implementations cannot scale to genomes or metagenomes in practice. In this paper we engineer an algorith…

    Submitted 12 August, 2015; originally announced August 2015.

    Comments: arXiv admin note: text overlap with arXiv:1502.06370

  20. arXiv:1507.07080  [pdf, ps, other]

    cs.DS

    Range Predecessor and Lempel-Ziv Parsing

    Authors: Djamal Belazzougui, Simon J. Puglisi

    Abstract: The Lempel-Ziv parsing of a string (LZ77 for short) is one of the most important and widely-used algorithmic tools in data compression and string processing. We show that the Lempel-Ziv parsing of a string of length $n$ on an alphabet of size $σ$ can be computed in $O(n\log\logσ)$ time ($O(n)$ time if we allow randomization) using $O(n\logσ)$ bits of working space; that is, using space proportiona…

    Submitted 25 July, 2015; originally announced July 2015.

    Comments: 25 pages

  21. arXiv:1502.06370  [pdf, ps, other]

    cs.DS

    A framework for space-efficient string kernels

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels w…

    Submitted 23 February, 2015; originally announced February 2015.

  22. arXiv:1502.05937  [pdf, other]

    cs.DS

    Composite repetition-aware data structures

    Authors: Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot

    Abstract: In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for…

    Submitted 23 February, 2015; v1 submitted 20 February, 2015; originally announced February 2015.

    Comments: (the name of the third co-author was inadvertently omitted from previous version)

  23. arXiv:1412.0967  [pdf, other]

    cs.DS

    Queries on LZ-Bounded Encodings

    Authors: Djamal Belazzougui, Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Alberto Ordóñez, Simon J. Puglisi, Yasuo Tabei

    Abstract: We describe a data structure that stores a string $S$ in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data structures, such as compressed trees and graphs. We show that our data structure can be built in a scalable manner and is both small and fast in practice compar…

    Submitted 2 December, 2014; originally announced December 2014.

  24. arXiv:1408.5518  [pdf, ps, other]

    cs.DS cs.IT

    Faster construction of asymptotically good unit-cost error correcting codes in the RAM model

    Authors: Djamal Belazzougui

    Abstract: Assuming we are in a Word-RAM model with word size $w$, we show that we can construct in $o(w)$ time an error correcting code with a constant relative positive distance that maps numbers of $w$ bits into $Θ(w)$-bit numbers, and such that the application of the error-correcting code on any given number $x\in[0,2^w-1]$ takes constant time. Our result improves on a previously proposed error-correctin…

    Submitted 14 September, 2014; v1 submitted 23 August, 2014; originally announced August 2014.

    Comments: Manuscript (5 pages)

  25. arXiv:1408.3093  [pdf, other]

    cs.DS

    Rank, select and access in grammar-compressed strings

    Authors: Djamal Belazzougui, Simon J. Puglisi, Yasuo Tabei

    Abstract: Given a string $S$ of length $N$ on a fixed alphabet of $σ$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mbox{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$);…

    Submitted 14 August, 2014; v1 submitted 13 August, 2014; originally announced August 2014.

    Comments: 16 pages

  26. arXiv:1404.4814  [pdf, ps, other]

    cs.DS

    Reusing an FM-index

    Authors: Djamal Belazzougui, Travis Gagie, Simon Gog, Giovanni Manzini, Jouni Sirén

    Abstract: Intuitively, if two strings $S_1$ and $S_2$ are sufficiently similar and we already have an FM-index for $S_1$ then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for $S_2$. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.

    Submitted 9 May, 2014; v1 submitted 18 April, 2014; originally announced April 2014.

  27. arXiv:1401.0936  [pdf, ps, other]

    cs.DS

    Linear time construction of compressed text indices in compact space

    Authors: Djamal Belazzougui

    Abstract: We show that the compressed suffix array and the compressed suffix tree for a string of length $n$ over an integer alphabet of size $σ\leq n$ can both be built in $O(n)$ (randomized) time using only $O(n\logσ)$ bits of working space. The previously fastest construction algorithms that used $O(n\logσ)$ bits of space took times $O(n\log\logσ)$ and $O(n\log^εn)$ respectively (where $ε$ is any positiv…

    Submitted 23 May, 2016; v1 submitted 5 January, 2014; originally announced January 2014.

    Comments: Expanded version of a paper that appeared in the proceedings of the STOC 2014 conference

  28. arXiv:1312.4678  [pdf, other]

    cs.DS cs.DB

    Simple, compact and robust approximate string dictionary

    Authors: Ibrahim Chegrane, Djamal Belazzougui

    Abstract: This paper is concerned with practical implementations of approximate string dictionaries that allow edit errors. In this problem, we have as input a dictionary $D$ of $d$ strings of total length $n$ over an alphabet of size $σ$. Given a bound $k$ and a pattern $x$ of length $m$, a query has to return all the strings of the dictionary which are at edit distance at most $k$ from $x$, where the edit…

    Submitted 22 August, 2014; v1 submitted 17 December, 2013; originally announced December 2013.

    Comments: Accepted to a journal (19 pages, 2 figures)

  29. arXiv:1312.0526  [pdf, other]

    cs.DS

    Cache-Oblivious Peeling of Random Hypergraphs

    Authors: Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, Sebastiano Vigna

    Abstract: The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available i…

    Submitted 2 December, 2013; originally announced December 2013.

  30. arXiv:1301.4952  [pdf, other]

    cs.DS

    Single and multiple consecutive permutation motif search

    Authors: Djamal Belazzougui, Adeline Pierrot, Mathieu Raffinot, Stéphane Vialette

    Abstract: Let $t$ be a permutation (that shall play the role of the {\em text}) on $[n]$ and a pattern $p$ be a sequence of $m$ distinct integer(s) of $[n]$, $m\leq n$. The pattern $p$ occurs in $t$ in position $i$ if and only if $p_1... p_m$ is order-isomorphic to $t_i... t_{i+m-1}$, that is, for all $1 \leq k< \ell \leq m$, $p_k>p_\ell$ if and only if $t_{i+k-1}>t_{i+\ell-1}$. Searching for a pattern $p$…

    Submitted 25 April, 2013; v1 submitted 21 January, 2013; originally announced January 2013.
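
The order-isomorphism condition quoted in the abstract above can be checked directly from its definition. The following is a minimal sketch of a naive scan, not the algorithm of the paper; the function names `is_order_isomorphic` and `occurrences` are hypothetical and chosen only for illustration.

```python
def is_order_isomorphic(p, window):
    """True if p[k] > p[l] exactly when window[k] > window[l], for all k < l."""
    m = len(p)
    return all(
        (p[k] > p[l]) == (window[k] > window[l])
        for k in range(m)
        for l in range(k + 1, m)
    )


def occurrences(p, t):
    """Naive O(n * m^2) scan over every window of t.

    Positions are 1-based, following the indexing used in the abstract.
    Assumes the entries of p and of t are pairwise distinct.
    """
    m, n = len(p), len(t)
    return [
        i + 1
        for i in range(n - m + 1)
        if is_order_isomorphic(p, t[i:i + m])
    ]
```

For instance, with text `[3, 1, 4, 2, 5]` and pattern `[2, 1, 3]`, the windows starting at positions 1 and 3 have the same relative order as the pattern.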

  31. arXiv:1301.3488  [pdf, other]

    cs.DS cs.DM cs.IR

    Various improvements to text fingerprinting

    Authors: Djamal Belazzougui, Roman Kolpakov, Mathieu Raffinot

    Abstract: Let s = s_1 .. s_n be a text (or sequence) on a finite alphabet Σ of size σ. A fingerprint in s is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set {\cal F} of all fingerprints of all substrings of s in order to answer efficiently certain questions on this set. A substring s_i .. s_j is a maximal location for a fingerprint f in F (…

    Submitted 15 January, 2013; originally announced January 2013.

  32. arXiv:1210.1765  [pdf, ps, other]

    cs.DS

    Better Space Bounds for Parameterized Range Majority and Minority

    Authors: Djamal Belazzougui, Travis Gagie, Gonzalo Navarro

    Abstract: Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks to preprocess a string of length $n$ such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold $τ$. Subsequent authors have reduced their time and space bounds such that, when $τ$ is given at preprocessing…

    Submitted 13 July, 2014; v1 submitted 5 October, 2012; originally announced October 2012.

  33. arXiv:1209.5441  [pdf, ps, other]

    cs.DS

    Predecessor search with distance-sensitive query time

    Authors: Djamal Belazzougui, Paolo Boldi, Sebastiano Vigna

    Abstract: A predecessor (successor) search finds the largest element $x^-$ smaller than the input string $x$ (the smallest element $x^+$ larger than or equal to $x$, respectively) out of a given set $S$; in this paper, we consider the static case (i.e., $S$ is fixed and does not change over time) and assume that the $n$ elements of $S$ are available for inspection. We present a number of algorithms that, wi…

    Submitted 24 September, 2012; originally announced September 2012.
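
The predecessor and successor definitions in the abstract above can be illustrated with a plain binary search over a sorted list. This is only a baseline sketch using Python's standard `bisect` module, not the distance-sensitive algorithms of the paper; the function names and the choice of returning `None` when no answer exists are assumptions made for illustration.

```python
from bisect import bisect_left


def predecessor(sorted_keys, x):
    """Largest key strictly smaller than x, or None if there is none."""
    i = bisect_left(sorted_keys, x)  # index of the first key >= x
    return sorted_keys[i - 1] if i > 0 else None


def successor(sorted_keys, x):
    """Smallest key larger than or equal to x, or None if there is none."""
    i = bisect_left(sorted_keys, x)  # index of the first key >= x
    return sorted_keys[i] if i < len(sorted_keys) else None
```

Both functions take O(log n) comparisons on a static sorted list, and they work for any mutually comparable keys, including strings compared lexicographically.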

  34. arXiv:1111.2621  [pdf, other]

    cs.DS

    Optimal Lower and Upper Bounds for Representing Sequences

    Authors: Djamal Belazzougui, Gonzalo Navarro

    Abstract: Sequence representations supporting queries $access$, $select$ and $rank$ are at the core of many data structures. There is a considerable gap between the various upper bounds and the few lower bounds known for such representations, and how they relate to the space used. In this article we prove a strong lower bound for $rank$, which holds for rather permissive assumptions on the space used, and g…

    Submitted 23 August, 2013; v1 submitted 10 November, 2011; originally announced November 2011.

  35. arXiv:1104.4353  [pdf, ps, other]

    cs.DS

    Random input helps searching predecessors

    Authors: D. Belazzougui, A. C. Kaporis, P. G. Spirakis

    Abstract: We solve the dynamic Predecessor Problem with high probability (whp) in constant time, using only $n^{1+δ}$ bits of memory, for any constant $δ> 0$. The input keys are random with respect to a wider class of the well studied and practically important class of $(f_1, f_2)$-smooth distributions introduced in \cite{and:mat}. It achieves O(1) whp amortized time. Its worst-case time is…

    Submitted 21 April, 2011; originally announced April 2011.

    ACM Class: F.2.2

  36. arXiv:1103.2167  [pdf, other]

    cs.DS

    Improved space-time tradeoffs for approximate full-text indexing with one edit error

    Authors: Djamal Belazzougui

    Abstract: In this paper we are interested in indexing texts for substring matching queries with one edit error. That is, given a text $T$ of $n$ characters over an alphabet of size $σ$, we are asked to build a data structure that answers the following query: find all the $occ$ substrings of the text that are at edit distance at most $1$ from a given string $q$ of length $m$. In this paper we show two new re…

    Submitted 21 August, 2014; v1 submitted 10 March, 2011; originally announced March 2011.

    Comments: Accepted for publication in a journal (28 pages)

  37. Worst case efficient single and multiple string matching in the Word-RAM model

    Authors: Djamal Belazzougui

    Abstract: In this paper, we explore worst-case solutions for the problems of single and multiple matching on strings in the word RAM model with word length w. In the first problem, we have to build a data structure based on a pattern p of length m over an alphabet of size sigma such that we can answer the following query: given a text T of length n, where each character is encoded using log(sigma) bits r…

    Submitted 14 January, 2011; v1 submitted 15 November, 2010; originally announced November 2010.

    Comments: Full version of an extended abstract presented at the IWOCA 2010 conference

  38. Succinct Dictionary Matching With No Slowdown

    Authors: Djamal Belazzougui

    Abstract: The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton which finds all occ occurrences…

    Submitted 14 February, 2010; v1 submitted 16 January, 2010; originally announced January 2010.

    Comments: Corrected typos and other minor errors