Showing 1–2 of 2 results for author: Depuydt, L

Search v0.5.6 released 2020-02-24

arXiv:2312.01359 [pdf, other]

cs.DS

Suffixient Sets

Authors: Lore Depuydt, Travis Gagie, Ben Langmead, Giovanni Manzini, Nicola Prezza

Abstract: We define a suffixient set for a text $T [1..n]$ to be a set $S$ of positions between 1 and $n$ such that, for any edge descending from a node $u$ to a node $v$ in the suffix tree of $T$, there is an element $s \in S$ such that $u$'s path label is a suffix of $T [1..s - 1]$ and $T [s]$ is the first character of $(u, v)$'s edge label. We first show there is a suffixient set of cardinality at most… ▽ More We define a suffixient set for a text $T [1..n]$ to be a set $S$ of positions between 1 and $n$ such that, for any edge descending from a node $u$ to a node $v$ in the suffix tree of $T$, there is an element $s \in S$ such that $u$'s path label is a suffix of $T [1..s - 1]$ and $T [s]$ is the first character of $(u, v)$'s edge label. We first show there is a suffixient set of cardinality at most $2 \bar{r}$, where $\bar{r}$ is the number of runs in the Burrows-Wheeler Transform of the reverse of $T$. We then show that, given a straight-line program for $T$ with $g$ rules, we can build an $O (\bar{r} + g)$-space index with which, given a pattern $P [1..m]$, we can find the maximal exact matches (MEMs) of $P$ with respect to $T$ in $O (m \log (σ) / \log n + d \log n)$ time, where $σ$ is the size of the alphabet and $d$ is the number of times we would fully or partially descend edges in the suffix tree of $T$ while finding those MEMs. △ Less

Submitted 4 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.
arXiv:2311.04538 [pdf, ps, other]

cs.DS

Faster Maximal Exact Matches with Lazy LCP Evaluation

Authors: Adrián Goga, Lore Depuydt, Nathaniel K. Brown, Jan Fostier, Travis Gagie, Gonzalo Navarro

Abstract: MONI (Rossi et al., {\it JCB} 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the ope… ▽ More MONI (Rossi et al., {\it JCB} 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Search v0.5.6 released 2020-02-24