Representing the suffix tree with the CDAWG

Belazzougui, Djamal; Cunial, Fabio

arXiv:1705.08640 (cs)

[Submitted on 24 May 2017]

Abstract: Given a string $T$, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with $e_T$ arcs, taking overall $O(e_T+e_{\overline{T}})$ words of space, where $\overline{T}$ is the reverse of $T$, and supporting some key operations in time between $O(1)$ and $O(\log{\log{n}})$ in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which $e_T$ grows sublinearly in the length of $T$ in practice. In this paper we augment this representation, supporting a number of additional queries in worst-case time between $O(1)$ and $O(\log{n})$ in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, also enables a representation of the suffix array, of the inverse suffix array, and of $T$ itself, that takes $O(e_T)$ words of space and supports random access in $O(\log{n})$ time. Furthermore, we establish a connection between the reversed CDAWG of $T$ and a context-free grammar that produces $T$ and only $T$, which might be of independent interest.
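
To make the idea of "a context-free grammar that produces $T$ and only $T$" with random access more concrete, the following is a minimal Python sketch of character access in a small straight-line grammar. It is not the construction from the paper: the grammar, the rule names (`A`, `B`, `X`, `Y`, `S`) and the helper functions `expansion_lengths` and `access` are all hypothetical, chosen only for illustration. The sketch descends from the start symbol using precomputed expansion lengths, so its running time is proportional to the depth of the grammar; it does not by itself give the $O(\log{n})$ bound stated in the abstract, which relies on the heavy path decomposition.

```python
from typing import Dict, Tuple, Union

# Hypothetical toy grammar (an assumption for illustration, not from the paper).
# Each rule is either a single terminal character or a pair of rule names.
Rule = Union[str, Tuple[str, str]]

RULES: Dict[str, Rule] = {
    "A": "a",
    "B": "b",
    "X": ("A", "B"),   # expands to "ab"
    "Y": ("X", "X"),   # expands to "abab"
    "S": ("Y", "Y"),   # expands to "abababab", the only string produced
}


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Compute the length of the string each rule expands to (memoized)."""
    lengths: Dict[str, int] = {}

    def length(name: str) -> int:
        if name not in lengths:
            rhs = rules[name]
            if isinstance(rhs, str):
                lengths[name] = 1
            else:
                lengths[name] = length(rhs[0]) + length(rhs[1])
        return lengths[name]

    for name in rules:
        length(name)
    return lengths


def access(rules: Dict[str, Rule], lengths: Dict[str, int],
           start: str, i: int) -> str:
    """Return character i (0-based) of the expansion of `start`
    without expanding the whole string."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    name = start
    while True:
        rhs = rules[name]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < lengths[left]:
            name = left            # position falls in the left child
        else:
            i -= lengths[left]     # skip the left child's expansion
            name = right


LENGTHS = expansion_lengths(RULES)
```

For example, `access(RULES, LENGTHS, "S", 3)` walks `S` to `Y` to `X` to `B` and returns the fourth character of the produced string. Storing one length per rule keeps the extra space proportional to the grammar size rather than to the length of the text.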

Comments:	16 pages, 1 figure. Presented at the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1705.08640 [cs.DS]
	(or arXiv:1705.08640v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1705.08640

Submission history

From: Djamal Belazzougui
[v1] Wed, 24 May 2017 07:42:36 UTC (139 KB)
