How to Store a Random Walk

Viola, Emanuele; Weinstein, Omri; Yu, Huacheng

Abstract:Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files $\overline{X}:=(X_1,X_2,\ldots, X_n) \sim \mu$ which are \emph{correlated} ($H_\mu(\overline{X}) \ll \sum_i H_\mu(X_i)$), using as little (expected) memory as possible, such that each individual file $X_i$ can be recovered quickly with few (ideally constant) memory accesses.
In the case of independent random files, a dramatic result by \Pat (FOCS'08) and subsequently by Dodis, \Pat and Thorup (STOC'10) shows that it is possible to store $\overline{X}$ using just a \emph{constant} number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each $X_i$ in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least $\Omega(n/poly\lg n)$ extra bits for constant decoding time, even for "simple" joint distributions $\mu$.
We focus on the natural case of compressing\emph{Markov chains}, i.e., storing a length-$n$ random walk on any (possibly directed) graph $G$. Denoting by $\kappa(G,n)$ the number of length-$n$ walks on $G$, we show that there is a succinct data structure storing a random walk using $\lg_2 \kappa(G,n) + O(\lg n)$ bits of space, such that any vertex along the walk can be decoded in $O(1)$ time on a word-RAM. For the harder task of matching the \emph{point-wise} optimal space of the walk, i.e., the empirical entropy $\sum_{i=1}^{n-1} \lg (deg(v_i))$, we present a data structure with $O(1)$ extra bits at the price of $O(\lg n)$ decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary problem. All of our data structures support the \emph{online} version of the problem with constant update and query time.

Subjects:	Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
Cite as:	arXiv:1907.10874 [cs.DS]
	(or arXiv:1907.10874v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1907.10874

Computer Science > Data Structures and Algorithms

Title:How to Store a Random Walk

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators