Computer Science > Data Structures and Algorithms
[Submitted on 25 Jul 2019]
Title:How to Store a Random Walk
View PDFAbstract:Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files $\overline{X}:=(X_1,X_2,\ldots, X_n) \sim \mu$ which are \emph{correlated} ($H_\mu(\overline{X}) \ll \sum_i H_\mu(X_i)$), using as little (expected) memory as possible, such that each individual file $X_i$ can be recovered quickly with few (ideally constant) memory accesses.
In the case of independent random files, a dramatic result by \Pat (FOCS'08) and subsequently by Dodis, \Pat and Thorup (STOC'10) shows that it is possible to store $\overline{X}$ using just a \emph{constant} number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each $X_i$ in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least $\Omega(n/poly\lg n)$ extra bits for constant decoding time, even for "simple" joint distributions $\mu$.
We focus on the natural case of compressing\emph{Markov chains}, i.e., storing a length-$n$ random walk on any (possibly directed) graph $G$. Denoting by $\kappa(G,n)$ the number of length-$n$ walks on $G$, we show that there is a succinct data structure storing a random walk using $\lg_2 \kappa(G,n) + O(\lg n)$ bits of space, such that any vertex along the walk can be decoded in $O(1)$ time on a word-RAM. For the harder task of matching the \emph{point-wise} optimal space of the walk, i.e., the empirical entropy $\sum_{i=1}^{n-1} \lg (deg(v_i))$, we present a data structure with $O(1)$ extra bits at the price of $O(\lg n)$ decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary problem. All of our data structures support the \emph{online} version of the problem with constant update and query time.
Current browse context:
cs.DS
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.