-
arXiv:1006.4136
Competitive Boolean Function Evaluation: Beyond Monotonicity, and the Symmetric Case
Abstract: We study the extremal competitive ratio of Boolean function evaluation. We provide the first non-trivial lower and upper bounds for classes of Boolean functions which are not included in the class of monotone Boolean functions. For the particular case of symmetric functions our bounds are matching and we exactly characterize the best possible competitiveness achievable by a deterministic algorithm…
Submitted 21 June, 2010; originally announced June 2010.
Comments: 15 pages, 1 figure, to appear in Discrete Applied Mathematics
Journal ref: Discrete Applied Mathematics 159 (2011) 1070–1078
-
arXiv:0912.5079
A Lower Bound on the Complexity of Approximating the Entropy of a Markov Source
Abstract: Suppose that, for any $k \geq 1$, $\varepsilon > 0$ and sufficiently large $\sigma$, we are given a black box that allows us to sample characters from a $k$th-order Markov source over the alphabet $\{0, \ldots, \sigma - 1\}$. Even if we know the source has entropy either 0 or at least $\log(\sigma - k)$, there is still no algorithm that, with probability bounded away from $1/2$, guesses the entropy correctly after sampling…
Submitted 27 December, 2009; originally announced December 2009.
-
arXiv:0912.0850
Grammar-Based Compression in a Streaming Model
Abstract: We show that, given a string $s$ of length $n$, with constant memory and logarithmic passes over a constant number of streams we can build a context-free grammar that generates $s$ and only $s$ and whose size is within an $O(\min(g \log g, \sqrt{n \log g}))$-factor of the minimum $g$. This stands in contrast to our previous result that, with polylogarithmic memory and polylogarithmic passes o…
Submitted 5 February, 2010; v1 submitted 4 December, 2009; originally announced December 2009.
Comments: Section on recent work added, sketching how to improve bounds and support random access
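To make "a context-free grammar that generates $s$ and only $s$" concrete: if every nonterminal has exactly one rule and there are no cycles, expanding the start symbol yields exactly one string. The Python sketch below uses a hypothetical doubling grammar (rules `X0` to `X3`, invented for illustration) and measures size as the total length of all right-hand sides, which is one common convention and an assumption here. It shows only what such a grammar is, not the paper's streaming construction.

```python
# Hypothetical straight-line grammar: every nonterminal has exactly one rule,
# so the grammar generates exactly one string.
RULES = {
    "X0": ["a", "b"],
    "X1": ["X0", "X0"],
    "X2": ["X1", "X1"],
    "X3": ["X2", "X2"],
}


def expand(symbol, rules):
    """Return the unique string derived from `symbol` (terminals map to themselves)."""
    if symbol not in rules:
        return symbol
    return "".join(expand(child, rules) for child in rules[symbol])


def grammar_size(rules):
    """Size measured as the total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


if __name__ == "__main__":
    s = expand("X3", RULES)
    print(len(s), grammar_size(RULES))  # 16 8
```

Each added doubling rule doubles the generated length while adding only two symbols, which is why a grammar can be far smaller than a highly repetitive string it generates.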
-
Efficient Fully-Compressed Sequence Representations
Abstract: We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $nH_0(s) + o(n)(H_0(s) + 1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $O(\lg \lg \sigma)$ and average time $O(\lg H_0(s))$. The worst-cas…
Submitted 1 April, 2012; v1 submitted 25 November, 2009; originally announced November 2009.
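For reference, the three queries named in the abstract have simple definitions. The naive Python sketch below spells them out with 1-based positions to match $s[1..n]$; the function names are illustrative. Each query here takes $O(n)$ time and uses no compression, so the sketch only pins down the semantics that the paper's structure answers much faster.

```python
def access(s, i):
    """Return s[i] for 1-based position i."""
    return s[i - 1]


def rank(s, a, i):
    """Count occurrences of symbol a in the prefix s[1..i]."""
    return sum(1 for c in s[:i] if c == a)


def select(s, a, j):
    """Return the 1-based position of the j-th occurrence of a, or None if there is none."""
    seen = 0
    for pos, c in enumerate(s, start=1):
        if c == a:
            seen += 1
            if seen == j:
                return pos
    return None


if __name__ == "__main__":
    s = [3, 1, 2, 1, 1]  # a sequence over the alphabet [1..3]
    print(access(s, 1), rank(s, 1, 4), select(s, 1, 3))  # 3 2 5
```

rank and select are inverse in the sense that rank(s, a, select(s, a, j)) = j whenever the j-th occurrence of a exists.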
-
arXiv:0909.4341
Lightweight Data Indexing and Compression in External Memory
Abstract: In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size $n$, they use only $n$ bits of disk working space while all previous approaches use $\Theta(n \log n)$ bits of disk working space. Moreover, our algorithms access disk data…
Submitted 24 September, 2009; originally announced September 2009.
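As a point of reference for what these algorithms compute, the Burrows-Wheeler Transform can be defined by appending a unique end marker, sorting all rotations of the text and reading off the last column. The Python sketch below does exactly that and also inverts it naively; it assumes the marker `$` is smaller than every character of the text, and it uses $\Theta(n^2)$ space, so it illustrates the output only, not the paper's lightweight external-memory method.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform by sorting all rotations of text + sentinel.

    Assumes the sentinel is strictly smaller than every character of text,
    so it acts as a unique end-of-text marker.
    """
    if any(c <= sentinel for c in text):
        raise ValueError("sentinel must be smaller than every character of text")
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="$"):
    """Naive inversion: repeatedly prepend the last column and re-sort.

    After len(last) rounds the table holds all sorted rotations; the one
    ending in the sentinel is the original text plus sentinel.
    """
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```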
-
Tight Bounds for Online Stable Sorting
Abstract: Although many authors have considered how many ternary comparisons it takes to sort a multiset $S$ of size $n$, the best known upper and lower bounds still differ by a term linear in $n$. In this paper we restrict our attention to online stable sorting and prove upper and lower bounds that are within $o(n)$ not only of each other but also of the best known upper bound for offline sorting. Speci…
Submitted 4 July, 2009; originally announced July 2009.
-
Fast and Compact Prefix Codes
Abstract: It is well known that, given a probability distribution over $n$ characters, in the worst case it takes $\Theta(n \log n)$ bits to store a prefix code with minimum expected codeword length. However, in this paper we first show that, for any $0 < \varepsilon < 1/2$ with $1/\varepsilon = O(\mathrm{polylog}(n))$, it takes $O(n \log \log (1/\varepsilon))$ bits to store a prefix code with expected codeword length within $\varepsilon$ of the minimum.…
Submitted 19 May, 2009; originally announced May 2009.
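One standard way to store a prefix code (background, not necessarily the construction in this paper) is to store only each character's codeword length: if the lengths satisfy the Kraft inequality $\sum_i 2^{-\ell_i} \le 1$, canonical codewords can be rebuilt from them. A minimal Python sketch, where the symbol-to-length dictionary is an assumed input format:

```python
from fractions import Fraction


def canonical_code(lengths):
    """Rebuild canonical prefix codewords from a dict mapping symbol -> codeword length.

    Symbols are processed in (length, symbol) order; each codeword is the
    previous one plus one, shifted left whenever the length grows.
    """
    if any(l < 1 for l in lengths.values()):
        raise ValueError("codeword lengths must be at least 1")
    if sum(Fraction(1, 2 ** l) for l in lengths.values()) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    code, prev_len, out = 0, 0, {}
    for sym, l in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= l - prev_len
        out[sym] = format(code, f"0{l}b")  # zero-padded binary of width l
        code += 1
        prev_len = l
    return out


if __name__ == "__main__":
    print(canonical_code({"a": 1, "b": 2, "c": 3, "d": 3}))
    # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Because the Kraft sum is at most 1, the running codeword never overflows its length, and no codeword is a prefix of another.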
-
Range Quantile Queries: Another Virtue of Wavelet Trees
Abstract: We show how to use a balanced wavelet tree as a data structure that stores a list of numbers and supports efficient range quantile queries. A range quantile query takes a rank and the endpoints of a sublist and returns the number with that rank in that sublist. For example, if the rank is half the sublist's length, then the query returns the sublist's median. We also show how these queries…
Submitted 7 April, 2010; v1 submitted 26 March, 2009; originally announced March 2009.
Comments: Added note about generalization to any constant number of dimensions.
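A range quantile query can be answered by one root-to-leaf descent: at each node, compare the rank with how many elements of the sublist fall into the left half of the node's value range, then map the sublist's endpoints into the chosen child. The Python sketch below is a simple pointer-based version under our own assumptions (integer values, 0-based ranks, half-open sublist `values[i:j]`); it stores explicit prefix counts per node instead of the compact bit vectors with rank support a space-efficient wavelet tree would use.

```python
class WaveletTree:
    """Pointer-based wavelet tree over integers, split at the midpoint of each node's value range."""

    def __init__(self, values, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(values), max(values)) if values else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = [0]
        if lo == hi or not values:
            return  # leaf, or an empty subtree that queries never enter
        mid = (lo + hi) // 2
        # prefix[t] = how many of values[:t] are <= mid, i.e. go to the left child
        for v in values:
            self.prefix.append(self.prefix[-1] + (1 if v <= mid else 0))
        self.left = WaveletTree([v for v in values if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, hi)

    def quantile(self, k, i, j):
        """Return the k-th smallest (k = 0, 1, ...) number in the sublist values[i:j]."""
        if not 0 <= i <= j or not 0 <= k < j - i:
            raise IndexError("rank or endpoints out of range")
        node = self
        while node.left is not None:
            li, lj = node.prefix[i], node.prefix[j]  # endpoints mapped into the left child
            in_left = lj - li
            if k < in_left:
                node, i, j = node.left, li, lj
            else:
                node, i, j, k = node.right, i - li, j - lj, k - in_left
        return node.lo


if __name__ == "__main__":
    values = [5, 1, 4, 2, 3]
    wt = WaveletTree(values)
    print(wt.quantile(2, 0, 5))  # 3, the median of the whole list
    print(wt.quantile(1, 1, 4))  # 2, the second smallest of [1, 4, 2]
```

The descent visits one node per level, so a query costs time proportional to the tree's depth, which is logarithmic in the range of stored values.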
-
New Algorithms and Lower Bounds for Sequential-Access Data Compression
Abstract: This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case…
Submitted 1 February, 2009; originally announced February 2009.
Comments: draft of PhD thesis
-
arXiv:0812.3306
Worst-Case Optimal Adaptive Prefix Coding
Abstract: A common complaint about adaptive prefix coding is that it is much slower than static prefix coding. Karpinski and Nekrich recently took an important step towards resolving this: they gave an adaptive Shannon coding algorithm that encodes each character in $O(1)$ amortized time and decodes it in $O(\log H)$ amortized time, where $H$ is the empirical entropy of the input string $s$. For compari…
Submitted 17 December, 2008; originally announced December 2008.
-
arXiv:0812.2868
Minimax Trees in Linear Time
Abstract: A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, $O(n \log n)$-time algorithm for building them. Drmota and Szpankowski (2002) gave another $O(n \log n)$-time algorithm, which checks the Kraft…
Submitted 28 January, 2009; v1 submitted 15 December, 2008; originally announced December 2008.
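The Huffman-like rule, as it is usually stated, repeatedly replaces the two smallest weights $a, b$ by $\max(a, b) + 1$; the last remaining value is the minimax cost. The Python sketch below implements that $O(n \log n)$ heap version (not the linear-time algorithm of this paper) and returns only the cost, not the tree. For integer weights it also includes, for comparison, a brute-force Kraft check: the cost is the smallest $D$ with $\sum_i 2^{w_i - D} \le 1$, since leaf $i$ can then sit at depth $D - w_i$.

```python
import heapq
from fractions import Fraction


def minimax_cost(weights):
    """Huffman-like rule: merge the two smallest values a, b into max(a, b) + 1.

    Returns the minimum, over binary trees, of max_i (w_i + depth of leaf i).
    """
    if not weights:
        raise ValueError("need at least one weight")
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


def minimax_cost_kraft(weights):
    """For integer weights: smallest D with sum_i 2^(w_i - D) <= 1 (Kraft inequality)."""
    d = max(weights)
    while sum(Fraction(2) ** (w - d) for w in weights) > 1:
        d += 1
    return d


if __name__ == "__main__":
    print(minimax_cost([3, 0, 0]), minimax_cost_kraft([3, 0, 0]))  # 4 4
```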
-
arXiv:0811.3602
Low-Memory Adaptive Prefix Coding
Abstract: In this paper we study the adaptive prefix coding problem in cases where the size of the input alphabet is large. We present an online prefix coding algorithm that uses $O(\sigma^{1/\lambda + \varepsilon})$ bits of space for any constants $\varepsilon > 0$, $\lambda > 1$, and encodes the string of symbols in $O(\log \log \sigma)$ time per symbol in the worst case, where $\sigma$ is the size of the alphabet. The upper bound on the enc…
Submitted 21 November, 2008; originally announced November 2008.
Comments: 10 pages
-
arXiv:0810.5064
A New Algorithm for Building Alphabetic Minimax Trees
Abstract: We show how to build an alphabetic minimax tree for a sequence $W = w_1, \ldots, w_n$ of real weights in $O(n d \log \log n)$ time, where $d$ is the number of distinct integers $\lceil w_i \rceil$. We apply this algorithm to building an alphabetic prefix code given a sample.
Submitted 28 October, 2008; originally announced October 2008.
Comments: in preparation
-
arXiv:0711.3338
Bounds for Compression in Streaming Models
Abstract: Compression algorithms and streaming algorithms are both powerful tools for dealing with massive data sets, but many of the best compression algorithms (e.g., those based on the Burrows-Wheeler Transform) at first seem incompatible with streaming. In this paper we consider several popular streaming models and ask in which, if any, we can compress as well as we can with the BWT. We first prov…
Submitted 19 April, 2008; v1 submitted 21 November, 2007; originally announced November 2007.
Comments: added reduction from sorting to the Burrows-Wheeler Transform; thus, Grohe and Schweikardt's lower bound for short-sorting implies the same lower bound for the BWT
-
arXiv:0708.2084
Empirical entropy in context
Abstract: We trace the history of empirical entropy, touching briefly on its relation to Markov processes, normal numbers, Shannon entropy, the Chomsky hierarchy, Kolmogorov complexity, Ziv-Lempel compression, de Bruijn sequences and stochastic complexity.
Submitted 15 August, 2007; originally announced August 2007.
Comments: A survey of some results related to empirical entropy, written in the spring of 2007 as part of an introduction to a PhD thesis
-
arXiv:0708.1877
A nearly tight memory-redundancy trade-off for one-pass compression
Abstract: Let $s$ be a string of length $n$ over an alphabet of constant size $\sigma$ and let $c$ and $\varepsilon$ be constants with $1 \geq c \geq 0$ and $\varepsilon > 0$. Using $O(n)$ time, $O(n^c)$ bits of memory and one pass we can always encode $s$ in $n H_k(s) + O(\sigma^k n^{1 - c + \varepsilon})$ bits for all integers $k \geq 0$ simultaneously. On the other hand, even with unlimited time, using $O(n^c)$ bits of memory and one pas…
Submitted 14 August, 2007; originally announced August 2007.
-
arXiv:cs/0611099
On the space complexity of one-pass compression
Abstract: We study how much memory one-pass compression algorithms need to compete with the best multi-pass algorithms. We call a one-pass algorithm an $f(n, \ell)$-footprint compressor if, given $n$, $\ell$ and an $n$-ary string $S$, it stores $S$ in $(O(H_\ell(S)) + o(\log n))|S| + O(n^{\ell + 1} \log n)$ bits (where $H_\ell(S)$ is the $\ell$th-order empirical entropy of $S$)…
Submitted 20 November, 2006; originally announced November 2006.
ACM Class: H.1.1
-
arXiv:cs/0506056
Large Alphabets and Incompressibility
Abstract: We briefly survey some concepts related to empirical entropy -- normal numbers, de Bruijn sequences and Markov processes -- and investigate how well it approximates Kolmogorov complexity. Our results suggest $\ell$th-order empirical entropy stops being a reasonable complexity metric for almost all strings of length $m$ over alphabets of size $n$ about when $n^\ell$ surpasses $m$.
Submitted 9 March, 2006; v1 submitted 13 June, 2005; originally announced June 2005.
ACM Class: E.4
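The $\ell$th-order empirical entropy used here can be computed directly: group each character by the $\ell$ characters preceding it and average the zero-order entropies of the groups, weighted by group size. The Python sketch below uses that definition; conventions differ at the string's boundary, and ignoring the first $\ell$ characters while dividing by the full length is our assumption. It also illustrates the abstract's point: once every length-$\ell$ context occurs only once, $H_\ell$ is 0 no matter how irregular the string is.

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zero-order empirical entropy of s, in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def h_ell(s, ell):
    """ell-th order empirical entropy: (1/|s|) * sum over contexts w of |s_w| * H_0(s_w),
    where s_w lists the characters that follow occurrences of w in s."""
    if ell == 0:
        return h0(s)
    if len(s) == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(ell, len(s)):
        followers[tuple(s[i - ell:i])].append(s[i])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


if __name__ == "__main__":
    s = "abracadabra"
    print(h0(s), h_ell(s, 1))
    print(h_ell(s, 4))  # 0.0: every 4-character context occurs once
```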
-
arXiv:cs/0506027
Sorting a Low-Entropy Sequence
Abstract: We give the first sorting algorithm with bounds in terms of higher-order entropies: let $S$ be a sequence of length $m$ containing $n$ distinct elements and let $H_\ell(S)$ be the $\ell$th-order empirical entropy of $S$, with $n^{\ell + 1} \log n \in O(m)$; our algorithm sorts $S$ using $(H_\ell(S) + O(1))m$ comparisons.
Submitted 8 June, 2005; originally announced June 2005.
ACM Class: E.4; E.5
-
arXiv:cs/0506025
Dynamic Asymmetric Communication
Abstract: We show how any dynamic instantaneous compression algorithm can be converted to an asymmetric communication protocol, with which a server with high bandwidth can help clients with low bandwidth send it messages. Unlike previous authors, we do not assume the server knows the messages' distribution, and our protocols are the first to use only one round of communication for each message.
Submitted 20 November, 2006; v1 submitted 8 June, 2005; originally announced June 2005.
Comments: Previous versions appeared at DCC 06 and SIROCCO 06; current version is preliminary journal version
ACM Class: E.4
-
arXiv:cs/0506016
Compressing Probability Distributions
Abstract: We show how to store good approximations of probability distributions in small space.
Submitted 6 June, 2005; originally announced June 2005.
ACM Class: E.4
Journal ref: 10.1016/j.ipl.2005.10.006
-
arXiv:cs/0503085
Dynamic Shannon Coding
Abstract: We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
Submitted 30 March, 2005; originally announced March 2005.
Comments: 6 pages; conference version presented at ESA 2004; journal version submitted to IEEE Transactions on Information Theory
ACM Class: E.4
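The core idea of dynamic Shannon coding can be sketched by computing codeword lengths only: before each character, encoder and decoder hold the same counts, and the next character $c$ gets a codeword of length $\lceil \log_2(T / f_c) \rceil$, where $f_c$ is its current count and $T$ the total. The Python sketch below assumes every count starts at 1 so that unseen characters stay encodable; this initialization and the naive per-character work are our assumptions, not necessarily the paper's exact scheme. Because $\sum_c 2^{-\lceil \log_2(T/f_c) \rceil} \le \sum_c f_c / T = 1$, a prefix code with these lengths always exists.

```python
from fractions import Fraction


def shannon_length(total, freq):
    """Smallest L with freq * 2**L >= total, i.e. ceil(log2(total / freq)), in exact integers."""
    length = 0
    while freq << length < total:
        length += 1
    return length


def dynamic_shannon_lengths(s, alphabet):
    """Codeword length used for each character of s under an adaptive Shannon model.

    Assumption (illustrative only): every character of `alphabet` starts with count 1.
    """
    counts = {a: 1 for a in alphabet}
    total = len(counts)
    lengths = []
    for ch in s:
        # Sanity check: the current Shannon lengths satisfy the Kraft inequality.
        assert sum(Fraction(1, 2 ** shannon_length(total, f)) for f in counts.values()) <= 1
        lengths.append(shannon_length(total, counts[ch]))
        counts[ch] += 1  # encoder and decoder apply the same update after each character
        total += 1
    return lengths


if __name__ == "__main__":
    print(dynamic_shannon_lengths("abc", "abcd"))  # [2, 3, 3]
```

Characters that recur get shorter codewords as their counts grow, which is what lets the adaptive code approach the empirical entropy without knowing the distribution in advance.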