-
Accelerating Relative Entropy Coding with Space Partitioning
Authors:
Jiajun He,
Gergely Flamich,
José Miguel Hernández-Lobato
Abstract:
Relative entropy coding (REC) algorithms encode a random sample following a target distribution $Q$, using a coding distribution $P$ shared between the sender and receiver. Sadly, general REC algorithms suffer from prohibitive encoding times, at least on the order of $2^{D_{\text{KL}}[Q||P]}$, and faster algorithms are limited to very specific settings. This work addresses this issue by introducing a REC scheme utilizing space partitioning to reduce runtime in practical scenarios. We provide theoretical analyses of our method and demonstrate its effectiveness with both toy examples and practical applications. Notably, our method successfully handles REC tasks with $D_{\text{KL}}[Q||P]$ about three times greater than what previous methods can manage, and reduces the bitrate by approximately 5-15% in VAE-based lossless compression on MNIST and INR-based lossy compression on CIFAR-10, compared to previous methods, significantly improving the practicality of REC for neural compression.
Submitted 24 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Some Notes on the Sample Complexity of Approximate Channel Simulation
Authors:
Gergely Flamich,
Lennie Wells
Abstract:
Channel simulation algorithms can efficiently encode random samples from a prescribed target distribution $Q$ and find applications in machine learning-based lossy data compression. However, algorithms that encode exact samples usually have random runtime, limiting their applicability when a consistent encoding time is desirable. Thus, this paper considers approximate schemes with a fixed runtime instead. First, we strengthen a result of Agustsson and Theis and show that there is a class of pairs of target distribution $Q$ and coding distribution $P$, for which the runtime of any approximate scheme scales at least super-polynomially in $D_\infty[Q \Vert P]$. We then show, by contrast, that if we have access to an unnormalised Radon-Nikodym derivative $r \propto dQ/dP$ and knowledge of $D_{KL}[Q \Vert P]$, we can exploit global-bound, depth-limited A* coding to ensure $\mathrm{TV}[Q \Vert P] \leq ε$ and maintain optimal coding performance with a sample complexity of only $\exp_2\big((D_{KL}[Q \Vert P] + o(1)) \big/ ε\big)$.
Submitted 14 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
On Channel Simulation with Causal Rejection Samplers
Authors:
Daniel Goc,
Gergely Flamich
Abstract:
One-shot channel simulation has recently emerged as a promising alternative to quantization and entropy coding in machine-learning-based lossy data compression schemes. However, while there are several potential applications of channel simulation - lossy compression with realism constraints or differential privacy, to name a few - little is known about its fundamental limitations. In this paper, we restrict our attention to a subclass of channel simulation protocols called causal rejection samplers (CRS), establish new, tighter lower bounds on their expected runtime and codelength, and demonstrate the bounds' achievability. Concretely, for an arbitrary CRS, let $Q$ and $P$ denote a target and proposal distribution supplied as input, and let $K$ be the number of samples examined by the algorithm. We show that the expected runtime $\mathbb{E}[K]$ of any CRS scales at least as $\exp_2(D_\infty[Q || P])$, where $D_\infty[Q || P]$ is the Rényi $\infty$-divergence. Regarding the codelength, we show that $D_{KL}[Q || P] \leq D_{CS}[Q || P] \leq \mathbb{H}[K]$, where $D_{CS}[Q || P]$ is a new quantity we call the channel simulation divergence. Furthermore, we prove that our new lower bound, unlike the $D_{KL}[Q || P]$ lower bound, is achievable tightly, i.e. there is a CRS such that $\mathbb{H}[K] \leq D_{CS}[Q || P] + \log_2 (e + 1)$. Finally, we conduct numerical studies of the asymptotic scaling of the codelength of Gaussian and Laplace channel simulation algorithms.
Submitted 3 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
Authors:
Szilvia Ujváry,
Gergely Flamich,
Vincent Fortuin,
José Miguel Hernández Lobato
Abstract:
An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using MFVI. Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6\% in some cases.
Submitted 30 October, 2023;
originally announced October 2023.
-
RECOMBINER: Robust and Enhanced Compression with Bayesian Implicit Neural Representations
Authors:
Jiajun He,
Gergely Flamich,
Zongyu Guo,
José Miguel Hernández-Lobato
Abstract:
COMpression with Bayesian Implicit NEural Representations (COMBINER) is a recent data compression method that addresses a key inefficiency of previous Implicit Neural Representation (INR)-based approaches: it avoids quantization and enables direct optimization of the rate-distortion performance. However, COMBINER still has significant limitations: 1) it uses factorized priors and posterior approximations that lack flexibility; 2) it cannot effectively adapt to local deviations from global patterns in the data; and 3) its performance can be susceptible to modeling choices and the variational parameters' initializations. Our proposed method, Robust and Enhanced COMBINER (RECOMBINER), addresses these issues by 1) enriching the variational approximation while retaining a low computational cost via a linear reparameterization of the INR weights, 2) augmenting our INRs with learnable positional encodings that enable them to adapt to local details and 3) splitting high-resolution data into patches to increase robustness and utilizing expressive hierarchical priors to capture dependency across patches. We conduct extensive experiments across several data modalities, showcasing that RECOMBINER achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates. Our PyTorch implementation is available at https://github.com/cambridge-mlg/RECOMBINER/.
Submitted 7 March, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Faster Relative Entropy Coding with Greedy Rejection Coding
Authors:
Gergely Flamich,
Stratis Markou,
Jose Miguel Hernandez Lobato
Abstract:
Relative entropy coding (REC) algorithms encode a sample from a target distribution $Q$ using a proposal distribution $P$ using as few bits as possible. Unlike entropy coding, REC does not assume discrete distributions or require quantisation. As such, it can be naturally integrated into communication pipelines such as learnt compression and differentially private federated learning. Unfortunately, despite their practical benefits, REC algorithms have not seen widespread application, due to their prohibitively slow runtimes or restrictive assumptions. In this paper, we make progress towards addressing these issues. We introduce Greedy Rejection Coding (GRC), which generalises the rejection-based algorithm of Harsha et al. (2007) to arbitrary probability spaces and partitioning schemes. We first show that GRC terminates almost surely and returns unbiased samples from $Q$, after which we focus on two of its variants: GRCS and GRCD. We show that for continuous $Q$ and $P$ over $\mathbb{R}$ with unimodal density ratio $dQ/dP$, the expected runtime of GRCS is upper bounded by $βD_{KL}[Q || P] + O(1)$ where $β\approx 4.82$, and its expected codelength is optimal. This makes GRCS the first REC algorithm with guaranteed optimal runtime for this class of distributions, up to the multiplicative constant $β$. This significantly improves upon the previous state-of-the-art method, A* coding (Flamich et al., 2022). Under the same assumptions, we experimentally observe and conjecture that the expected runtime and codelength of GRCD are upper bounded by $D_{KL}[Q || P] + O(1)$. Finally, we evaluate GRC in a variational autoencoder-based compression pipeline on MNIST, and show that a modified ELBO and an index-compression method can further improve compression efficiency.
Submitted 27 September, 2023;
originally announced September 2023.
-
Minimal Random Code Learning with Mean-KL Parameterization
Authors:
Jihao Andreas Lin,
Gergely Flamich,
José Miguel Hernández-Lobato
Abstract:
This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
Submitted 4 December, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Compression with Bayesian Implicit Neural Representations
Authors:
Zongyu Guo,
Gergely Flamich,
Jiajun He,
Zhibo Chen,
José Miguel Hernández-Lobato
Abstract:
Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $β$-ELBO, and makes it possible to target different rate-distortion trade-offs for a given network architecture by adjusting $β$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
Submitted 29 October, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Greedy Poisson Rejection Sampling
Authors:
Gergely Flamich
Abstract:
One-shot channel simulation is a fundamental data compression problem concerned with encoding a single sample from a target distribution $Q$ using a coding distribution $P$ using as few bits as possible on average. Algorithms that solve this problem find applications in neural data compression and differential privacy and can serve as a more efficient alternative to quantization-based methods. Sadly, existing solutions are too slow or have limited applicability, preventing widespread adoption. In this paper, we conclusively solve one-shot channel simulation for one-dimensional problems where the target-proposal density ratio is unimodal by describing an algorithm with optimal runtime. We achieve this by constructing a rejection sampling procedure equivalent to greedily searching over the points of a Poisson process. Hence, we call our algorithm greedy Poisson rejection sampling (GPRS) and analyze the correctness and time complexity of several of its variants. Finally, we empirically verify our theorems, demonstrating that GPRS significantly outperforms the current state-of-the-art method, A* coding. Our code is available at https://github.com/gergely-flamich/greedy-poisson-rejection-sampling.
Submitted 29 March, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Adaptive Greedy Rejection Sampling
Authors:
Gergely Flamich,
Lucas Theis
Abstract:
We consider channel simulation protocols between two communicating parties, Alice and Bob. First, Alice receives a target distribution $Q$, unknown to Bob. Then, she employs a shared coding distribution $P$ to send the minimum amount of information to Bob so that he can simulate a single sample $X \sim Q$. For discrete distributions, Harsha et al. (2009) developed a well-known channel simulation protocol -- greedy rejection sampling (GRS) -- with a bound of ${D_{KL}[Q \,\Vert\, P] + 2\ln(D_{KL}[Q \,\Vert\, P] + 1) + \mathcal{O}(1)}$ on the expected codelength of the protocol. In this paper, we extend the definition of GRS to general probability spaces and allow it to adapt its proposal distribution after each step. We call this new procedure Adaptive GRS (AGRS) and prove its correctness. Furthermore, we prove the surprising result that the expected runtime of GRS is exactly $\exp(D_\infty[Q \,\Vert\, P])$, where $D_\infty[Q \,\Vert\, P]$ denotes the Rényi $\infty$-divergence. We then apply AGRS to Gaussian channel simulation problems. We show that the expected runtime of GRS is infinite when averaged over target distributions and propose a solution that trades off a slight increase in the coding cost for a finite runtime. Finally, we describe a specific instance of AGRS for 1D Gaussian channels inspired by hybrid coding. We conjecture and demonstrate empirically that the runtime of AGRS is $\mathcal{O}(D_{KL}[Q \,\Vert\, P])$ in this case.
Submitted 20 April, 2023;
originally announced April 2023.
-
Fast Relative Entropy Coding with A* coding
Authors:
Gergely Flamich,
Stratis Markou,
José Miguel Hernández-Lobato
Abstract:
Relative entropy coding (REC) algorithms encode a sample from a target distribution $Q$ using a proposal distribution $P$, such that the expected codelength is $\mathcal{O}(D_{KL}[Q \,||\, P])$. REC can be seamlessly integrated with existing learned compression models since, unlike entropy coding, it does not assume discrete $Q$ or $P$, and does not require quantisation. However, general REC algorithms require an intractable $Ω(e^{D_{KL}[Q \,||\, P]})$ runtime. We introduce AS* and AD* coding, two REC algorithms based on A* sampling. We prove that, for continuous distributions over $\mathbb{R}$, if the density ratio is unimodal, AS* has $\mathcal{O}(D_{\infty}[Q \,||\, P])$ expected runtime, where $D_{\infty}[Q \,||\, P]$ is the Rényi $\infty$-divergence. We provide experimental evidence that AD* also has $\mathcal{O}(D_{\infty}[Q \,||\, P])$ expected runtime. We prove that AS* and AD* achieve an expected codelength of $\mathcal{O}(D_{KL}[Q \,||\, P])$. Further, we introduce DAD*, an approximate algorithm based on AD* which retains its favourable runtime and has bias similar to that of alternative methods. Focusing on VAEs, we propose the IsoKL VAE (IKVAE), which can be used with DAD* to further improve compression efficiency. We evaluate A* coding with (IK)VAEs on MNIST, showing that it can losslessly compress images near the theoretically optimal limit.
Submitted 19 June, 2022; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding
Authors:
Gergely Flamich,
Marton Havasi,
José Miguel Hernández-Lobato
Abstract:
Variational Autoencoders (VAEs) have seen widespread use in learned image compression. They are used to learn expressive latent representations on which downstream compression methods can operate with high efficiency. Recently proposed 'bits-back' methods can indirectly encode the latent representation of images with codelength close to the relative entropy between the latent posterior and the prior. However, due to the underlying algorithm, these methods can only be used for lossless compression, and they only achieve their nominal efficiency when compressing multiple images simultaneously; they are inefficient for compressing single images. As an alternative, we propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images, supported by our empirical results obtained on the CIFAR-10, ImageNet32 and Kodak datasets. Moreover, unlike previous bits-back methods, REC is immediately applicable to lossy compression, where it is competitive with the state-of-the-art on the Kodak dataset.
Submitted 19 April, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.