Search | arXiv e-print repository

Computationally-Efficient Neural Image Compression with Shallow Decoders

Abstract: Neural image compression methods have seen increasingly strong performance in recent years. However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which hinders their real-world deployment. This paper takes a step forward towards closing this gap in decoding complexity by using a shallow or even linear decoding transform resembling that of JPEG. To… ▽ More Neural image compression methods have seen increasingly strong performance in recent years. However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which hinders their real-world deployment. This paper takes a step forward towards closing this gap in decoding complexity by using a shallow or even linear decoding transform resembling that of JPEG. To compensate for the resulting drop in compression performance, we exploit the often asymmetrical computation budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding. We theoretically formalize the intuition behind, and our experimental results establish a new frontier in the trade-off between rate-distortion and decoding complexity for neural image compression. Specifically, we achieve rate-distortion performance competitive with the established mean-scale hyperprior architecture of Minnen et al. (2018) at less than 50K decoding FLOPs/pixel, reducing the baseline's overall decoding complexity by 80%, or over 90% for the synthesis transform alone. Our code can be found at https://github.com/mandt-lab/shallow-ntc. △ Less

Submitted 10 November, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

Comments: Updated version of the ICCV 2023 paper. Previously titled "Asymmetrically-powered Neural Image Compression with Shallow Decoders" on arXiv

arXiv:2209.06950 [pdf, other]

Lossy Image Compression with Conditional Diffusion Models

Authors: Ruihan Yang, Stephan Mandt

Abstract: This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural networ… ▽ More This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional ``content'' latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining ``texture'' variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with $\mathcal{X}$-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality. Our code is available at: \url{https://github.com/buggyyang/CDC_compression} △ Less

Submitted 2 January, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

arXiv:2203.08875 [pdf, other]

SC2 Benchmark: Supervised Compression for Split Computing

Authors: Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt

Abstract: With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain… ▽ More With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing. △ Less

Submitted 14 June, 2023; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted at TMLR. Code and models are available at https://github.com/yoshitomo-matsubara/sc2-benchmark

arXiv:2202.06533 [pdf, other]

An Introduction to Neural Data Compression

Authors: Yibo Yang, Stephan Mandt, Lucas Theis

Abstract: Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models,… ▽ More Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. The present article aims to introduce this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and providing a curated guide through the essential ideas and methods in the literature thus far. △ Less

Submitted 16 August, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: Published in Foundations and Trends in Computer Graphics and Vision: Vol. 15, No. 2, pp 113-200. https://www.nowpublishers.com/article/Details/CGV-107

arXiv:2107.13136 [pdf, other]

Insights from Generative Modeling for Neural Video Compression

Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt

Abstract: While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a gener… ▽ More While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field. △ Less

Submitted 9 July, 2023; v1 submitted 27 July, 2021; originally announced July 2021.

Comments: This work has been submitted to the IEEE for publication as an extension work of arXiv:2010.10258. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2010.10258

arXiv:2010.10258 [pdf, other]

Hierarchical Autoregressive Modeling for Neural Video Compression

Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt

Abstract: Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustssonet… ▽ More Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustssonet al., 2020) as instances of a generalized stochastic temporal autoregressive transform, and propose avenues for enhancement based on this insight. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods. △ Less

Submitted 19 December, 2023; v1 submitted 18 October, 2020; originally announced October 2020.

Comments: Published as a conference paper at ICLR 2021

arXiv:2006.04240 [pdf, other]

Improving Inference for Neural Image Compression

Authors: Yibo Yang, Robert Bamler, Stephan Mandt

Abstract: We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approximation gaps which limit performance in the conven… ▽ More We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approximation gaps which limit performance in the conventional approach: an amortization gap, a discretization gap, and a marginalization gap. We propose remedies for each of these three limitations based on ideas related to iterative inference, stochastic annealing for discrete optimization, and bits-back coding, resulting in the first application of bits-back coding to lossy compression. In our experiments, which include extensive baseline comparisons and ablation studies, we achieve new state-of-the-art performance on lossy image compression using an established VAE architecture, by changing only the inference method. △ Less

Submitted 8 January, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 9 pages + detailed supplement with additional results; various typos corrected. Camera-ready version paper at NeurIPS 2020

arXiv:2002.08158 [pdf, other]

Variational Bayesian Quantization

Authors: Yibo Yang, Robert Bamler, Stephan Mandt

Abstract: We propose a novel algorithm for quantizing continuous latent representations in trained models. Our approach applies to deep probabilistic models, such as variational autoencoders (VAEs), and enables both data and model compression. Unlike current end-to-end neural compression methods that cater the model to a fixed quantization scheme, our algorithm separates model design and training from quant… ▽ More We propose a novel algorithm for quantizing continuous latent representations in trained models. Our approach applies to deep probabilistic models, such as variational autoencoders (VAEs), and enables both data and model compression. Unlike current end-to-end neural compression methods that cater the model to a fixed quantization scheme, our algorithm separates model design and training from quantization. Consequently, our algorithm enables "plug-and-play" compression with variable rate-distortion trade-off, using a single trained model. Our algorithm can be seen as a novel extension of arithmetic coding to the continuous domain, and uses adaptive quantization accuracy based on estimates of posterior uncertainty. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single standard VAE. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method. △ Less

Submitted 7 September, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: 9 pages + detailed supplement with additional full resolution reconstructed images; ICML 2020 final camera-ready version, title changed to "Variational Bayesian Quantization" following reviewer feedback

arXiv:1810.02845 [pdf, other]

Deep Generative Video Compression

Authors: Jun Han, Salvator Lombardo, Christopher Schroers, Stephan Mandt

Abstract: The usage of deep generative models for image compression has led to impressive performance gains over classical codecs while neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video. Our approach builds upon variational autoencoder (VAE) models for sequential data and combines them with… ▽ More The usage of deep generative models for image compression has led to impressive performance gains over classical codecs while neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video. Our approach builds upon variational autoencoder (VAE) models for sequential data and combines them with recent work on neural image compression. The approach jointly learns to transform the original sequence into a lower-dimensional representation as well as to discretize and entropy code this representation according to predictions of the sequential VAE. Rate-distortion evaluations on small videos from public data sets with varying complexity and diversity show that our model yields competitive results when trained on generic video content. Extreme compression performance is achieved when training the model on specialized content. △ Less

Submitted 1 November, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

Comments: Accepted at NeurIPS 2019, 15 pages, 8 figures

Showing 1–9 of 9 results for author: Mandt, S