Showing 1–18 of 18 results for author: Minnen, D

  1. arXiv:2312.14125

    cs.CV cs.AI

    VideoPoet: A Large Language Model for Zero-Shot Video Generation

    Authors: Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, et al. (6 additional authors not shown)

    Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and tas…

    Submitted 4 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: To appear at ICML 2024; Project page: http://sites.research.google/videopoet/

  2. Advancing The Rate-Distortion-Computation Frontier For Neural Image Compression

    Authors: David Minnen, Nick Johnston

    Abstract: The rate-distortion performance of neural image compression models has exceeded the state-of-the-art for non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation is difficult due to the lack of standard benchmarking platfo…

    Submitted 26 September, 2023; originally announced November 2023.

    Comments: Published in 2023 IEEE International Conference on Image Processing (ICIP)

  3. arXiv:2310.05737

    cs.CV cs.AI cs.MM

    Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

    Authors: Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

    Abstract: While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this paper, we introduce MAGVIT-v2, a video tokenizer…

    Submitted 29 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  4. arXiv:2309.15505

    cs.CV cs.LG

    Finite Scalar Quantization: VQ-VAE Made Simple

    Authors: Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen

    Abstract: We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the…

    Submitted 12 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Code: https://github.com/google-research/google-research/tree/master/fsq

  5. arXiv:2212.13824

    cs.CV cs.LG eess.IV

    Multi-Realism Image Compression with a Conditional Generator

    Authors: Eirikur Agustsson, David Minnen, George Toderici, Fabian Mentzer

    Abstract: By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a mi…

    Submitted 30 March, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: CVPR'23 Camera Ready

  6. arXiv:2206.07307

    cs.CV cs.LG eess.IV

    VCT: A Video Compression Transformer

    Authors: Fabian Mentzer, George Toderici, David Minnen, Sung Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

    Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distri…

    Submitted 12 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS'22 Camera Ready Version. Code: https://goo.gle/vct-paper

  7. arXiv:2107.12038

    eess.IV cs.CV

    Neural Video Compression using GANs for Detail Synthesis and Propagation

    Authors: Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

    Abstract: We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective:…

    Submitted 12 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: First two authors contributed equally. ECCV Camera ready version

  8. arXiv:2007.08739

    eess.IV cs.CV cs.IT cs.LG

    Channel-wise Autoregressive Entropy Models for Learned Image Compression

    Authors: David Minnen, Saurabh Singh

    Abstract: In learning-based approaches to image compression, codecs are developed by optimizing a computational model to minimize a rate-distortion objective. Currently, the most effective learned image codecs take the form of an entropy-constrained autoencoder with an entropy model that uses both forward and backward adaptation. Forward adaptation makes use of side information and can be efficiently integr…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at the IEEE International Conference on Image Processing (ICIP) 2020

  9. arXiv:2007.03034

    cs.IT eess.IV

    Nonlinear Transform Coding

    Authors: Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, George Toderici

    Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate--distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate--distortion performance of NTC with the…

    Submitted 23 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing

  10. arXiv:1809.02736

    cs.CV

    Joint Autoregressive and Hierarchical Priors for Learned Image Compression

    Authors: David Minnen, Johannes Ballé, George Toderici

    Abstract: Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a…

    Submitted 7 September, 2018; originally announced September 2018.

    Comments: Accepted at the 32nd Conference on Neural Information Processing Systems (NIPS 2018)

  11. arXiv:1808.00447

    cs.CV

    Towards a Semantic Perceptual Image Metric

    Authors: Troy Chinen, Johannes Ballé, Chunhui Gu, Sung Jin Hwang, Sergey Ioffe, Nick Johnston, Thomas Leung, David Minnen, Sean O'Malley, Charles Rosenberg, George Toderici

    Abstract: We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to assess image quality assessments me…

    Submitted 1 August, 2018; originally announced August 2018.

  12. arXiv:1805.12295

    cs.CV

    Image-Dependent Local Entropy Models for Learned Image Compression

    Authors: David Minnen, George Toderici, Saurabh Singh, Sung Jin Hwang, Michele Covell

    Abstract: The leading approach for image compression with artificial neural networks (ANNs) is to learn a nonlinear transform and a fixed entropy model that are optimized for rate-distortion performance. We show that this approach can be significantly improved by incorporating spatially local, image-dependent entropy models. The key insight is that existing ANN-based methods learn an entropy model that is s…

    Submitted 30 May, 2018; originally announced May 2018.

    Journal ref: International Conference on Image Processing 2018

  13. arXiv:1802.02629

    cs.CV

    Spatially adaptive image compression using a tiled deep network

    Authors: David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh

    Abstract: Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images. Existing image compression algorithms based on neural networks learn quantized representations with a constant spatial bit rate across each image. While entropy coding introduces some spatial variation, traditional codecs have benefited significantly by explicitly adapting t…

    Submitted 7 February, 2018; originally announced February 2018.

    Journal ref: International Conference on Image Processing 2017

  14. arXiv:1802.01436

    eess.IV cs.IT

    Variational image compression with a scale hyperprior

    Authors: Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston

    Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unl…

    Submitted 1 May, 2018; v1 submitted 31 January, 2018; originally announced February 2018.

    Comments: accepted as a conference contribution to International Conference on Learning Representations 2018

  15. arXiv:1705.06687

    cs.CV

    Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

    Authors: Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici

    Abstract: We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in areas around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and p…

    Submitted 18 May, 2017; originally announced May 2017.

  16. arXiv:1703.10114

    cs.CV

    Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

    Authors: Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici

    Abstract: We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several m…

    Submitted 29 March, 2017; originally announced March 2017.

  17. arXiv:1608.05148

    cs.CV

    Full Resolution Image Compression with Recurrent Neural Networks

    Authors: George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell

    Abstract: This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a ne…

    Submitted 7 July, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

    Comments: Updated with content for CVPR and removed supplemental material to an external link for size limitations

  18. arXiv:1511.06085

    cs.CV cs.LG cs.NE

    Variable Rate Image Compression with Recurrent Neural Networks

    Authors: George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar

    Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review as a conference paper at ICLR 2016