Showing 1–21 of 21 results for author: Toderici, G

Searching in archive cs.
  1. arXiv:2305.18231  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    High-Fidelity Image Compression with Score-based Generative Models

    Authors: Emiel Hoogeboom, Eirikur Agustsson, Fabian Mentzer, Luca Versari, George Toderici, Lucas Theis

    Abstract: Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved using a simpl…

    Submitted 7 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

  2. arXiv:2212.13824  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Realism Image Compression with a Conditional Generator

    Authors: Eirikur Agustsson, David Minnen, George Toderici, Fabian Mentzer

    Abstract: By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a mi…

    Submitted 30 March, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: CVPR'23 Camera Ready

  3. arXiv:2206.07307  [pdf, other]

    cs.CV cs.LG eess.IV

    VCT: A Video Compression Transformer

    Authors: Fabian Mentzer, George Toderici, David Minnen, Sung Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

    Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distri…

    Submitted 12 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS'22 Camera Ready Version. Code: https://goo.gle/vct-paper

  4. arXiv:2111.08988  [pdf, other]

    cs.GR cs.LG eess.IV eess.SP

    LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks

    Authors: Berivan Isik, Philip A. Chou, Sung Jin Hwang, Nick Johnston, George Toderici

    Abstract: We consider the attributes of a point cloud as samples of a vector-valued volumetric function at discrete positions. To compress the attributes given the positions, we compress the parameters of the volumetric function. We model the volumetric function by tiling space into blocks, and representing the function over each block by shifts of a coordinate-based, or implicit, neural network. Inputs to…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: 30 pages, 29 figures

  5. arXiv:2107.12038  [pdf, other]

    eess.IV cs.CV

    Neural Video Compression using GANs for Detail Synthesis and Propagation

    Authors: Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

    Abstract: We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective:…

    Submitted 12 July, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: First two authors contributed equally. ECCV Camera ready version

  6. arXiv:2007.11797  [pdf, other]

    cs.CV eess.IV

    End-to-end Learning of Compressible Features

    Authors: Saurabh Singh, Sami Abu-El-Haija, Nick Johnston, Johannes Ballé, Abhinav Shrivastava, George Toderici

    Abstract: Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy based lossless compression methods are of little help as t…

    Submitted 23 July, 2020; originally announced July 2020.

    Comments: Accepted at ICIP 2020

  7. arXiv:2007.03034  [pdf, other]

    cs.IT eess.IV

    Nonlinear Transform Coding

    Authors: Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, George Toderici

    Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the…

    Submitted 23 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: 17 pages, 14 figures. Accepted for publication in IEEE Journal of Selected Topics in Signal Processing

  8. arXiv:2006.09965  [pdf, other]

    eess.IV cs.CV cs.LG

    High-Fidelity Generative Image Compression

    Authors: Fabian Mentzer, George Toderici, Michael Tschannen, Eirikur Agustsson

    Abstract: We extensively study how to combine Generative Adversarial Networks and learned compression to obtain a state-of-the-art generative lossy compression system. In particular, we investigate normalization layers, generator and discriminator architectures, training strategies, as well as perceptual losses. In contrast to previous work, i) we obtain visually pleasing reconstructions that are perceptual…

    Submitted 23 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: This is the Camera Ready version for NeurIPS 2020. Project page: https://hific.github.io

  9. arXiv:1809.02736  [pdf, other]

    cs.CV

    Joint Autoregressive and Hierarchical Priors for Learned Image Compression

    Authors: David Minnen, Johannes Ballé, George Toderici

    Abstract: Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a…

    Submitted 7 September, 2018; originally announced September 2018.

    Comments: Accepted at the 32nd Conference on Neural Information Processing Systems (NIPS 2018)

  10. arXiv:1808.00447  [pdf, other]

    cs.CV

    Towards a Semantic Perceptual Image Metric

    Authors: Troy Chinen, Johannes Ballé, Chunhui Gu, Sung Jin Hwang, Sergey Ioffe, Nick Johnston, Thomas Leung, David Minnen, Sean O'Malley, Charles Rosenberg, George Toderici

    Abstract: We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to assess image quality assessments me…

    Submitted 1 August, 2018; originally announced August 2018.

  11. arXiv:1805.12295  [pdf, other]

    cs.CV

    Image-Dependent Local Entropy Models for Learned Image Compression

    Authors: David Minnen, George Toderici, Saurabh Singh, Sung Jin Hwang, Michele Covell

    Abstract: The leading approach for image compression with artificial neural networks (ANNs) is to learn a nonlinear transform and a fixed entropy model that are optimized for rate-distortion performance. We show that this approach can be significantly improved by incorporating spatially local, image-dependent entropy models. The key insight is that existing ANN-based methods learn an entropy model that is s…

    Submitted 30 May, 2018; originally announced May 2018.

    Journal ref: International Conference on Image Processing 2018

  12. arXiv:1802.02629  [pdf, other]

    cs.CV

    Spatially adaptive image compression using a tiled deep network

    Authors: David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh

    Abstract: Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images. Existing image compression algorithms based on neural networks learn quantized representations with a constant spatial bit rate across each image. While entropy coding introduces some spatial variation, traditional codecs have benefited significantly by explicitly adapting t…

    Submitted 7 February, 2018; originally announced February 2018.

    Journal ref: International Conference on Image Processing 2017

  13. arXiv:1705.08421  [pdf, other]

    cs.CV

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    Authors: Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

    Abstract: This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual…

    Submitted 30 April, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: To appear in CVPR 2018. Check dataset page https://research.google.com/ava/ for details

  14. arXiv:1705.06687  [pdf, other]

    cs.CV

    Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

    Authors: Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici

    Abstract: We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in areas around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and p…

    Submitted 18 May, 2017; originally announced May 2017.

  15. arXiv:1703.10114  [pdf, other]

    cs.CV

    Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

    Authors: Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici

    Abstract: We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several m…

    Submitted 29 March, 2017; originally announced March 2017.

  16. arXiv:1609.08675  [pdf, other]

    cs.CV

    YouTube-8M: A Large-Scale Video Classification Benchmark

    Authors: Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan

    Abstract: Many recent advancements in Computer Vision are attributed to large datasets. Open-source software packages for Machine Learning and inexpensive commodity hardware have reduced the barrier of entry for exploring novel approaches at scale. It is possible to train models over millions of examples within a few days. Although large-scale datasets exist for image understanding, such as ImageNet, there…

    Submitted 27 September, 2016; originally announced September 2016.

    Comments: 10 pages

  17. arXiv:1608.05148  [pdf, other]

    cs.CV

    Full Resolution Image Compression with Recurrent Neural Networks

    Authors: George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell

    Abstract: This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a ne…

    Submitted 7 July, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

    Comments: Updated with content for CVPR and removed supplemental material to an external link for size limitations

  18. arXiv:1511.06085  [pdf, other]

    cs.CV cs.LG cs.NE

    Variable Rate Image Compression with Recurrent Neural Networks

    Authors: George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar

    Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review as a conference paper at ICLR 2016

  19. arXiv:1507.00302  [pdf, other]

    cs.CV

    Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

    Authors: Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang

    Abstract: We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation c…

    Submitted 1 July, 2015; originally announced July 2015.

  20. arXiv:1505.06250  [pdf, other]

    cs.CV cs.MM cs.NE

    Efficient Large Scale Video Classification

    Authors: Balakrishnan Varadarajan, George Toderici, Sudheendra Vijayanarasimhan, Apostol Natsev

    Abstract: Video classification has advanced tremendously over the recent years. A large part of the improvements in video classification had to do with the work done by the image classification community and the use of deep convolutional networks (CNNs) which produce competitive results with hand-crafted motion features. These networks were adapted to use video frames in various ways and have yielded state…

    Submitted 22 May, 2015; originally announced May 2015.

  21. arXiv:1503.08909  [pdf, other]

    cs.CV

    Beyond Short Snippets: Deep Networks for Video Classification

    Authors: Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici

    Abstract: Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handli…

    Submitted 13 April, 2015; v1 submitted 31 March, 2015; originally announced March 2015.