Showing 1–27 of 27 results for author: Cohen, T S

  1. arXiv:2203.01978  [pdf, other]

    eess.IV cs.CV cs.LG

    Region-of-Interest Based Neural Video Compression

    Authors: Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang, Amirhossein Habibian, Taco S Cohen

    Abstract: Humans do not perceive all parts of a scene with the same resolution, but rather focus on a few regions of interest (ROIs). Traditional Object-Based codecs take advantage of this biological intuition, and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion in the remaining areas: such a strategy allows a boost in perceptual quality under low…

    Submitted 2 November, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Updated arxiv version to the camera-ready version after acceptance at British Machine Vision Conference (BMVC) 2022

  2. arXiv:2111.10302  [pdf, other]

    eess.IV cs.CV cs.LG

    Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set

    Authors: Ties van Rozendaal, Johann Brehmer, Yunfan Zhang, Reza Pourreza, Auke Wiggers, Taco S. Cohen

    Abstract: We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This in…

    Submitted 23 June, 2023; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: Matches version published in TMLR

  3. arXiv:2104.11487  [pdf, other]

    cs.CV cs.LG

    Skip-Convolutions for Efficient Video Processing

    Authors: Amirhossein Habibian, Davide Abati, Taco S. Cohen, Babak Ehteshami Bejnordi

    Abstract: We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the mode…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  4. arXiv:2104.00807  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    A Combined Deep Learning based End-to-End Video Coding Architecture for YUV Color Space

    Authors: Ankitesh K. Singh, Hilmi E. Egilmez, Reza Pourreza, Muhammed Coban, Marta Karczewicz, Taco S. Cohen

    Abstract: Most of the existing deep learning based end-to-end video coding (DLEC) architectures are designed specifically for RGB color format, yet the video coding standards, including H.264/AVC, H.265/HEVC and H.266/VVC developed over the past few decades, have been designed primarily for YUV 4:2:0 format, where the chrominance (U and V) components are subsampled to achieve superior compression performances c…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 5 pages, submitted as a conference paper. arXiv admin note: text overlap with arXiv:2103.01760

  5. arXiv:2104.00531  [pdf, other]

    eess.IV cs.CV cs.LG

    Extending Neural P-frame Codecs for B-frame Coding

    Authors: Reza Pourreza, Taco S Cohen

    Abstract: While most neural video codecs address P-frame coding (predicting each frame from past ones), in this paper we address B-frame compression (predicting frames using both past and future reference frames). Our B-frame solution is based on the existing P-frame methods. As a result, B-frame coding capability can easily be added to an existing neural codec. The basic idea of our B-frame coding method i…

    Submitted 5 August, 2021; v1 submitted 30 March, 2021; originally announced April 2021.

    Comments: ICCV 2021

  6. arXiv:2103.01760  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG cs.MM

    Transform Network Architectures for Deep Learning based End-to-End Image/Video Coding in Subsampled Color Spaces

    Authors: Hilmi E. Egilmez, Ankitesh K. Singh, Muhammed Coban, Marta Karczewicz, Yinhao Zhu, Yang Yang, Amir Said, Taco S. Cohen

    Abstract: Most of the existing deep learning based end-to-end image/video coding (DLEC) architectures are designed for non-subsampled RGB color format. However, in order to achieve a superior coding performance, many state-of-the-art block-based compression standards such as High Efficiency Video Coding (HEVC/H.265) and Versatile Video Coding (VVC/H.266) are designed primarily for YUV 4:2:0 format, where U…

    Submitted 27 August, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: 10 pages, accepted in IEEE Open Journal of Signal Processing (Special issue on Applied Artificial Intelligence and Machine Learning for Video Coding and Streaming)

  7. arXiv:2102.02913  [pdf, other]

    cs.LG cs.CV

    Progressive Neural Image Compression with Nested Quantization and Latent Ordering

    Authors: Yadong Lu, Yinhao Zhu, Yang Yang, Amir Said, Taco S Cohen

    Abstract: We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream. In contrast to existing learned variable bitrate solutions which produce separate bitstreams for each quality, it enables easier rate-control and requires less storage. Leveraging the latent scaling based variable bit…

    Submitted 4 February, 2021; originally announced February 2021.

  8. arXiv:2101.08687  [pdf, other]

    cs.LG

    Overfitting for Fun and Profit: Instance-Adaptive Data Compression

    Authors: Ties van Rozendaal, Iris A. M. Huijben, Taco S. Cohen

    Abstract: Neural data compression has been shown to outperform classical methods in terms of $RD$ performance, with results still improving rapidly. At a high level, neural compression is based on an autoencoder that tries to reconstruct the input instance from a (quantized) latent representation, coupled with a prior that is used to losslessly compress these latents. Due to limitations on model capacity an…

    Submitted 1 June, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at International Conference on Learning Representations 2021

  9. arXiv:2005.04064  [pdf, other]

    cs.LG stat.ML

    Lossy Compression with Distortion Constrained Optimization

    Authors: Ties van Rozendaal, Guillaume Sautière, Taco S. Cohen

    Abstract: When training end-to-end learned models for lossy compression, one has to balance the rate and distortion losses. This is typically done by manually setting a tradeoff parameter $β$, an approach called $β$-VAE. Using this approach it is difficult to target a specific rate or distortion value, because the result can be very sensitive to $β$, and the appropriate value for $β$ depends on the model an…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Accepted as a CVPR 2020 workshop paper: Workshop and Challenge on Learned Image Compression (CLIC)

  10. arXiv:2004.09691  [pdf, other]

    cs.LG eess.IV stat.ML

    A Data and Compute Efficient Design for Limited-Resources Deep Learning

    Authors: Mirgahney Mohamed, Gabriele Cesa, Taco S. Cohen, Max Welling

    Abstract: Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning soluti…

    Submitted 8 July, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

    Comments: Accepted for poster presentation at the Practical Machine Learning for Developing Countries (PML4DC) workshop, ICLR 2020

  11. arXiv:2004.04342  [pdf, other]

    cs.LG cs.CV stat.ML

    Feedback Recurrent Autoencoder for Video Compression

    Authors: Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, Taco S Cohen

    Abstract: Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, we propose a new network architecture, based on common and well s…

    Submitted 8 April, 2020; originally announced April 2020.

  12. arXiv:2001.11235  [pdf, other]

    cs.LG stat.ML

    Learning Discrete Distributions by Dequantization

    Authors: Emiel Hoogeboom, Taco S. Cohen, Jakub M. Tomczak

    Abstract: Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Naïve optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequanti…

    Submitted 30 January, 2020; originally announced January 2020.

  13. arXiv:1911.04018  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Feedback Recurrent AutoEncoder

    Authors: Yang Yang, Guillaume Sautière, J. Jon Ryu, Taco S Cohen

    Abstract: In this work, we propose a new recurrent autoencoder architecture, termed Feedback Recurrent AutoEncoder (FRAE), for online compression of sequential data with temporal dependency. The recurrent structure of FRAE is designed to efficiently extract the redundancy along the time dimension and allows a compact discrete representation of the data to be learned. We demonstrate its effectiveness in spee…

    Submitted 17 February, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  14. arXiv:1908.05717  [pdf, other]

    eess.IV cs.LG stat.ML

    Video Compression With Rate-Distortion Autoencoders

    Authors: Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen

    Abstract: In this paper we present a deep generative model for lossy video compression. We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding. Both autoencoder and prior are trained jointly to minimize a rate-distortion loss, which is closely related to the ELBO used in variational autoencoders. Despite its simplicity, we find…

    Submitted 13 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  15. arXiv:1906.02481  [pdf, ps, other]

    cs.LG hep-th stat.ML

    Covariance in Physics and Convolutional Neural Networks

    Authors: Miranda C. N. Cheng, Vassilis Anagiannis, Maurice Weiler, Pim de Haan, Taco S. Cohen, Max Welling

    Abstract: In this proceeding we give an overview of the idea of covariance (or equivariance) featured in the recent development of convolutional neural networks (CNNs). We study the similarities and differences between the use of covariance in theoretical physics and in the CNN context. Additionally, we demonstrate that the simple assumption of covariance, together with the required properties of locality,…

    Submitted 6 June, 2019; originally announced June 2019.

  16. arXiv:1902.04615  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Gauge Equivariant Convolutional Networks and the Icosahedral CNN

    Authors: Taco S. Cohen, Maurice Weiler, Berkay Kicanaoglu, Max Welling

    Abstract: The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the d…

    Submitted 13 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: Proceedings of the International Conference on Machine Learning (ICML), 2019

  17. arXiv:1807.04689  [pdf, other]

    stat.ML cs.LG

    Explorations in Homeomorphic Variational Auto-Encoding

    Authors: Luca Falorsi, Pim de Haan, Tim R. Davidson, Nicola De Cao, Maurice Weiler, Patrick Forré, Taco S. Cohen

    Abstract: The manifold hypothesis states that many kinds of high-dimensional data are concentrated near a low-dimensional manifold. If the topology of this data manifold is non-trivial, a continuous encoder network cannot embed it in a one-to-one manner without creating holes of low density in the latent space. This is at odds with the Gaussian prior assumption typically made in Variational Auto-Encoders (V…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: 16 pages, 8 figures, ICML workshop on Theoretical Foundations and Applications of Deep Generative Models

  18. arXiv:1807.00583  [pdf, other]

    cs.CV cs.LG stat.ML

    Sample Efficient Semantic Segmentation using Rotation Equivariant Convolutional Networks

    Authors: Jasper Linmans, Jim Winkens, Bastiaan S. Veeling, Taco S. Cohen, Max Welling

    Abstract: We propose a semantic segmentation model that exploits rotation and reflection symmetries. We demonstrate significant gains in sample efficiency due to increased weight sharing, as well as improvements in robustness to symmetry transformations. The group equivariant CNN framework is extended for segmentation by introducing a new equivariant (G->Z2)-convolution that transforms feature maps on a gro…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: Presented at the ICML workshop: Towards learning with limited labels: Equivariance, Invariance, and Beyond, 2018

  19. arXiv:1804.04656  [pdf, other]

    cs.LG stat.ML

    3D G-CNNs for Pulmonary Nodule Detection

    Authors: Marysia Winkels, Taco S. Cohen

    Abstract: Convolutional Neural Networks (CNNs) require a large amount of annotated data to learn from, which is often difficult to obtain in the medical domain. In this paper we show that the sample complexity of CNNs can be significantly improved by using 3D roto-translation group convolutions (G-Convs) instead of the more conventional translational convolutions. These 3D G-CNNs were applied to the problem…

    Submitted 12 April, 2018; originally announced April 2018.

    Journal ref: International conference on Medical Imaging with Deep Learning, 2018

  20. arXiv:1803.10743  [pdf, other]

    cs.LG cs.CV stat.ML

    Intertwiners between Induced Representations (with Applications to the Theory of Equivariant Neural Networks)

    Authors: Taco S. Cohen, Mario Geiger, Maurice Weiler

    Abstract: Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields ("feature channels"), whereas the steerable G-CNN can also use vect…

    Submitted 30 March, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

  21. arXiv:1803.02108  [pdf, other]

    cs.LG stat.ML

    HexaConv

    Authors: Emiel Hoogeboom, Jorn W. T. Peters, Taco S. Cohen, Max Welling

    Abstract: The effectiveness of Convolutional Neural Networks stems in large part from their ability to exploit the translation invariance that is inherent in many learning problems. Recently, it was shown that CNNs can exploit other invariances, such as rotation invariance, by using group convolutions instead of planar convolutions. However, for reasons of performance and ease of implementation, it has been…

    Submitted 6 March, 2018; originally announced March 2018.

  22. arXiv:1801.10130  [pdf, other]

    cs.LG stat.ML

    Spherical CNNs

    Authors: Taco S. Cohen, Mario Geiger, Jonas Koehler, Max Welling

    Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive a…

    Submitted 25 February, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

    Comments: Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018

    Journal ref: Proceedings of the International Conference on Learning Representations, 2018

  23. arXiv:1702.04595  [pdf, other]

    cs.CV cs.AI

    Visualizing Deep Neural Network Decisions: Prediction Difference Analysis

    Authors: Luisa M Zintgraf, Taco S Cohen, Tameem Adel, Max Welling

    Abstract: This article presents the prediction difference analysis method for visualizing the response of a deep neural network to a specific input. When classifying images, the method highlights areas in a given input image that provide evidence for or against a certain class. It overcomes several shortcomings of previous methods and provides great additional insight into the decision making process of clas…

    Submitted 15 February, 2017; originally announced February 2017.

    Comments: ICLR2017

  24. arXiv:1612.08498  [pdf, other]

    cs.LG stat.ML

    Steerable CNNs

    Authors: Taco S. Cohen, Max Welling

    Abstract: It has long been recognized that the invariance and equivariance properties of a representation are critically important for success in many vision tasks. In this paper we present Steerable Convolutional Neural Networks, an efficient and flexible class of equivariant convolutional networks. We show that steerable CNNs achieve state of the art results on the CIFAR image classification benchmark. Th…

    Submitted 26 December, 2016; originally announced December 2016.

    Journal ref: Proceedings of the International Conference on Learning Representations, 2017

  25. arXiv:1603.02518  [pdf, other]

    cs.CV

    A New Method to Visualize Deep Neural Networks

    Authors: Luisa M. Zintgraf, Taco S. Cohen, Max Welling

    Abstract: We present a method for visualising the response of a deep neural network to a specific input. For image data, for instance, our method will highlight areas that provide evidence in favor of, and against, choosing a certain class. The method overcomes several shortcomings of previous methods and provides great additional insight into the decision making process of convolutional networks, which is imp…

    Submitted 12 June, 2017; v1 submitted 8 March, 2016; originally announced March 2016.

    Comments: Please note that this version of the article is outdated. The new version (published at ICLR2017) includes additional experiments on MRI scans and can be found at arXiv:1702.04595

  26. arXiv:1602.07576  [pdf, ps, other]

    cs.LG stat.ML

    Group Equivariant Convolutional Networks

    Authors: Taco S. Cohen, Max Welling

    Abstract: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without inc…

    Submitted 3 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Journal ref: Proceedings of the International Conference on Machine Learning (ICML), 2016

  27. arXiv:1412.7659  [pdf, other]

    cs.LG cs.CV cs.NE

    Transformation Properties of Learned Visual Representations

    Authors: Taco S. Cohen, Max Welling

    Abstract: When a three-dimensional object moves relative to an observer, a change occurs on the observer's image plane and in the visual representation computed by a learned model. Starting with the idea that a good visual representation is one that transforms linearly under scene motions, we show, using the theory of group representations, that any such representation is equivalent to a combination of the…

    Submitted 7 April, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

    Comments: T.S. Cohen & M. Welling, Transformation Properties of Learned Visual Representations. In International Conference on Learning Representations (ICLR), 2015

    Journal ref: Proceedings of the International Conference on Learning Representations, 2015