-
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Authors:
Namitha Padmanabhan,
Matthew Gwilliam,
Pulkit Kumar,
Shishira R Maiya,
Max Ehrlich,
Abhinav Shrivastava
Abstract:
The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image superresolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Canvas (XINC), is a unified framework for explaining properties of INRs by examining the strength of each neuron's contribution to each output pixel. We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs which we study learn to "see" the frames they represent in surprising ways. For example, INRs tend to have highly distributed representations. While lacking high-level object semantics, they have a significant bias for color and edges, and are almost entirely space-agnostic. We arrive at our conclusions by examining how objects are represented across time in video INRs, using clustering to visualize similar neurons across layers and architectures, and show that this is dominated by motion. These insights demonstrate the general usefulness of our analysis framework. Our project page is available at https://namithap10.github.io/xinc.
Submitted 18 January, 2024;
originally announced January 2024.
-
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Authors:
Shishira R Maiya,
Sharath Girish,
Max Ehrlich,
Hanyu Wang,
Kwot Sin Lee,
Patrick Poirson,
Pengxiang Wu,
Chen Wang,
Abhinav Shrivastava
Abstract:
Implicit Neural Representations (INRs) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited, as they do not explicitly exploit the temporal redundancy in videos, leading to long encoding times. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits a separate network to each group, performing patch-wise prediction. This design shares computation within each group, in the spatial and temporal dimensions, resulting in reduced encoding time of the video. The video representation is modeled autoregressively, with networks fit on a current group initialized using weights from the previous group's model. To further enhance efficiency, we perform quantization of the network parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works which struggle with larger resolution and longer videos, we show that our algorithm is highly flexible and scales naturally due to its patch-wise and autoregressive designs. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA achieves 6X decoding speed and scales well with more GPUs, making it practical for various deployment scenarios.
Submitted 30 December, 2022;
originally announced December 2022.
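The NIRVANA abstract reports encoding quality in PSNR. As a reading aid, here is a minimal sketch of how PSNR is commonly computed from mean squared error. It is not taken from the paper; the function name `psnr`, the flat-list pixel layout, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Illustrative helper (not from the paper); assumes 8-bit pixels by default.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example with made-up pixel values.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 67, 69, 62, 64, 72]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because PSNR is logarithmic, a gain of 0.34 dB at the same peak value (as in 37.36 to 37.70) corresponds to a mean squared error roughly 7.5% lower.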
-
The First Principles of Deep Learning and Compression
Authors:
Max Ehrlich
Abstract:
The deep learning revolution incited by the 2012 Alexnet paper has been transformative for the field of computer vision. Many problems which were severely limited using classical solutions are now seeing unprecedented success. The rapid proliferation of deep learning methods has led to a sharp increase in their use in consumer and embedded applications. One consequence of consumer and embedded applications is lossy multimedia compression which is required to engineer the efficient storage and transmission of data in these real-world scenarios. As such, there has been increased interest in a deep learning solution for multimedia compression which would allow for higher compression ratios and increased visual quality.
The deep learning approach to multimedia compression, so-called Learned Multimedia Compression, involves computing a compressed representation of an image or video using a deep network for the encoder and the decoder. While these techniques have enjoyed impressive academic success, their industry adoption has been essentially non-existent. Classical compression techniques like JPEG and MPEG are too entrenched in modern computing to be easily replaced. This dissertation takes an orthogonal approach and leverages deep learning to improve the compression fidelity of these classical algorithms. This allows the incredible advances in deep learning to be used for multimedia compression without threatening the ubiquity of the classical methods.
The key insight of this work is that methods which are motivated by first principles, i.e., the underlying engineering decisions that were made when the compression algorithms were developed, are more effective than general methods. By encoding prior knowledge into the design of the algorithm, the flexibility, performance, and/or accuracy are improved at the cost of generality...
Submitted 4 April, 2022;
originally announced April 2022.
-
Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement
Authors:
Max Ehrlich,
Jon Barker,
Namitha Padmanabhan,
Larry Davis,
Andrew Tao,
Bryan Catanzaro,
Abhinav Shrivastava
Abstract:
Video compression is a central feature of the modern internet powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive when compared with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings which required an ensemble of models in prior work.
Submitted 30 October, 2023; v1 submitted 31 January, 2022;
originally announced February 2022.
-
A Frequency Perspective of Adversarial Robustness
Authors:
Shishira R Maiya,
Max Ehrlich,
Vatsal Agarwal,
Ser-Nam Lim,
Tom Goldstein,
Abhinav Shrivastava
Abstract:
Adversarial examples pose a unique challenge for deep learning systems. Despite recent advances in both attacks and defenses, there is still a lack of clarity and consensus in the community about the true nature and underlying properties of adversarial examples. A deep understanding of these examples can provide new insights towards the development of more effective attacks and defenses. Driven by the common misconception that adversarial examples are high-frequency noise, we present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings. Our analysis shows that adversarial examples are neither in high-frequency nor in low-frequency components, but are simply dataset dependent. Particularly, we highlight the glaring disparities between models trained on CIFAR-10 and ImageNet-derived datasets. Utilizing this framework, we analyze many intriguing properties of training robust models with frequency constraints, and propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
Submitted 26 October, 2021;
originally announced November 2021.
-
ReLaX: Retinal Layer Attribution for Guided Explanations of Automated Optical Coherence Tomography Classification
Authors:
Evan Wen,
Rebecca Sorenson,
Max Ehrlich
Abstract:
30 million Optical Coherence Tomography (OCT) imaging tests are issued annually to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained eye care professionals who are still prone to making errors. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we present ReLaX, a novel deep learning framework for explainable, accurate classification of retinal pathologies which achieves state-of-the-art accuracy. Furthermore, we emphasize producing both qualitative and quantitative explanations of the model's decisions. While previous works use pixel-level attribution methods for generating model explanations, our work uses a novel retinal layer attribution method for producing rich qualitative and quantitative model explanations. ReLaX determines the importance of each retinal layer by combining heatmaps with an OCT segmentation model. Our work is the first to produce detailed quantitative explanations of a model's predictions in this way. The combination of accuracy and interpretability can be clinically applied for accessible, high-quality patient care.
Submitted 1 October, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
-
Unsupervised Super-Resolution of Satellite Imagery for High Fidelity Material Label Transfer
Authors:
Arthita Ghosh,
Max Ehrlich,
Larry Davis,
Rama Chellappa
Abstract:
Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low resolution satellite images. To this end, we propose an unsupervised domain adaptation based approach using adversarial learning. We aim to harvest information from smaller quantities of high resolution data (source domain) and utilize the same to super-resolve low resolution imagery (target domain). This can potentially aid in semantic as well as material label transfer from a richly annotated source to a target domain.
Submitted 15 May, 2021;
originally announced May 2021.
-
Analyzing and Mitigating JPEG Compression Defects in Deep Learning
Authors:
Max Ehrlich,
Larry Davis,
Ser-Nam Lim,
Abhinav Shrivastava
Abstract:
With the proliferation of deep learning methods, many computer vision problems which were considered academic are now viable in the consumer setting. One drawback of consumer applications is lossy compression, which is necessary from an engineering standpoint to efficiently and cheaply store and transmit user images. Despite this, there has been little study of the effect of compression on deep neural networks and benchmark datasets are often losslessly compressed or compressed at high quality. Here we present a unified study of the effects of JPEG compression on a range of common tasks and datasets. We show that there is a significant penalty on common performance metrics for high compression. We test several methods for mitigating this penalty, including a novel method based on artifact correction which requires no labels to train.
Submitted 20 September, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Quantization Guided JPEG Artifact Correction
Authors:
Max Ehrlich,
Larry Davis,
Ser-Nam Lim,
Abhinav Shrivastava
Abstract:
The JPEG image compression algorithm is the most popular method of image compression because of its ability to achieve large compression ratios. However, to achieve such high compression, information is lost. For aggressive quantization settings, this leads to a noticeable reduction in image quality. Artifact correction has been studied in the context of deep neural networks for some time, but the current state-of-the-art methods require a different model to be trained for each quality setting, greatly limiting their practical application. We solve this problem by creating a novel architecture which is parameterized by the JPEG file's quantization matrix. This allows our single model to achieve state-of-the-art performance over models trained for specific quality settings.
Submitted 16 July, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
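The abstract above conditions a network on the JPEG quantization matrix. The sketch below does not reproduce that architecture; it only illustrates, under simplifying assumptions, what a quantization matrix does in standard JPEG: each DCT coefficient is divided by the matching table entry and rounded, which is where information is lost. The function names and the tiny 2x2 tables are hypothetical (real JPEG tables are 8x8).

```python
import math


def quantize(coefficients, table):
    """Divide each DCT coefficient by its table entry and round to nearest."""
    return [
        [math.floor(c / q + 0.5) for c, q in zip(row_c, row_q)]
        for row_c, row_q in zip(coefficients, table)
    ]


def dequantize(levels, table):
    """Approximate the original coefficients from the stored integer levels."""
    return [
        [level * q for level, q in zip(row_l, row_q)]
        for row_l, row_q in zip(levels, table)
    ]


# Hypothetical 2x2 excerpt of a coefficient block and two tables.
coefficients = [[-415.0, 30.0], [4.0, -22.0]]
mild_table = [[4, 3], [3, 4]]           # small steps: little loss
aggressive_table = [[64, 48], [48, 64]]  # large steps: heavy loss

for name, table in (("mild", mild_table), ("aggressive", aggressive_table)):
    restored = dequantize(quantize(coefficients, table), table)
    worst = max(
        abs(c - r)
        for row_c, row_r in zip(coefficients, restored)
        for c, r in zip(row_c, row_r)
    )
    print(f"{name}: restored={restored}, worst error={worst}")
```

The reconstruction error per coefficient is bounded by half the table entry, so the table stored in the file describes how much each frequency was degraded.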
-
Deep Residual Learning in the JPEG Transform Domain
Authors:
Max Ehrlich,
Larry Davis
Abstract:
We introduce a general method of performing Residual Network inference and learning in the JPEG transform domain that allows the network to consume compressed images as input. Our formulation leverages the linearity of the JPEG transform to redefine convolution and batch normalization with a tune-able numerical approximation for ReLu. The result is mathematically equivalent to the spatial domain network up to the ReLu approximation accuracy. A formulation for image classification and a model conversion algorithm for spatial domain networks are given as examples of the method. We show that the sparsity of the JPEG format allows for faster processing of images with little to no penalty in the network accuracy.
Submitted 27 August, 2019; v1 submitted 30 December, 2018;
originally announced December 2018.
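The abstract above relies on the linearity of the JPEG transform. As a small illustration (not the paper's formulation), the sketch below checks numerically that an unnormalized 1-D DCT-II, the building block of JPEG's 8x8 block transform, commutes with linear combinations of signals. The helper names and sample values are assumptions chosen for the example.

```python
import math


def dct_1d(signal):
    """Unnormalized 1-D DCT-II of a list of floats."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def combine(a, x, b, y):
    """Element-wise linear combination a*x + b*y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


# Two made-up 8-sample rows.
x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
y = [63.0, 59.0, 55.0, 90.0, 109.0, 85.0, 69.0, 72.0]

# Transforming the combination equals combining the transforms.
lhs = dct_1d(combine(2.0, x, -3.0, y))
rhs = combine(2.0, dct_1d(x), -3.0, dct_1d(y))
max_gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(f"largest difference: {max_gap:.2e}")
```

Linear layers such as convolution can therefore be re-expressed on transform coefficients, whereas a nonlinearity like ReLU does not commute with the transform, which is consistent with the abstract's use of an approximation for it.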
-
Action-Affect Classification and Morphing using Multi-Task Representation Learning
Authors:
Timothy J. Shields,
Mohamed R. Amer,
Max Ehrlich,
Amir Tamrakar
Abstract:
Most recent work has focused on affect from facial expressions, and less on the body. This work focuses on body affect analysis. Affect does not occur in isolation. Humans usually couple affect with an action in natural interactions; for example, a person could be talking and smiling. Recognizing body affect in sequences requires efficient algorithms to capture both the micro movements that differentiate between happy and sad and the macro variations between different actions. We depart from traditional approaches for time-series data analytics by proposing a multi-task learning model that learns a shared representation that is well-suited for action-affect classification as well as generation. For this paper we choose Conditional Restricted Boltzmann Machines to be our building block. We propose a new model that enhances the CRBM model with a factored multi-task component to become Multi-Task Conditional Restricted Boltzmann Machines (MTCRBMs). We evaluate our approach on two publicly available datasets, the Body Affect dataset and the Tower Game dataset, and show improved classification performance over the state of the art, as well as the generative abilities of our model.
Submitted 21 March, 2016;
originally announced March 2016.
-
Characterization and Compensation of Network-Level Anomalies in Mixed-Signal Neuromorphic Modeling Platforms
Authors:
Mihai A. Petrovici,
Bernhard Vogginger,
Paul Müller,
Oliver Breitwieser,
Mikael Lundqvist,
Lyle Muller,
Matthias Ehrlich,
Alain Destexhe,
Anders Lansner,
René Schüffny,
Johannes Schemmel,
Karlheinz Meier
Abstract:
Advancing the size and complexity of neural network models leads to an ever increasing demand for computational resources for their simulation. Neuromorphic devices offer a number of advantages over conventional computing architectures, such as high emulation speed or low power consumption, but this usually comes at the price of reduced configurability and precision. In this article, we investigate the consequences of several such factors that are common to neuromorphic devices, more specifically limited hardware resources, limited parameter configurability and parameter variations. Our final aim is to provide an array of methods for coping with such inevitable distortion mechanisms. As a platform for testing our proposed strategies, we use an executable system specification (ESS) of the BrainScaleS neuromorphic system, which has been designed as a universal emulation back-end for neuroscientific modeling. We address the most essential limitations of this device in detail and study their effects on three prototypical benchmark network models within a well-defined, systematic workflow. For each network model, we start by defining quantifiable functionality measures by which we then assess the effects of typical hardware-specific distortion mechanisms, both in idealized software simulations and on the ESS. For those effects that cause unacceptable deviations from the original network dynamics, we suggest generic compensation mechanisms and demonstrate their effectiveness. Both the suggested workflow and the investigated compensation mechanisms are largely back-end independent and do not require additional hardware configurability beyond the one required to emulate the benchmark networks in the first place. We hereby provide a generic methodological environment for configurable neuromorphic devices that are targeted at emulating large-scale, functional neural networks.
Submitted 10 February, 2015; v1 submitted 29 April, 2014;
originally announced April 2014.
-
A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems
Authors:
Daniel Brüderle,
Mihai A. Petrovici,
Bernhard Vogginger,
Matthias Ehrlich,
Thomas Pfeil,
Sebastian Millner,
Andreas Grübl,
Karsten Wendt,
Eric Müller,
Marc-Olivier Schwartz,
Dan Husmann de Oliveira,
Sebastian Jeltsch,
Johannes Fieres,
Moritz Schilling,
Paul Müller,
Oliver Breitwieser,
Venelin Petkov,
Lyle Muller,
Andrew P. Davison,
Pradeep Krishnamurthy,
Jens Kremkow,
Mikael Lundqvist,
Eilif Muller,
Johannes Partzsch,
Stefan Scholze
, et al. (9 additional authors not shown)
Abstract:
In this paper we present a methodological framework that meets novel requirements emerging from upcoming types of accelerated and highly configurable neuromorphic hardware systems. We describe in detail a device with 45 million programmable and dynamic synapses that is currently under development, and we sketch the conceptual challenges that arise from taking this platform into operation. More specifically, we aim at the establishment of this neuromorphic system as a flexible and neuroscientifically valuable modeling tool that can be used by non-hardware-experts. We consider various functional aspects to be crucial for this purpose, and we introduce a consistent workflow with detailed descriptions of all involved modules that implement the suggested steps: The integration of the hardware interface into the simulator-independent model description language PyNN; a fully automated translation between the PyNN domain and appropriate hardware configurations; an executable specification of the future neuromorphic system that can be seamlessly integrated into this biology-to-hardware mapping process as a test bench for all software layers and possible hardware design modifications; an evaluation scheme that deploys models from a dedicated benchmark library, compares the results generated by virtual or prototype hardware devices with reference software simulations and analyzes the differences. The integration of these components into one hardware-software workflow provides an ecosystem for ongoing preparative studies that support the hardware design process and represents the basis for the maturity of the model-to-hardware mapping software. The functionality and flexibility of the latter is proven with a variety of experimental results.
Submitted 21 July, 2011; v1 submitted 12 November, 2010;
originally announced November 2010.