-
Exploiting the Signal-Leak Bias in Diffusion Models
Authors:
Martin Nicolas Everaert,
Athanasios Fitsios,
Marco Bocchio,
Sami Arpa,
Sabine Süsstrunk,
Radhakrishna Achanta
Abstract:
There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.
Submitted 24 October, 2023; v1 submitted 27 September, 2023;
originally announced September 2023.
-
What You See is What You Classify: Black Box Attributions
Authors:
Steven Stalder,
Nathanaël Perraudin,
Radhakrishna Achanta,
Fernando Perez-Cruz,
Michele Volpi
Abstract:
An important step towards explaining deep image classifiers lies in the identification of image regions that contribute to individual class scores in the model's output. However, doing this accurately is a difficult task due to the black-box nature of such networks. Most existing approaches find such attributions either using activations and gradients or by repeatedly perturbing the input. We instead address this challenge by training a second deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. These attributions are provided in the form of masks that only show the classifier-relevant parts of an image, masking out the rest. Our approach produces sharper and more boundary-precise masks when compared to the saliency maps generated by other methods. Moreover, unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks in a single forward pass. This makes the proposed method very efficient during inference. We show that our attributions are superior to established methods both visually and quantitatively with respect to the PASCAL VOC-2007 and Microsoft COCO-2014 datasets.
Submitted 7 October, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Uncertainty Surrogates for Deep Learning
Authors:
Radhakrishna Achanta,
Natasa Tagasovska
Abstract:
In this paper we introduce a novel way of estimating prediction uncertainty in deep networks through the use of uncertainty surrogates. These surrogates are features of the penultimate layer of a deep network that are forced to match predefined patterns. The patterns themselves can be, among other possibilities, a known visual symbol. We show how our approach can be used for estimating uncertainty in prediction and out-of-distribution detection. Additionally, the surrogates allow for interpretability of the ability of the deep network to learn and at the same time lend robustness against adversarial attacks. Despite its simplicity, our approach is superior to the state-of-the-art approaches on standard metrics as well as computational efficiency and ease of implementation. A wide range of experiments are performed on standard datasets to prove the efficacy of our approach.
Submitted 16 April, 2021;
originally announced April 2021.
-
Image Obfuscation for Privacy-Preserving Machine Learning
Authors:
Mathilde Raynal,
Radhakrishna Achanta,
Mathias Humbert
Abstract:
Privacy becomes a crucial issue when outsourcing the training of machine learning (ML) models to cloud-based platforms offering machine-learning services. While solutions based on cryptographic primitives have been developed, they incur a significant loss in accuracy or training efficiency, and require modifications to the backend architecture. A key challenge we tackle in this paper is the design of image obfuscation schemes that provide enough privacy without significantly degrading the accuracy of the ML model and the efficiency of the training process. In this endeavor, we address another challenge that has persisted so far: quantifying the degree of privacy provided by visual obfuscation mechanisms. We compare the ability of state-of-the-art full-reference quality metrics to concur with human subjects in terms of the degree of obfuscation introduced by a range of techniques. By relying on user surveys and two image datasets, we show that two existing image quality metrics are also well suited to measure the level of privacy in accordance with human subjects as well as AI-based recognition, and can therefore be used for quantifying privacy resulting from obfuscation. With the ability to quantify privacy, we show that we can provide adequate privacy protection to the training image set at the cost of only a few percentage points loss in accuracy.
Submitted 20 October, 2020;
originally announced October 2020.
-
Self-Binarizing Networks
Authors:
Fayez Lahoud,
Radhakrishna Achanta,
Pablo Márquez-Neila,
Sabine Süsstrunk
Abstract:
We present a method to train self-binarizing neural networks, that is, networks that evolve their weights and activations during training to become binary. To obtain similar binary networks, existing methods rely on the sign activation function. This function, however, has no gradients for non-zero values, which makes standard backpropagation impossible. To circumvent the difficulty of training a network relying on the sign activation function, these methods alternate between floating-point and binary representations of the network during training, which is sub-optimal and inefficient. We approach the binarization task by training on a unique representation involving a smooth activation function, which is iteratively sharpened during training until it becomes a binary representation equivalent to the sign activation function. Additionally, we introduce a new technique to perform binary batch normalization that simplifies the conventional batch normalization by transforming it into a simple comparison operation. This is unlike existing methods, which are forced to retain the conventional floating-point-based batch normalization. Our binary networks, apart from displaying advantages of lower memory and computation as compared to conventional floating-point and binary networks, also show higher classification accuracy than existing state-of-the-art methods on multiple benchmark datasets.
Submitted 2 February, 2019;
originally announced February 2019.
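The Self-Binarizing Networks abstract above describes a smooth activation that is sharpened until it matches the sign function. A minimal sketch of that idea follows, assuming a scaled hyperbolic tangent as the smooth function; the abstract does not name the function, so `smooth_sign` and the `scale` schedule are illustrative choices only, not the authors' implementation.

```python
import numpy as np

def smooth_sign(x, scale):
    """Hypothetical smooth stand-in for sign(x): a scaled tanh.

    For any non-zero x, tanh(scale * x) approaches sign(x) as scale grows,
    while remaining differentiable for every finite scale.
    """
    return np.tanh(scale * x)

# Sample pre-activations on both sides of zero (illustrative values).
x = np.array([-2.0, -0.1, 0.1, 2.0])

# Sharpen the activation step by step and record the worst-case gap to sign(x).
gaps = []
for scale in (1.0, 10.0, 100.0, 1000.0):
    gap = np.max(np.abs(smooth_sign(x, scale) - np.sign(x)))
    gaps.append(gap)
    print(f"scale={scale:7.1f}  max gap to sign(x) = {gap:.3e}")
```

The gap shrinks as the scale increases, which illustrates how a single differentiable representation can be driven towards a binary one during training.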
-
Fourier-Domain Optimization for Image Processing
Authors:
Majed El Helou,
Frederike Dümbgen,
Radhakrishna Achanta,
Sabine Süsstrunk
Abstract:
Image optimization problems encompass many applications such as spectral fusion, deblurring, deconvolution, dehazing, matting, reflection removal and image interpolation, among others. With current image sizes in the order of megabytes, it is extremely expensive to run conventional algorithms such as gradient descent, making them unfavorable especially when closed-form solutions can be derived and computed efficiently. This paper explains in detail the framework for solving convex image optimization and deconvolution in the Fourier domain. We begin by explaining the mathematical background and motivating why the presented setups can be transformed and solved very efficiently in the Fourier domain. We also show how to practically use these solutions, by providing the corresponding implementations. The explanations are aimed at a broad audience with minimal knowledge of convolution and image optimization. The eager reader can jump to Section 3 for a footprint of how to solve and implement a sample optimization function, and Section 5 for the more complex cases.
Submitted 11 September, 2018;
originally announced September 2018.
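The Fourier-Domain Optimization abstract above refers to closed-form solutions of convex image problems. As a hedged illustration, the sketch below solves one standard instance, Tikhonov-regularized deconvolution under circular convolution, whose minimizer is X = conj(K) Y / (|K|^2 + lam) in the frequency domain. The function name `fourier_deconvolve`, the box kernel, and the value of `lam` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def fourier_deconvolve(blurred, kernel, lam=1e-3):
    """Closed-form minimizer of ||k * x - y||^2 + lam * ||x||^2.

    Assumes circular (periodic) convolution, so that convolution becomes
    element-wise multiplication after a 2-D FFT.
    """
    K = np.fft.fft2(kernel, s=blurred.shape)   # kernel zero-padded to image size
    Y = np.fft.fft2(blurred)
    X = np.conj(K) * Y / (np.abs(K) ** 2 + lam)
    return np.real(np.fft.ifft2(X))

# Synthetic test: blur a random image with a 3x3 box kernel, then restore it.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = np.ones((3, 3)) / 9.0
blurred = np.real(
    np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape))
)
restored = fourier_deconvolve(blurred, kernel, lam=1e-6)

print("error of blurred image :", np.linalg.norm(blurred - image))
print("error of restored image:", np.linalg.norm(restored - image))
```

The whole solve costs a few FFTs rather than many gradient-descent iterations, which is the efficiency argument the abstract makes for closed-form Fourier-domain solutions.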
-
Deep Feature Factorization For Concept Discovery
Authors:
Edo Collins,
Radhakrishna Achanta,
Sabine Süsstrunk
Abstract:
We propose Deep Feature Factorization (DFF), a method capable of localizing similar semantic concepts within an image or a set of images. We use DFF to gain insight into a deep convolutional neural network's learned features, where we detect hierarchical cluster structures in feature space. This is visualized as heat maps, which highlight semantically matching regions across a set of images, revealing what the network 'perceives' as similar. DFF can also be used to perform co-segmentation and co-localization, and we report state-of-the-art results on these tasks.
Submitted 8 October, 2018; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Deep Residual Network for Joint Demosaicing and Super-Resolution
Authors:
Ruofan Zhou,
Radhakrishna Achanta,
Sabine Süsstrunk
Abstract:
In digital photography, two image restoration tasks have been studied extensively and resolved independently: demosaicing and super-resolution. Both these tasks are related to resolution limitations of the camera. Performing super-resolution on demosaiced images simply exacerbates the artifacts introduced by demosaicing. In this paper, we show that such accumulation of errors can be easily averted by jointly performing demosaicing and super-resolution. To this end, we propose a deep residual network for learning an end-to-end mapping between Bayer images and high-resolution images. By training on high-quality samples, our deep residual demosaicing and super-resolution network is able to recover high-quality super-resolved images from low-resolution Bayer mosaics in a single step without producing the artifacts common to such processing when the two operations are done separately. We perform extensive experiments to show that our deep residual network achieves demosaiced and super-resolved images that are superior to the state-of-the-art both qualitatively and in terms of PSNR and SSIM metrics.
Submitted 19 February, 2018;
originally announced February 2018.
-
Extrapolating Expected Accuracies for Large Multi-Class Problems
Authors:
Charles Zheng,
Rakesh Achanta,
Yuval Benjamini
Abstract:
The difficulty of multi-class classification generally increases with the number of classes. Using data from a subset of the classes, can we predict how well a classifier will scale with an increased number of classes? Under the assumptions that the classes are sampled identically and independently from a population, and that the classifier is based on independently learned scoring functions, we show that the expected accuracy when the classifier is trained on k classes is the (k-1)st moment of a certain distribution that can be estimated from data. We present an unbiased estimation method based on the theory, and demonstrate its application on a facial recognition example.
Submitted 27 December, 2017;
originally announced December 2017.
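The moment statement in the abstract above can be illustrated numerically. The sketch below rests on one possible reading, which is an assumption here and not the paper's definition: for each test example, let U be the probability that a randomly drawn incorrect class scores below the true class. With k - 1 independent incorrect classes, the example is classified correctly with probability U^(k-1), so the expected accuracy is the (k-1)st moment of U. The helper `expected_accuracy` and the sample values are hypothetical.

```python
import numpy as np

def expected_accuracy(u_samples, k):
    """Estimate E[U^(k-1)] from samples of U (hypothetical helper)."""
    u = np.asarray(u_samples, dtype=float)
    return float(np.mean(u ** (k - 1)))

# Sanity check: if the scores carry no information, U is uniform on [0, 1]
# and E[U^(k-1)] = 1/k, which is chance-level accuracy for k classes.
n = 10_000
u_uninformative = (np.arange(n) + 0.5) / n    # deterministic grid over (0, 1)
for k in (2, 4, 10):
    print(k, expected_accuracy(u_uninformative, k))

# A more informative classifier concentrates U near 1, so accuracy decays slowly.
u_informative = 1.0 - 0.05 * u_uninformative  # U spread over (0.95, 1.0)
acc_10 = expected_accuracy(u_informative, 10)
acc_100 = expected_accuracy(u_informative, 100)
```

Once the distribution of U has been estimated from a subset of classes, the same moment can be evaluated for a larger k, which is the sense in which the accuracy is extrapolated.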
-
Uniform Information Segmentation
Authors:
Radhakrishna Achanta,
Pablo Márquez-Neila,
Pascal Fua,
Sabine Süsstrunk
Abstract:
Size uniformity is one of the main criteria of superpixel methods. But size uniformity rarely conforms to the varying content of an image. The chosen size of the superpixels therefore represents a compromise - how to obtain the fewest superpixels without losing too much important detail. We propose that a more appropriate criterion for creating image segments is information uniformity. We introduce a novel method for segmenting an image based on this criterion. Since information is a natural way of measuring image complexity, our proposed algorithm leads to image segments that are smaller and denser in areas of high complexity and larger in homogeneous regions, thus simplifying the image while preserving its details. Our algorithm is simple and requires just one input parameter - a threshold on the information content. On segmentation comparison benchmarks it proves to be superior to the state-of-the-art. In addition, our method is computationally very efficient, approaching real-time performance, and is easily extensible to three-dimensional image stacks and video volumes.
Submitted 27 November, 2016;
originally announced November 2016.
-
How many faces can be recognized? Performance extrapolation for multi-class classification
Authors:
Charles Y. Zheng,
Rakesh Achanta,
Yuval Benjamini
Abstract:
The difficulty of multi-class classification generally increases with the number of classes. Using data from a subset of the classes, can we predict how well a classifier will scale with an increased number of classes? Under the assumption that the classes are sampled exchangeably, and under the assumption that the classifier is generative (e.g. QDA or Naive Bayes), we show that the expected accuracy when the classifier is trained on k classes is the (k-1)st moment of a conditional accuracy distribution, which can be estimated from data. This provides the theoretical foundation for performance extrapolation based on pseudolikelihood, unbiased estimation, and high-dimensional asymptotics. We investigate the robustness of our methods to non-generative classifiers in simulations and one optical character recognition example.
Submitted 16 June, 2016;
originally announced June 2016.
-
Telugu OCR Framework using Deep Learning
Authors:
Rakesh Achanta,
Trevor Hastie
Abstract:
In this paper, we address the task of Optical Character Recognition (OCR) for the Telugu script. We present an end-to-end framework that segments the text image, classifies the characters and extracts lines using a language model. The segmentation is based on mathematical morphology. The classification module, which is the most challenging task of the three, is a deep convolutional neural network. The language is modelled as a third-degree Markov chain at the glyph level. Telugu script is a complex alphasyllabary and the language is agglutinative, making the problem hard. In this paper we apply the latest advances in neural networks to achieve state-of-the-art error rates. We also review convolutional neural networks in great detail and expound the statistical justification behind the many tricks needed to make Deep Learning work.
Submitted 14 February, 2017; v1 submitted 19 September, 2015;
originally announced September 2015.