Search | arXiv e-print repository

StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models

Authors: Lezhong Wang, Jeppe Revall Frisvad, Mark Bo Jensen, Siavash Arjomand Bigdeli

Abstract: The demand for stereo images increases as manufacturers launch more XR devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is trainning free, remarkably straightforward to use, and it seamlessly integrates into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight capability… ▽ More The demand for stereo images increases as manufacturers launch more XR devices. To meet this demand, we introduce StereoDiffusion, a method that, unlike traditional inpainting pipelines, is trainning free, remarkably straightforward to use, and it seamlessly integrates into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs, without the need for fine-tuning model weights or any post-processing of images. Using the original input to generate a left image and estimate a disparity map for it, we generate the latent vector for the right image through Stereo Pixel Shift operations, complemented by Symmetric Pixel Shift Masking Denoise and Self-Attention Layers Modification methods to align the right-side image with the left-side image. Moreover, our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations. △ Less

Submitted 2 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Updated to CVPR 2024 GCV accepted version

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 7416-7425

arXiv:2204.01460 [pdf, other]

Optimizing the Consumption of Spiking Neural Networks with Activity Regularization

Authors: Simon Narduzzi, Siavash A. Bigdeli, Shih-Chii Liu, L. Andrea Dunbar

Abstract: Reducing energy consumption is a critical point for neural network models running on edge devices. In this regard, reducing the number of multiply-accumulate (MAC) operations of Deep Neural Networks (DNNs) running on edge hardware accelerators will reduce the energy consumption during inference. Spiking Neural Networks (SNNs) are an example of bio-inspired techniques that can further save energy b… ▽ More Reducing energy consumption is a critical point for neural network models running on edge devices. In this regard, reducing the number of multiply-accumulate (MAC) operations of Deep Neural Networks (DNNs) running on edge hardware accelerators will reduce the energy consumption during inference. Spiking Neural Networks (SNNs) are an example of bio-inspired techniques that can further save energy by using binary activations, and avoid consuming energy when not spiking. The networks can be configured for equivalent accuracy on a task through DNN-to-SNN conversion frameworks but their conversion is based on rate coding therefore the synaptic operations can be high. In this work, we look into different techniques to enforce sparsity on the neural network activation maps and compare the effect of different training regularizers on the efficiency of the optimized DNNs and SNNs. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: 5 pages, 3 figures; accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022

arXiv:2008.11010 [pdf, other]

doi 10.1109/SDS49233.2020.00022

Efficient Blind-Spot Neural Network Architecture for Image Denoising

Authors: David Honzátko, Siavash A. Bigdeli, Engin Türetken, L. Andrea Dunbar

Abstract: Image denoising is an essential tool in computational photography. Standard denoising techniques, which use deep neural networks at their core, require pairs of clean and noisy images for its training. If we do not possess the clean samples, we can use blind-spot neural network architectures, which estimate the pixel value based on the neighbouring pixels only. These networks thus allow training o… ▽ More Image denoising is an essential tool in computational photography. Standard denoising techniques, which use deep neural networks at their core, require pairs of clean and noisy images for its training. If we do not possess the clean samples, we can use blind-spot neural network architectures, which estimate the pixel value based on the neighbouring pixels only. These networks thus allow training on noisy images directly, as they by-design avoid trivial solutions. Nowadays, the blind-spot is mostly achieved using shifted convolutions or serialization. We propose a novel fully convolutional network architecture that uses dilations to achieve the blind-spot property. Our network improves the performance over the prior work and achieves state-of-the-art results on established datasets. △ Less

Submitted 25 August, 2020; originally announced August 2020.

ACM Class: I.4.3; I.2.10

Journal ref: 2020 7th Swiss Conference on Data Science (SDS), Luzern, Switzerland, 2020, pp. 59-60

arXiv:2001.02728 [pdf, other]

Learning Generative Models using Denoising Density Estimators

Authors: Siavash A. Bigdeli, Geng Lin, Tiziano Portenier, L. Andrea Dunbar, Matthias Zwicker

Abstract: Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs), which are scalar functions parameterized by neural networks, that are efficiently trained to represent kernel density es… ▽ More Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs), which are scalar functions parameterized by neural networks, that are efficiently trained to represent kernel density estimators of the data. Leveraging DDEs, our main contribution is a novel technique to obtain generative models by minimizing the KL-divergence directly. We prove that our algorithm for obtaining generative models is guaranteed to converge to the correct solution. Our approach does not require specific network architecture as in normalizing flows, nor use ordinary differential equation solvers as in continuous normalizing flows. Experimental results demonstrate substantial improvement in density estimation and competitive performance in generative model training. △ Less

Submitted 9 June, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

Comments: Code and models available at https://drive.google.com/file/d/1EzKRxnFG1Hd8g6Ggvt-jvKkgpDDwK2bY

arXiv:1810.03372 [pdf, other]

Detecting Memorization in ReLU Networks

Authors: Edo Collins, Siavash Arjomand Bigdeli, Sabine Süsstrunk

Abstract: We propose a new notion of `non-linearity' of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the non-negative rank of the activation matrix. We measure this non-linearity by applying non-negative factorization to the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is i… ▽ More We propose a new notion of `non-linearity' of a network layer with respect to an input batch that is based on its proximity to a linear system, which is reflected in the non-negative rank of the activation matrix. We measure this non-linearity by applying non-negative factorization to the activation matrix. Considering batches of similar samples, we find that high non-linearity in deep layers is indicative of memorization. Furthermore, by applying our approach layer-by-layer, we find that the mechanism for memorization consists of distinct phases. We perform experiments on fully-connected and convolutional neural networks trained on several image and audio datasets. Our results demonstrate that as an indicator for memorization, our technique can be used to perform early stop**. △ Less

Submitted 8 October, 2018; originally announced October 2018.

arXiv:1804.08972 [pdf, other]

FaceShop: Deep Sketch-based Face Image Editing

Authors: Tiziano Portenier, Qiyang Hu, Attila Szabó, Siavash Arjomand Bigdeli, Paolo Favaro, Matthias Zwicker

Abstract: We present a novel system for sketch-based face image editing, enabling users to edit images intuitively by sketching a few strokes on a region of interest. Our interface features tools to express a desired image manipulation by providing both geometry and color constraints as user-drawn strokes. As an alternative to the direct user input, our proposed system naturally supports a copy-paste mode,… ▽ More We present a novel system for sketch-based face image editing, enabling users to edit images intuitively by sketching a few strokes on a region of interest. Our interface features tools to express a desired image manipulation by providing both geometry and color constraints as user-drawn strokes. As an alternative to the direct user input, our proposed system naturally supports a copy-paste mode, which allows users to edit a given image region by using parts of another exemplar image without the need of hand-drawn sketching at all. The proposed interface runs in real-time and facilitates an interactive and iterative workflow to quickly express the intended edits. Our system is based on a novel sketch domain and a convolutional neural network trained end-to-end to automatically learn to render image regions corresponding to the input strokes. To achieve high quality and semantically consistent results we train our neural network on two simultaneous tasks, namely image completion and image translation. To the best of our knowledge, we are the first to combine these two tasks in a unified framework for interactive image editing. Our results show that the proposed sketch domain, network architecture, and training procedure generalize well to real user input and enable high quality synthesis results without additional post-processing. △ Less

Submitted 7 June, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: 13 pages, 20 figures

arXiv:1709.03749 [pdf, other]

Deep Mean-Shift Priors for Image Restoration

Authors: Siavash Arjomand Bigdeli, Meiguang **, Paolo Favaro, Matthias Zwicker

Abstract: In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In… ▽ More In this paper we introduce a natural image prior that directly represents a Gaussian-smoothed version of the natural image distribution. We include our prior in a formulation of image restoration as a Bayes estimator that also allows us to solve noise-blind image restoration problems. We show that the gradient of our prior corresponds to the mean-shift vector on the natural image distribution. In addition, we learn the mean-shift vector field using denoising autoencoders, and use it in a gradient descent approach to perform Bayes risk minimization. We demonstrate competitive results for noise-blind deblurring, super-resolution, and demosaicing. △ Less

Submitted 4 October, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

Comments: NIPS 2017

arXiv:1703.09964 [pdf, other]

Image Restoration using Autoencoding Priors

Authors: Siavash Arjomand Bigdeli, Matthias Zwicker

Abstract: We propose to leverage denoising autoencoder networks as priors to address image restoration problems. We build on the key observation that the output of an optimal denoising autoencoder is a local mean of the true data density, and the autoencoder error (the difference between the output and input of the trained autoencoder) is a mean shift vector. We use the magnitude of this mean shift vector,… ▽ More We propose to leverage denoising autoencoder networks as priors to address image restoration problems. We build on the key observation that the output of an optimal denoising autoencoder is a local mean of the true data density, and the autoencoder error (the difference between the output and input of the trained autoencoder) is a mean shift vector. We use the magnitude of this mean shift vector, that is, the distance to the local mean, as the negative log likelihood of our natural image prior. For image restoration, we maximize the likelihood using gradient descent by backpropagating the autoencoder error. A key advantage of our approach is that we do not need to train separate networks for different image restoration tasks, such as non-blind deconvolution with different kernels, or super-resolution at different magnification factors. We demonstrate state of the art results for non-blind deconvolution and super-resolution using the same autoencoding prior. △ Less

Submitted 29 March, 2017; originally announced March 2017.

Showing 1–8 of 8 results for author: Bigdeli, S A