Search | arXiv e-print repository

Exploiting the Signal-Leak Bias in Diffusion Models

Authors: Martin Nicolas Everaert, Athanasios Fitsios, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, Radhakrishna Achanta

Abstract: There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.

Submitted 24 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: corrected the author names in reference [24]

arXiv:2110.03575 [pdf, other]

Estimating Image Depth in the Comics Domain

Authors: Deblina Bhattacharjee, Martin Everaert, Mathieu Salzmann, Sabine Süsstrunk

Abstract: Estimating the depth of comics images is challenging as such images a) are monocular; b) lack ground-truth depth annotations; c) differ across different artistic styles; d) are sparse and noisy. We thus use an off-the-shelf unsupervised image-to-image translation method to translate the comics images to natural ones and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels to reduce text-based artefacts in the depth estimates. Our method consistently outperforms the existing state-of-the-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics. Our project website can be accessed at https://github.com/IVRL/ComicsDepth.

Submitted 15 August, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Accepted to WACV 2022 : Winter Conference on Applications of Computer Vision

arXiv:2006.02333 [pdf, other]

Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme

Authors: Alexandre Pierre Dherse, Martin Nicolas Everaert, Jakub Jan Gwizdała

Abstract: The image relighting task of transferring illumination conditions between two images offers an interesting and difficult challenge with potential applications in photography, cinematography and computer graphics. In this report we present methods that we tried to achieve that goal. Our models are trained on a rendered dataset of artificial locations with varied scene content, light source location and color temperature. With this dataset, we used a network with an illumination estimation component aiming to infer and replace light conditions in the latent space representation of the concerned scenes.

Submitted 3 June, 2020; originally announced June 2020.

Comments: Report for the CS-413 project at EPFL, Switzerland

Showing 1–3 of 3 results for author: Everaert, M