Search | arXiv e-print repository

Facial Image Feature Analysis and its Specialization for Fréchet Distance and Neighborhoods

Authors: Doruk Cetin, Benedikt Schesch, Petar Stamenkovic, Niko Benjamin Huber, Fabio Zünd, Majed El Helou

Abstract: Assessing distances between images and image datasets is a fundamental task in vision-based research. It is a challenging open problem in the literature and despite the criticism it receives, the most ubiquitous method remains the Fréchet Inception Distance. The Inception network is trained on a specific labeled dataset, ImageNet, which has caused the core of its criticism in the most recent resea… ▽ More Assessing distances between images and image datasets is a fundamental task in vision-based research. It is a challenging open problem in the literature and despite the criticism it receives, the most ubiquitous method remains the Fréchet Inception Distance. The Inception network is trained on a specific labeled dataset, ImageNet, which has caused the core of its criticism in the most recent research. Improvements were shown by moving to self-supervision learning over ImageNet, leaving the training data domain as an open question. We make that last leap and provide the first analysis on domain-specific feature training and its effects on feature distance, on the widely-researched facial image domain. We provide our findings and insights on this domain specialization for Fréchet distance and image neighborhoods, supported by extensive experiments and in-depth user studies. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2405.14470 [pdf, other]

Which Information Matters? Dissecting Human-written Multi-document Summaries with Partial Information Decomposition

Authors: Laura Mascarell, Yan L'Homme, Majed El Helou

Abstract: Understanding the nature of high-quality summaries is crucial to further improve the performance of multi-document summarization. We propose an approach to characterize human-written summaries using partial information decomposition, which decomposes the mutual information provided by all source documents into union, redundancy, synergy, and unique information. Our empirical analysis on different… ▽ More Understanding the nature of high-quality summaries is crucial to further improve the performance of multi-document summarization. We propose an approach to characterize human-written summaries using partial information decomposition, which decomposes the mutual information provided by all source documents into union, redundancy, synergy, and unique information. Our empirical analysis on different MDS datasets shows that there is a direct dependency between the number of sources and their contribution to the summary. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2312.02124 [pdf, other]

VerA: Versatile Anonymization Fit for Clinical Facial Images

Authors: Majed El Helou, Doruk Cetin, Petar Stamenkovic, Fabio Zund

Abstract: The escalating legislative demand for data privacy in facial image dissemination has underscored the significance of image anonymization. Recent advancements in the field surpass traditional pixelation or blur methods, yet they predominantly address regular single images. This leaves clinical image anonymization -- a necessity for illustrating medical interventions -- largely unaddressed. We prese… ▽ More The escalating legislative demand for data privacy in facial image dissemination has underscored the significance of image anonymization. Recent advancements in the field surpass traditional pixelation or blur methods, yet they predominantly address regular single images. This leaves clinical image anonymization -- a necessity for illustrating medical interventions -- largely unaddressed. We present VerA, a versatile facial image anonymization that is fit for clinical facial images where: (1) certain semantic areas must be preserved to show medical intervention results, and (2) anonymizing image pairs is crucial for showing before-and-after results. VerA outperforms or is on par with state-of-the-art methods in de-identification and photorealism for regular images. In addition, we validate our results on paired anonymization, and on the anonymization of both single and paired clinical images with extensive quantitative and qualitative evaluation. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2306.14891 [pdf, other]

Fuzzy-Conditioned Diffusion and Diffusion Projection Attention Applied to Facial Image Correction

Authors: Majed El Helou

Abstract: Image diffusion has recently shown remarkable performance in image synthesis and implicitly as an image prior. Such a prior has been used with conditioning to solve the inpainting problem, but only supporting binary user-based conditioning. We derive a fuzzy-conditioned diffusion, where implicit diffusion priors can be exploited with controllable strength. Our fuzzy conditioning can be applied pix… ▽ More Image diffusion has recently shown remarkable performance in image synthesis and implicitly as an image prior. Such a prior has been used with conditioning to solve the inpainting problem, but only supporting binary user-based conditioning. We derive a fuzzy-conditioned diffusion, where implicit diffusion priors can be exploited with controllable strength. Our fuzzy conditioning can be applied pixel-wise, enabling the modification of different image components to varying degrees. Additionally, we propose an application to facial image correction, where we combine our fuzzy-conditioned diffusion with diffusion-derived attention maps. Our map estimates the degree of anomaly, and we obtain it by projecting on the diffusion space. We show how our approach also leads to interpretable and autonomous facial image correction. △ Less

Submitted 1 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Code available on https://github.com/majedelhelou/FC-Diffusion

arXiv:2210.04866 [pdf, other]

doi 10.1109/LSP.2022.3227522

PoGaIN: Poisson-Gaussian Image Noise Modeling from Paired Samples

Authors: Nicolas Bähler, Majed El Helou, Étienne Objois, Kaan Okumuş, Sabine Süsstrunk

Abstract: Image noise can often be accurately fitted to a Poisson-Gaussian distribution. However, estimating the distribution parameters from a noisy image only is a challenging task. Here, we study the case when paired noisy and noise-free samples are accessible. No method is currently available to exploit the noise-free information, which may help to achieve more accurate estimations. To fill this gap, we… ▽ More Image noise can often be accurately fitted to a Poisson-Gaussian distribution. However, estimating the distribution parameters from a noisy image only is a challenging task. Here, we study the case when paired noisy and noise-free samples are accessible. No method is currently available to exploit the noise-free information, which may help to achieve more accurate estimations. To fill this gap, we derive a novel, cumulant-based, approach for Poisson-Gaussian noise modeling from paired image samples. We show its improved performance over different baselines, with special emphasis on MSE, effect of outliers, image dependence, and bias. We additionally derive the log-likelihood function for further insights and discuss real-world applicability. △ Less

Submitted 19 December, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures, and 3 tables. Code is available at https://github.com/IVRL/PoGaIN

arXiv:2208.12327 [pdf, other]

DSR: Towards Drone Image Super-Resolution

Authors: Xiaoyu Lin, Baran Ozaydin, Vidit Vidit, Majed El Helou, Sabine Süsstrunk

Abstract: Despite achieving remarkable progress in recent years, single-image super-resolution methods are developed with several limitations. Specifically, they are trained on fixed content domains with certain degradations (whether synthetic or real). The priors they learn are prone to overfitting the training configuration. Therefore, the generalization to novel domains such as drone top view data, and a… ▽ More Despite achieving remarkable progress in recent years, single-image super-resolution methods are developed with several limitations. Specifically, they are trained on fixed content domains with certain degradations (whether synthetic or real). The priors they learn are prone to overfitting the training configuration. Therefore, the generalization to novel domains such as drone top view data, and across altitudes, is currently unknown. Nonetheless, pairing drones with proper image super-resolution is of great value. It would enable drones to fly higher covering larger fields of view, while maintaining a high image quality. To answer these questions and pave the way towards drone image super-resolution, we explore this application with particular focus on the single-image case. We propose a novel drone image dataset, with scenes captured at low and high resolutions, and across a span of altitudes. Our results show that off-the-shelf state-of-the-art networks witness a significant drop in performance on this different domain. We additionally show that simple fine-tuning, and incorporating altitude awareness into the network's architecture, both improve the reconstruction performance. △ Less

Submitted 25 August, 2022; originally announced August 2022.

Comments: Accepted at ECCVW 2022

arXiv:2201.00429 [pdf, other]

Image Denoising with Control over Deep Network Hallucination

Authors: Qiyuan Liang, Florian Cassayre, Haley Owsianko, Majed El Helou, Sabine Süsstrunk

Abstract: Deep image denoisers achieve state-of-the-art results but with a hidden cost. As witnessed in recent literature, these deep networks are capable of overfitting their training distributions, causing inaccurate hallucinations to be added to the output and generalizing poorly to varying data. For better control and interpretability over a deep denoiser, we propose a novel framework exploiting a denoi… ▽ More Deep image denoisers achieve state-of-the-art results but with a hidden cost. As witnessed in recent literature, these deep networks are capable of overfitting their training distributions, causing inaccurate hallucinations to be added to the output and generalizing poorly to varying data. For better control and interpretability over a deep denoiser, we propose a novel framework exploiting a denoising network. We call it controllable confidence-based image denoising (CCID). In this framework, we exploit the outputs of a deep denoising network alongside an image convolved with a reliable filter. Such a filter can be a simple convolution kernel which does not risk adding hallucinated information. We propose to fuse the two components with a frequency-domain approach that takes into account the reliability of the deep network outputs. With our framework, the user can control the fusion of the two components in the frequency domain. We also provide a user-friendly map estimating spatially the confidence in the output that potentially contains network hallucination. Results show that our CCID not only provides more interpretability and control, but can even outperform both the quantitative performance of the deep denoiser and that of the reliable filter, especially when the test data diverge from the training data. △ Less

Submitted 2 January, 2022; originally announced January 2022.

Comments: Published in Electronic Imaging 2022, code available at https://github.com/IVRL/CCID

arXiv:2106.00673 [pdf, other]

doi 10.1109/LSP.2021.3104769

Fidelity Estimation Improves Noisy-Image Classification With Pretrained Networks

Authors: Xiaoyu Lin, Deblina Bhattacharjee, Majed El Helou, Sabine Süsstrunk

Abstract: Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods a… ▽ More Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus necessitate retraining. We instead propose a method that can be applied on a $pretrained$ classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by significantly large margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images. △ Less

Submitted 4 October, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: Published in IEEE Signal Processing Letters

Journal ref: IEEE Signal Processing Letters 28 (2021) 1719 - 1723

arXiv:2104.13365 [pdf, other]

NTIRE 2021 Depth Guided Image Relighting Challenge

Authors: Majed El Helou, Ruofan Zhou, Sabine Susstrunk, Radu Timofte

Abstract: Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited to conduct both image normalization for domain adaptation, and also for data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. W… ▽ More Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited to conduct both image normalization for domain adaptation, and also for data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. We rely on the VIDIT dataset for each of our two challenge tracks, including depth information. The first track is on one-to-one relighting where the goal is to transform the illumination setup of an input image (color temperature and light source position) to the target illumination setup. In the second track, the any-to-any relighting challenge, the objective is to transform the illumination settings of the input image to match those of another guide image, similar to style transfer. In both tracks, participants were given depth information about the captured scenes. We had nearly 250 registered participants, leading to 18 confirmed team submissions in the final competition stage. The competitions, methods, and final results are presented in this paper. △ Less

Submitted 27 April, 2021; originally announced April 2021.

Comments: Code and data available on https://github.com/majedelhelou/VIDIT

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition Workshops 2021

arXiv:2101.04631 [pdf, other]

Deep Gaussian Denoiser Epistemic Uncertainty and Decoupled Dual-Attention Fusion

Authors: Xiaoqi Ma, Xiaoyu Lin, Majed El Helou, Sabine Süsstrunk

Abstract: Following the performance breakthrough of denoising networks, improvements have come chiefly through novel architecture designs and increased depth. While novel denoising networks were designed for real images coming from different distributions, or for specific applications, comparatively small improvement was achieved on Gaussian denoising. The denoising solutions suffer from epistemic uncertain… ▽ More Following the performance breakthrough of denoising networks, improvements have come chiefly through novel architecture designs and increased depth. While novel denoising networks were designed for real images coming from different distributions, or for specific applications, comparatively small improvement was achieved on Gaussian denoising. The denoising solutions suffer from epistemic uncertainty that can limit further advancements. This uncertainty is traditionally mitigated through different ensemble approaches. However, such ensembles are prohibitively costly with deep networks, which are already large in size. Our work focuses on pushing the performance limits of state-of-the-art methods on Gaussian denoising. We propose a model-agnostic approach for reducing epistemic uncertainty while using only a single pretrained network. We achieve this by tap** into the epistemic uncertainty through augmented and frequency-manipulated images to obtain denoised images with varying error. We propose an ensemble method with two decoupled attention paths, over the pixel domain and over that of our different manipulations, to learn the final fusion. Our results significantly improve over the state-of-the-art baselines and across varying noise levels. △ Less

Submitted 31 May, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: Code and models are publicly available on https://github.com/IVRL/DEU

arXiv:2011.01406 [pdf, other]

BIGPrior: Towards Decoupling Learned Prior Hallucination and Data Fidelity in Image Restoration

Authors: Majed El Helou, Sabine Süsstrunk

Abstract: Classic image-restoration algorithms use a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Hence, deep learning methods often produce superior image restoration quality. Deep networks are, however, capable of inducing strong and hardly predictable hallucinations. Networks implicitly learn to be jointly f… ▽ More Classic image-restoration algorithms use a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Hence, deep learning methods often produce superior image restoration quality. Deep networks are, however, capable of inducing strong and hardly predictable hallucinations. Networks implicitly learn to be jointly faithful to the observed data while learning an image prior; and the separation of original data and hallucinated data downstream is then not possible. This limits their wide-spread adoption in image restoration. Furthermore, it is often the hallucinated part that is victim to degradation-model overfitting. We present an approach with decoupled network-prior based hallucination and data fidelity terms. We refer to our framework as the Bayesian Integration of a Generative Prior (BIGPrior). Our method is rooted in a Bayesian framework and tightly connected to classic restoration methods. In fact, it can be viewed as a generalization of a large family of classic restoration algorithms. We use network inversion to extract image prior information from a generative network. We show that, on image colorization, inpainting and denoising, our framework consistently improves the inversion results. Our method, though partly reliant on the quality of the generative network inversion, is competitive with state-of-the-art supervised and task-specific restoration methods. It also provides an additional metric that sets forth the degree of prior reliance per pixel relative to data fidelity. △ Less

Submitted 8 January, 2022; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: IEEE TIP 2022. Code available on https://github.com/majedelhelou/BIGPrior. Main change relative to v1: added Table VI and computation times

arXiv:2009.12798 [pdf, other]

AIM 2020: Scene Relighting and Illumination Estimation Challenge

Authors: Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Mahmoud Afifi, Michael S. Brown, Kele Xu, Hengxing Cai, Yuzhong Liu, Li-Wen Wang, Zhi-Song Liu, Chu-Tak Li, Sourya Dipta Das, Nisarg A. Shah, Akashdeep Jassal, Tongtong Zhao, Shanshan Zhao, Sabari Nathan, M. Parisa Beham, R. Suganya, Qing Wang, Zhongyun Hu, Xin Huang, Yaning Li, Maitreya Suin , et al. (12 additional authors not shown)

Abstract: We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illum… ▽ More We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position). The goal of the second track was to estimate illumination settings, namely the color temperature and orientation, from a given image. Lastly, the third track dealt with any-to-any relighting, thus a generalization of the first track. The target color temperature and orientation, rather than being pre-determined, are instead given by a guide image. Participants were allowed to make use of their track 1 and 2 solutions for track 3. The tracks had 94, 52, and 56 registered participants, respectively, leading to 20 confirmed submissions in the final competition stage. △ Less

Submitted 27 September, 2020; originally announced September 2020.

Comments: ECCVW 2020. Data and more information on https://github.com/majedelhelou/VIDIT

arXiv:2005.05460 [pdf, other]

VIDIT: Virtual Image Dataset for Illumination Transfer

Authors: Majed El Helou, Ruofan Zhou, Johan Barthas, Sabine Süsstrunk

Abstract: Deep image relighting is gaining more interest lately, as it allows photo enhancement through illumination-specific retouching without human effort. Aside from aesthetic enhancement and photo montage, image relighting is valuable for domain adaptation, whether to augment datasets for training or to normalize input test data. Accurate relighting is, however, very challenging for various reasons, su… ▽ More Deep image relighting is gaining more interest lately, as it allows photo enhancement through illumination-specific retouching without human effort. Aside from aesthetic enhancement and photo montage, image relighting is valuable for domain adaptation, whether to augment datasets for training or to normalize input test data. Accurate relighting is, however, very challenging for various reasons, such as the difficulty in removing and recasting shadows and the modeling of different surfaces. We present a novel dataset, the Virtual Image Dataset for Illumination Transfer (VIDIT), in an effort to create a reference evaluation benchmark and to push forward the development of illumination manipulation methods. Virtual datasets are not only an important step towards achieving real-image performance but have also proven capable of improving training even when real datasets are possible to acquire and available. VIDIT contains 300 virtual scenes used for training, where every scene is captured 40 times in total: from 8 equally-spaced azimuthal angles, each lit with 5 different illuminants. △ Less

Submitted 13 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: For further information and data, see https://github.com/majedelhelou/VIDIT

arXiv:2005.04469 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053903

Realizability of Planar Point Embeddings from Angle Measurements

Authors: Frederike Dümbgen, Majed El Helou, Adam Scholefield

Abstract: Localization of a set of nodes is an important and a thoroughly researched problem in robotics and sensor networks. This paper is concerned with the theory of localization from inner-angle measurements. We focus on the challenging case where no anchor locations are known. Inspired by Euclidean distance matrices, we investigate when a set of inner angles corresponds to a realizable point set. In pa… ▽ More Localization of a set of nodes is an important and a thoroughly researched problem in robotics and sensor networks. This paper is concerned with the theory of localization from inner-angle measurements. We focus on the challenging case where no anchor locations are known. Inspired by Euclidean distance matrices, we investigate when a set of inner angles corresponds to a realizable point set. In particular, we find linear and non-linear constraints that are provably necessary, and we conjecture also sufficient for characterizing realizable angle sets. We confirm this in extensive numerical simulations, and we illustrate the use of these constraints for denoising angle measurements along with the reconstruction of a valid point set. △ Less

Submitted 9 May, 2020; originally announced May 2020.

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:2004.06409 [pdf, other]

Divergence-Based Adaptive Extreme Video Completion

Authors: Majed El Helou, Ruofan Zhou, Frank Schmutz, Fabrice Guibert, Sabine Süsstrunk

Abstract: Extreme image or video completion, where, for instance, we only retain 1% of pixels in random locations, allows for very cheap sampling in terms of the required pre-processing. The consequence is, however, a reconstruction that is challenging for humans and inpainting algorithms alike. We propose an extension of a state-of-the-art extreme image completion algorithm to extreme video completion. We… ▽ More Extreme image or video completion, where, for instance, we only retain 1% of pixels in random locations, allows for very cheap sampling in terms of the required pre-processing. The consequence is, however, a reconstruction that is challenging for humans and inpainting algorithms alike. We propose an extension of a state-of-the-art extreme image completion algorithm to extreme video completion. We analyze a color-motion estimation approach based on color KL-divergence that is suitable for extremely sparse scenarios. Our algorithm leverages the estimate to adapt between its spatial and temporal filtering when reconstructing the sparse randomly-sampled video. We validate our results on 50 publicly-available videos using reconstruction PSNR and mean opinion scores. △ Less

Submitted 14 April, 2020; originally announced April 2020.

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:2003.07119 [pdf, other]

Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks

Authors: Majed El Helou, Ruofan Zhou, Sabine Süsstrunk

Abstract: Super-resolution and denoising are ill-posed yet fundamental image restoration tasks. In blind settings, the degradation kernel or the noise level are unknown. This makes restoration even more challenging, notably for learning-based methods, as they tend to overfit to the degradation seen during training. We present an analysis, in the frequency domain, of degradation-kernel overfitting in super-r… ▽ More Super-resolution and denoising are ill-posed yet fundamental image restoration tasks. In blind settings, the degradation kernel or the noise level are unknown. This makes restoration even more challenging, notably for learning-based methods, as they tend to overfit to the degradation seen during training. We present an analysis, in the frequency domain, of degradation-kernel overfitting in super-resolution and introduce a conditional learning perspective that extends to both super-resolution and denoising. Building on our formulation, we propose a stochastic frequency masking of images used in training to regularize the networks and address the overfitting problem. Our technique improves state-of-the-art methods on blind super-resolution with different synthetic kernels, real super-resolution, blind Gaussian denoising, and real-image denoising. △ Less

Submitted 23 July, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: ECCV 2020. Project page: https://github.com/majedelhelou/SFM

arXiv:2003.05961 [pdf, other]

W2S: Microscopy Data with Joint Denoising and Super-Resolution for Widefield to SIM Map**

Authors: Ruofan Zhou, Majed El Helou, Daniel Sage, Thierry Laroche, Arne Seitz, Sabine Süsstrunk

Abstract: In fluorescence microscopy live-cell imaging, there is a critical trade-off between the signal-to-noise ratio and spatial resolution on one side, and the integrity of the biological sample on the other side. To obtain clean high-resolution (HR) images, one can either use microscopy techniques, such as structured-illumination microscopy (SIM), or apply denoising and super-resolution (SR) algorithms… ▽ More In fluorescence microscopy live-cell imaging, there is a critical trade-off between the signal-to-noise ratio and spatial resolution on one side, and the integrity of the biological sample on the other side. To obtain clean high-resolution (HR) images, one can either use microscopy techniques, such as structured-illumination microscopy (SIM), or apply denoising and super-resolution (SR) algorithms. However, the former option requires multiple shots that can damage the samples, and although efficient deep learning based algorithms exist for the latter option, no benchmark exists to evaluate these algorithms on the joint denoising and SR (JDSR) tasks. To study JDSR on microscopy data, we propose such a novel JDSR dataset, Widefield2SIM (W2S), acquired using a conventional fluorescence widefield and SIM imaging. W2S includes 144,000 real fluorescence microscopy images, resulting in a total of 360 sets of images. A set is comprised of noisy low-resolution (LR) widefield images with different noise levels, a noise-free LR image, and a corresponding high-quality HR SIM image. W2S allows us to benchmark the combinations of 6 denoising methods and 6 SR methods. We show that state-of-the-art SR networks perform very poorly on noisy inputs. Our evaluation also reveals that applying the best denoiser in terms of reconstruction error followed by the best SR method does not necessarily yield the best final result. Both quantitative and qualitative results show that SR networks are sensitive to noise and the sequential application of denoising and SR algorithms is sub-optimal. Lastly, we demonstrate that SR networks retrained end-to-end for JDSR outperform any combination of state-of-the-art deep denoising and SR networks △ Less

Submitted 24 August, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

Comments: ECCVW 2020. Project page: \<https://github.com/ivrl/w2s>

arXiv:2003.03633 [pdf, other]

AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks

Authors: Majed El Helou, Frederike Dümbgen, Sabine Süsstrunk

Abstract: The large capacity of neural networks enables them to learn complex functions. To avoid overfitting, networks however require a lot of training data that can be expensive and time-consuming to collect. A common practical approach to attenuate overfitting is the use of network regularization techniques. We propose a novel regularization method that progressively penalizes the magnitude of activatio… ▽ More The large capacity of neural networks enables them to learn complex functions. To avoid overfitting, networks however require a lot of training data that can be expensive and time-consuming to collect. A common practical approach to attenuate overfitting is the use of network regularization techniques. We propose a novel regularization method that progressively penalizes the magnitude of activations during training. The combined activation signals produced by all neurons in a given layer form the representation of the input image in that feature space. We propose to regularize this representation in the last feature layer before classification layers. Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations. Experimental results show the advantages of our approach in comparison with commonly-used regularizers on standard benchmark datasets. △ Less

Submitted 7 March, 2020; originally announced March 2020.

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

arXiv:1907.03029 [pdf]

doi 10.1109/TIP.2020.2976814

Blind Universal Bayesian Image Denoising with Gaussian Noise Level Learning

Authors: Majed El Helou, Sabine Süsstrunk

Abstract: Blind and universal image denoising consists of using a unique model that denoises images with any level of noise. It is especially practical as noise levels do not need to be known when the model is developed or at test time. We propose a theoretically-grounded blind and universal deep learning image denoiser for additive Gaussian noise removal. Our network is based on an optimal denoising soluti… ▽ More Blind and universal image denoising consists of using a unique model that denoises images with any level of noise. It is especially practical as noise levels do not need to be known when the model is developed or at test time. We propose a theoretically-grounded blind and universal deep learning image denoiser for additive Gaussian noise removal. Our network is based on an optimal denoising solution, which we call fusion denoising. It is derived theoretically with a Gaussian image prior assumption. Synthetic experiments show our network's generalization strength to unseen additive noise levels. We also adapt the fusion denoising network architecture for image denoising on real images. Our approach improves real-world grayscale additive image denoising PSNR results for training noise levels and further on noise levels not seen during training. It also improves state-of-the-art color image denoising performance on every single noise level, by an average of 0.1dB, whether trained on or not. △ Less

Submitted 7 March, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

Comments: Final uncompressed TIP version available online in open access (DOI attached)

ACM Class: I.4.4

Journal ref: IEEE Transactions on Image Processing, vol. 29, pp. 4885-4897, 2020

arXiv:1905.08214 [pdf, other]

Drone Shadow Tracking

Authors: Xiaoyan Zou, Ruofan Zhou, Majed El Helou, Sabine Süsstrunk

Abstract: Aerial videos taken by a drone not too far above the surface may contain the drone's shadow projected on the scene. This deteriorates the aesthetic quality of videos. With the presence of other shadows, shadow removal cannot be directly applied, and the shadow of the drone must be tracked. Tracking a drone's shadow in a video is, however, challenging. The varying size, shape, change of orientation… ▽ More Aerial videos taken by a drone not too far above the surface may contain the drone's shadow projected on the scene. This deteriorates the aesthetic quality of videos. With the presence of other shadows, shadow removal cannot be directly applied, and the shadow of the drone must be tracked. Tracking a drone's shadow in a video is, however, challenging. The varying size, shape, change of orientation and drone altitude pose difficulties. The shadow can also easily disappear over dark areas. However, a shadow has specific properties that can be leveraged, besides its geometric shape. In this paper, we incorporate knowledge of the shadow's physical properties, in the form of shadow detection masks, into a correlation-based tracking algorithm. We capture a test set of aerial videos taken with different settings and compare our results to those of a state-of-the-art tracking algorithm. △ Less

Submitted 20 May, 2019; originally announced May 2019.

Comments: 5 pages, 4 figures

arXiv:1809.04187 [pdf, other]

Fourier-Domain Optimization for Image Processing

Authors: Majed El Helou, Frederike Dümbgen, Radhakrishna Achanta, Sabine Süsstrunk

Abstract: Image optimization problems encompass many applications such as spectral fusion, deblurring, deconvolution, dehazing, matting, reflection removal and image interpolation, among others. With current image sizes in the order of megabytes, it is extremely expensive to run conventional algorithms such as gradient descent, making them unfavorable especially when closed-form solutions can be derived and… ▽ More Image optimization problems encompass many applications such as spectral fusion, deblurring, deconvolution, dehazing, matting, reflection removal and image interpolation, among others. With current image sizes in the order of megabytes, it is extremely expensive to run conventional algorithms such as gradient descent, making them unfavorable especially when closed-form solutions can be derived and computed efficiently. This paper explains in detail the framework for solving convex image optimization and deconvolution in the Fourier domain. We begin by explaining the mathematical background and motivating why the presented setups can be transformed and solved very efficiently in the Fourier domain. We also show how to practically use these solutions, by providing the corresponding implementations. The explanations are aimed at a broad audience with minimal knowledge of convolution and image optimization. The eager reader can jump to Section 3 for a footprint of how to solve and implement a sample optimization function, and Section 5 for the more complex cases. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Showing 1–21 of 21 results for author: Helou, M E