-
LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach
Authors:
Maria Pilligua,
Nil Biescas,
Javier Vazquez-Corral,
Josep Lladós,
Ernest Valveny,
Sanket Biswas
Abstract:
The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (DIR) systems. We propose LayeredDoc, which utilizes two layers of information: the first targets coarse-grained graphic components, while the second refines machine-printed textual content. This hierarchical DIR framework dynamically adjusts to the characteristics of the input document, facilitating effective domain adaptation. We evaluated our approach both qualitatively and quantitatively using a new real-world dataset, LayeredDocDB, developed for this study. Initially trained on a synthetically generated dataset, our model demonstrates strong generalization capabilities for the DIR task, offering a promising solution for handling variability in real-world data. Our code is accessible on GitHub.
Submitted 12 June, 2024;
originally announced June 2024.
-
Generalized Portrait Quality Assessment
Authors:
Nicolas Chahine,
Sira Ferradans,
Javier Vazquez-Corral,
Jean Ponce
Abstract:
Automated and robust portrait quality assessment (PQA) is of paramount importance in high-impact applications such as smartphone photography. This paper presents FHIQA, a learning-based approach to PQA that introduces a simple but effective quality score rescaling method based on image semantics, to enhance the precision of fine-grained image quality metrics while ensuring robust generalization to various scene settings beyond the training dataset. The proposed approach is validated by extensive experiments on the PIQ23 benchmark and comparisons with the current state of the art. The source code of FHIQA will be made publicly available on the PIQ23 GitHub repository at https://github.com/DXOMARK-Research/PIQ2023.
Submitted 14 February, 2024;
originally announced February 2024.
-
Towards a Perceptual Evaluation Framework for Lighting Estimation
Authors:
Justine Giroux,
Mohammad Reza Karimi Dastjerdi,
Yannick Hold-Geoffroy,
Javier Vazquez-Corral,
Jean-François Lalonde
Abstract:
Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate with human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represents human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms.
Submitted 20 March, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement
Authors:
Marcos V. Conde,
Javier Vazquez-Corral,
Michael S. Brown,
Radu Timofte
Abstract:
3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs. Code, models and dataset available at: https://github.com/mv-lab/nilut
Submitted 24 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
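The NILUT abstract above describes a continuous RGB-to-RGB mapping parameterized by a neural network, with several styles held in one network and blended implicitly. The following is a minimal structural sketch only, not the authors' implementation: the layer sizes, the residual output, the tanh activation, the style-vector conditioning format, and every function name are assumptions made for illustration, and the weights are random rather than trained. The 33-point LUT grid used in the size comparison is likewise an assumed, commonly used resolution.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Create a small MLP with random (untrained) weights. Hypothetical helper."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def nilut_forward(layers, rgb, style):
    """Map one RGB triplet (values in [0, 1]) under a style weighting vector.

    Assumption: the network predicts a residual that is added to the input color.
    """
    x = list(rgb) + list(style)
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:  # no activation on the output layer
            x = [math.tanh(v) for v in x]
    return [min(1.0, max(0.0, c + d)) for c, d in zip(rgb, x)]


def count_parameters(layers):
    """Total number of weights and biases in the MLP."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


NUM_STYLES = 3  # assumed number of picture styles
layers = init_mlp([3 + NUM_STYLES, 32, 32, 3])

pixel = (0.2, 0.5, 0.8)
style_a = nilut_forward(layers, pixel, (1.0, 0.0, 0.0))  # a single style
blend = nilut_forward(layers, pixel, (0.5, 0.5, 0.0))    # equal mix of two styles

# Storage comparison: one explicit 33x33x33 RGB table per style vs. one shared MLP.
lut_entries = 33 ** 3 * 3 * NUM_STYLES
mlp_parameters = count_parameters(layers)
```

Under these assumed sizes, the explicit tables hold 323,433 values while the shared network holds 1,379 parameters, which illustrates the kind of memory argument the abstract makes. Actual figures depend on the architecture and LUT resolution chosen.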
-
Perceptual Image Enhancement for Smartphone Real-Time Applications
Authors:
Marcos V. Conde,
Florin Vasluianu,
Javier Vazquez-Corral,
Radu Timofte
Abstract:
Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with a focus on deploying it on smartphones. Our experiments show that, with far fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images in under 1 second on mid-level commercial smartphones.
Submitted 22 November, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Matching visual induction effects on screens of different size
Authors:
Trevor D. Canham,
Javier Vazquez-Corral,
Elise Mathieu,
Marcelo Bertalmío
Abstract:
In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This presents a practical challenge for the preservation of the artistic intentions of filmmakers, as it can lead to shifts in image appearance between viewing destinations. In this work we show that a neural field model based on the efficient representation principle is able to predict induction effects, and that, by regularizing its associated energy functional, the model can still represent induction while becoming invertible. From this we propose a method to pre-process an image in a screen-size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images.
Submitted 26 January, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Visual Illusions Also Deceive Convolutional Neural Networks: Analysis and Implications
Authors:
A. Gomez-Villa,
A. Martín,
J. Vazquez-Corral,
M. Bertalmío,
J. Malo
Abstract:
Visual illusions allow researchers to devise and test new models of visual perception. Here we show that artificial neural networks trained for basic visual tasks in natural images are deceived by brightness and color illusions, having a response that is qualitatively very similar to the human achromatic and chromatic contrast sensitivity functions, and consistent with natural image statistics. We also show that, while these artificial networks are deceived by illusions, their response might be significantly different to that of humans. Our results suggest that low-level illusions appear in any system that has to perform basic visual tasks in natural environments, in line with error minimization explanations of visual function, and they also imply a word of caution on using artificial networks to study human vision, as previously suggested in other contexts in the vision science literature.
Submitted 3 December, 2019;
originally announced December 2019.
-
Synthesizing Visual Illusions Using Generative Adversarial Networks
Authors:
Alexander Gomez-Villa,
Adrian Martín,
Javier Vazquez-Corral,
Jesús Malo,
Marcelo Bertalmío
Abstract:
Visual illusions are a very useful tool for vision scientists, because they allow them to better probe the limits, thresholds and errors of the visual system. In this work we introduce the first ever framework to generate novel visual illusions with an artificial neural network (ANN). It takes the form of a generative adversarial network, with a generator of visual illusion candidates and two discriminator modules, one for the inducer background and another that decides whether or not the candidate is indeed an illusion. The generality of the model is exemplified by synthesizing illusions of different types, and validated with psychophysical experiments that corroborate that the outputs of our ANN are indeed visual illusions to human observers. Apart from synthesizing new visual illusions, which may help vision researchers, the proposed model has the potential to open new ways to study the similarities and differences between ANN and human visual perception.
Submitted 21 November, 2019;
originally announced November 2019.
-
Convolutional Neural Networks Deceived by Visual Illusions
Authors:
Alexander Gomez-Villa,
Adrián Martín,
Javier Vazquez-Corral,
Marcelo Bertalmío
Abstract:
Visual illusions teach us that what we see is not always what is represented in the physical world. Their special nature makes them a fascinating tool to test and validate any newly proposed vision model. In general, current vision models are based on the concatenation of linear convolutions and non-linear operations. In this paper we draw inspiration from the similarity of this structure to the operations present in Convolutional Neural Networks (CNNs). This motivated us to study whether CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with architecture and spatial pattern size. We believe that this CNN behaviour appears as a by-product of training for the low-level vision tasks of denoising, color constancy or deblurring. Our work opens a new bridge between human perception and CNNs: in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions.
Submitted 26 November, 2018;
originally announced November 2018.
-
On the Duality Between Retinex and Image Dehazing
Authors:
Adrian Galdran,
Aitor Alvarez-Gila,
Alessandro Bria,
Javier Vazquez-Corral,
Marcelo Bertalmio
Abstract:
Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we give a theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competing image dehazing algorithms performing on par with more complex fog removal methods, and can overcome some of the main challenges associated with this problem.
Submitted 6 April, 2018; v1 submitted 7 December, 2017;
originally announced December 2017.
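The recipe stated in the abstract above (invert the hazy image, apply a brightness-increasing Retinex operator, invert back) can be sketched on a one-dimensional row of intensities. This is a toy illustration under stated assumptions, not one of the Retinex implementations evaluated in the paper: `toy_retinex` is a hypothetical stand-in that divides each sample by its local maximum, chosen only because it never lowers a value in [0, 1].

```python
def toy_retinex(signal, radius=1, eps=1e-6):
    """Toy brightening operator (assumed stand-in for a real Retinex).

    Each sample is divided by the maximum of its local window, so for
    inputs in [0, 1] the output is never darker than the input.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        local_max = max(signal[lo:hi])
        out.append(min(1.0, signal[i] / max(local_max, eps)))
    return out


def dehaze_via_inverted_retinex(hazy, radius=1):
    """Apply the operator to inverted intensities, then invert the result."""
    inverted = [1.0 - v for v in hazy]
    enhanced = toy_retinex(inverted, radius)
    return [1.0 - v for v in enhanced]


# Hypothetical hazy scanline: washed-out, high intensities in [0, 1].
hazy_row = [0.6, 0.7, 0.9, 0.8]
dehazed_row = dehaze_via_inverted_retinex(hazy_row)
```

Because the operator brightens the inverted image, the final result is never brighter than the hazy input, which matches the intuition that removing the bright fog veil darkens the image. A real Retinex implementation would replace `toy_retinex` without changing the surrounding inversion steps.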