Search | arXiv e-print repository

LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach

Authors: Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas

Abstract: The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (D… ▽ More The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (DIR) systems. We propose LayeredDoc, which utilizes two layers of information: the first targets coarse-grained graphic components, while the second refines machine-printed textual content. This hierarchical DIR framework dynamically adjusts to the characteristics of the input document, facilitating effective domain adaptation. We evaluated our approach both qualitatively and quantitatively using a new real-world dataset, LayeredDocDB, developed for this study. Initially trained on a synthetically generated dataset, our model demonstrates strong generalization capabilities for the DIR task, offering a promising solution for handling variability in real-world data. Our code is accessible on GitHub. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted to ICDAR 2024 (Athens, Greece) Workshop on Automatically Domain-Adapted and Personalized Document Analysis (ADAPDA)

arXiv:2402.09178 [pdf, other]

doi 10.13039/501100011033

Generalized Portrait Quality Assessment

Authors: Nicolas Chahine, Sira Ferradans, Javier Vazquez-Corral, Jean Ponce

Abstract: Automated and robust portrait quality assessment (PQA) is of paramount importance in high-impact applications such as smartphone photography. This paper presents FHIQA, a learning-based approach to PQA that introduces a simple but effective quality score rescaling method based on image semantics, to enhance the precision of fine-grained image quality metrics while ensuring robust generalization to… ▽ More Automated and robust portrait quality assessment (PQA) is of paramount importance in high-impact applications such as smartphone photography. This paper presents FHIQA, a learning-based approach to PQA that introduces a simple but effective quality score rescaling method based on image semantics, to enhance the precision of fine-grained image quality metrics while ensuring robust generalization to various scene settings beyond the training dataset. The proposed approach is validated by extensive experiments on the PIQ23 benchmark and comparisons with the current state of the art. The source code of FHIQA will be made publicly available on the PIQ23 GitHub repository at https://github.com/DXOMARK-Research/PIQ2023. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: Pre-print

arXiv:2312.04334 [pdf, other]

Towards a Perceptual Evaluation Framework for Lighting Estimation

Authors: Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-François Lalonde

Abstract: Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical exp… ▽ More Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represent human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms. △ Less

Submitted 20 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2306.11920 [pdf, other]

NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement

Authors: Marcos V. Conde, Javier Vazquez-Corral, Michael S. Brown, Radu Timofte

Abstract: 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet n… ▽ More 3D lookup tables (3D LUTs) are a key component for image enhancement. Modern image signal processors (ISPs) have dedicated support for these as part of the camera rendering pipeline. Cameras typically provide multiple options for picture styles, where each style is usually obtained by applying a unique handcrafted 3D LUT. Current approaches for learning and applying 3D LUTs are notably fast, yet not so memory-efficient, as storing multiple 3D LUTs is required. For this reason and other implementation limitations, their use on mobile devices is less popular. In this work, we propose a Neural Implicit LUT (NILUT), an implicitly defined continuous 3D color transformation parameterized by a neural network. We show that NILUTs are capable of accurately emulating real 3D LUTs. Moreover, a NILUT can be extended to incorporate multiple styles into a single network with the ability to blend styles implicitly. Our novel approach is memory-efficient, controllable and can complement previous methods, including learned ISPs. Code, models and dataset available at: https://github.com/mv-lab/nilut △ Less

Submitted 24 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: AAAI 2024 - The 38th Annual AAAI Conference on Artificial Intelligence

arXiv:2210.13552 [pdf, other]

Perceptual Image Enhancement for Smartphone Real-Time Applications

Authors: Marcos V. Conde, Florin Vasluianu, Javier Vazquez-Corral, Radu Timofte

Abstract: Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image… ▽ More Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, due to the small size and lens limitations of the smartphone cameras, we commonly find artifacts or degradation in the processed images. The most common unpleasant effects are noise artifacts, diffraction artifacts, blur, and HDR overexposure. Deep learning methods for image restoration can successfully remove these artifacts. However, most approaches are not suitable for real-time applications on mobile devices due to their heavy computation and memory requirements. In this paper, we propose LPIENet, a lightweight network for perceptual image enhancement, with the focus on deploying it on smartphones. Our experiments show that, with much fewer parameters and operations, our model can deal with the mentioned artifacts and achieve competitive performance compared with state-of-the-art methods on standard benchmarks. Moreover, to prove the efficiency and reliability of our approach, we deployed the model directly on commercial smartphones and evaluated its performance. Our model can process 2K resolution images under 1 second in mid-level commercial smartphones. △ Less

Submitted 22 November, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: IEEE/CVF WACV 2023 (Oral)

arXiv:2005.02694 [pdf, other]

Matching visual induction effects on screens of different size

Authors: Trevor D. Canham, Javier Vazquez-Corral, Elise Mathieu, Marcelo Bertalmío

Abstract: In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This presents a practical challenge for the preservat… ▽ More In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This presents a practical challenge for the preservation of the artistic intentions of filmmakers, as it can lead to shifts in image appearance between viewing destinations. In this work we show that a neural field model based on the efficient representation principle is able to predict induction effects, and how by regularizing its associated energy functional the model is still able to represent induction but is now invertible. From this we propose a method to pre-process an image in a screen-size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images. △ Less

Submitted 26 January, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: pre-print

arXiv:1912.01643 [pdf, other]

Visual Illusions Also Deceive Convolutional Neural Networks: Analysis and Implications

Authors: A. Gomez-Villa, A. Martín, J. Vazquez-Corral, M. Bertalmío, J. Malo

Abstract: Visual illusions allow researchers to devise and test new models of visual perception. Here we show that artificial neural networks trained for basic visual tasks in natural images are deceived by brightness and color illusions, having a response that is qualitatively very similar to the human achromatic and chromatic contrast sensitivity functions, and consistent with natural image statistics. We… ▽ More Visual illusions allow researchers to devise and test new models of visual perception. Here we show that artificial neural networks trained for basic visual tasks in natural images are deceived by brightness and color illusions, having a response that is qualitatively very similar to the human achromatic and chromatic contrast sensitivity functions, and consistent with natural image statistics. We also show that, while these artificial networks are deceived by illusions, their response might be significantly different to that of humans. Our results suggest that low-level illusions appear in any system that has to perform basic visual tasks in natural environments, in line with error minimization explanations of visual function, and they also imply a word of caution on using artificial networks to study human vision, as previously suggested in other contexts in the vision science literature. △ Less

Submitted 3 December, 2019; originally announced December 2019.

arXiv:1911.09599 [pdf, other]

Synthesizing Visual Illusions Using Generative Adversarial Networks

Authors: Alexander Gomez-Villa, Adrian Martín, Javier Vazquez-Corral, Jesús Malo, Marcelo Bertalmío

Abstract: Visual illusions are a very useful tool for vision scientists, because they allow them to better probe the limits, thresholds and errors of the visual system. In this work we introduce the first ever framework to generate novel visual illusions with an artificial neural network (ANN). It takes the form of a generative adversarial network, with a generator of visual illusion candidates and two disc… ▽ More Visual illusions are a very useful tool for vision scientists, because they allow them to better probe the limits, thresholds and errors of the visual system. In this work we introduce the first ever framework to generate novel visual illusions with an artificial neural network (ANN). It takes the form of a generative adversarial network, with a generator of visual illusion candidates and two discriminator modules, one for the inducer background and another that decides whether or not the candidate is indeed an illusion. The generality of the model is exemplified by synthesizing illusions of different types, and validated with psychophysical experiments that corroborate that the outputs of our ANN are indeed visual illusions to human observers. Apart from synthesizing new visual illusions, which may help vision researchers, the proposed model has the potential to open new ways to study the similarities and differences between ANN and human visual perception. △ Less

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1811.10565 [pdf, other]

Convolutional Neural Networks Deceived by Visual Illusions

Authors: Alexander Gomez-Villa, Adrián Martín, Javier Vazquez-Corral, Marcelo Bertalmío

Abstract: Visual illusions teach us that what we see is not always what it is represented in the physical world. Its special nature make them a fascinating tool to test and validate any new vision model proposed. In general, current vision models are based on the concatenation of linear convolutions and non-linear operations. In this paper we get inspiration from the similarity of this structure with the op… ▽ More Visual illusions teach us that what we see is not always what it is represented in the physical world. Its special nature make them a fascinating tool to test and validate any new vision model proposed. In general, current vision models are based on the concatenation of linear convolutions and non-linear operations. In this paper we get inspiration from the similarity of this structure with the operations present in Convolutional Neural Networks (CNNs). This motivated us to study if CNNs trained for low-level visual tasks are deceived by visual illusions. In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size. We believe that this CNNs behaviour appears as a by-product of the training for the low level vision tasks of denoising, color constancy or deblurring. Our work opens a new bridge between human perception and CNNs: in order to obtain CNNs that better replicate human behaviour, we may need to start aiming for them to better replicate visual illusions. △ Less

Submitted 26 November, 2018; originally announced November 2018.

arXiv:1712.02754 [pdf, other]

On the Duality Between Retinex and Image Dehazing

Authors: Adrian Galdran, Aitor Alvarez-Gila, Alessandro Bria, Javier Vazquez-Corral, Marcelo Bertalmio

Abstract: Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement an… ▽ More Image dehazing deals with the removal of undesired loss of visibility in outdoor images due to the presence of fog. Retinex is a color vision model mimicking the ability of the Human Visual System to robustly discount varying illuminations when observing a scene under different spectral lighting conditions. Retinex has been widely explored in the computer vision literature for image enhancement and other related tasks. While these two problems are apparently unrelated, the goal of this work is to show that they can be connected by a simple linear relationship. Specifically, most Retinex-based algorithms have the characteristic feature of always increasing image brightness, which turns them into ideal candidates for effective image dehazing by directly applying Retinex to a hazy image whose intensities have been inverted. In this paper, we give theoretical proof that Retinex on inverted intensities is a solution to the image dehazing problem. Comprehensive qualitative and quantitative results indicate that several classical and modern implementations of Retinex can be transformed into competing image dehazing algorithms performing on pair with more complex fog removal methods, and can overcome some of the main challenges associated with this problem. △ Less

Submitted 6 April, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: 13 pages, 5 figures

Showing 1–10 of 10 results for author: Vazquez-Corral, J