Diffusion-based image inpainting with internal learning

Authors: Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson

Abstract: Diffusion models are now the undisputed state-of-the-art for image generation and image restoration. However, they require large amounts of computational power for training and inference. In this paper, we propose lightweight diffusion models for image inpainting that can be trained on a single image, or a few images. We show that our approach competes with large state-of-the-art models in specific cases. We also show that training a model on a single image is particularly relevant for image acquisition modalities that differ from the RGB images of standard learning databases. We show results in three different contexts: texture images, line drawing images, and materials BRDF, for which we achieve state-of-the-art results in terms of realism, with a computational load that is greatly reduced compared to concurrent methods.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures. EUSIPCO 2024

arXiv:2404.07212 [pdf, other]

doi 10.1007/978-3-031-31975-4_24

Hybrid Training of Denoising Networks to Improve the Texture Acutance of Digital Cameras

Authors: Raphaël Achddou, Yann Gousseau, Saïd Ladjal

Abstract: In order to evaluate the capacity of a camera to render textures properly, the standard practice, used by classical scoring protocols, is to compute the frequency response to a dead leaves image target, from which a texture acutance metric is built. In this work, we propose a mixed training procedure for image restoration neural networks, relying on both natural and synthetic images, that yields a strong improvement of this acutance metric without impairing fidelity terms. The feasibility of the approach is demonstrated both on the denoising of RGB images and the full development of RAW images, opening the path to a systematic improvement of the texture acutance of real imaging devices.

Submitted 20 February, 2024; originally announced April 2024.

Journal ref: Scale Space and Variational Methods in Computer Vision, May 2023, Santa Margherita di Pula, Italy. pp.314-325

arXiv:2312.08256 [pdf, other]

A Compact and Semantic Latent Space for Disentangled and Controllable Image Editing

Authors: Gwilherm Lesné, Yann Gousseau, Saïd Ladjal, Alasdair Newson

Abstract: Recent advances in the field of generative models and in particular generative adversarial networks (GANs) have led to substantial progress for controlled image editing, especially compared with the pre-deep learning era. Despite their powerful ability to apply realistic modifications to an image, these methods often lack properties like disentanglement (the capacity to edit attributes independently). In this paper, we propose an auto-encoder which re-organizes the latent space of StyleGAN, so that each attribute which we wish to edit corresponds to an axis of the new latent space, and furthermore that the latent axes are decorrelated, encouraging disentanglement. We work in a compressed version of the latent space, using Principal Component Analysis, meaning that the parameter complexity of our autoencoder is reduced, leading to short training times ($\sim$ 45 mins). Qualitative and quantitative results demonstrate the editing capabilities of our approach, with greater disentanglement than competing methods, while maintaining fidelity to the original image with respect to identity. Our autoencoder architecture is simple and straightforward, facilitating implementation.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.01090 [pdf, other]

Infusion: Internal Diffusion for Video Inpainting

Authors: Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson

Abstract: Video inpainting is the task of filling a desired region in a video in a visually convincing manner. It is a very challenging task due to the high dimensionality of the signal and the temporal consistency required for obtaining convincing results. Recently, diffusion models have shown impressive results in modeling complex data distributions, including images and videos. Diffusion models remain nonetheless very expensive to train and perform inference with, which strongly restricts their application to video. We show that in the case of video inpainting, thanks to the highly auto-similar nature of videos, the training of a diffusion model can be restricted to the video to inpaint and still produce very satisfying results. This leads us to adopt an internal learning approach, which also allows for a greatly reduced network size. We call our approach "Infusion": an internal learning algorithm for video inpainting through diffusion. Due to our frugal network, we are able to propose the first video inpainting approach based purely on diffusion. Other methods require supporting elements such as optical flow estimation, which limits their performance in the case of dynamic textures for example. We introduce a new method for efficient training and inference of diffusion models in the context of internal learning. We split the diffusion process into different learning intervals which greatly simplifies the learning steps. We show qualitative and quantitative results, demonstrating that our method reaches state-of-the-art performance, in particular in the case of dynamic backgrounds and textures.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 12 pages, 8 figures

arXiv:2302.01648 [pdf, other]

doi 10.1109/ICPR56361.2022.9956498

A statistically constrained internal method for single image super-resolution

Authors: Pierrick Chatillon, Yann Gousseau, Sidonie Lefebvre

Abstract: Deep learning based methods for single-image super-resolution (SR) have drawn a lot of attention lately. In particular, various papers have shown that the learning stage can be performed on a single image, resulting in the so-called internal approaches. The SinGAN method is one of these contributions, where the distribution of image patches is learnt on the image at hand and propagated at finer scales. Now, there are situations where some statistical a priori can be assumed for the final image. In particular, many natural phenomena yield images having power law Fourier spectrum, such as clouds and other texture images. In this work, we show how such a priori information can be integrated into an internal super-resolution approach, by constraining the learned up-sampling procedure of SinGAN. We consider various types of constraints, related to the Fourier power spectrum, the color histograms and the consistency of the upsampling scheme. We demonstrate on various experiments that these constraints are indeed satisfied, but also that some perceptual quality measures can be improved by the proposed approach.

Submitted 3 February, 2023; originally announced February 2023.

Journal ref: 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 2022, pp. 1322-1328

arXiv:2302.01616 [pdf, other]

doi 10.1007/978-3-031-31975-4_20

A geometrically aware auto-encoder for multi-texture synthesis

Authors: Pierrick Chatillon, Yann Gousseau, Sidonie Lefebvre

Abstract: We propose an auto-encoder architecture for multi-texture synthesis. The approach relies on both a compact encoder accounting for second order neural statistics and a generator incorporating adaptive periodic content. Images are embedded in a compact and geometrically consistent latent space, where the texture representation and its spatial organisation are disentangled. Texture synthesis and interpolation tasks can be performed directly from these latent codes. Our experiments demonstrate that our model outperforms state-of-the-art feed-forward methods in terms of visual quality and various texture related metrics.

Submitted 29 June, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: Error in table 1 corrected

arXiv:2202.03163 [pdf, other]

doi 10.1016/j.cviu.2023.103866

Patch-Based Stochastic Attention for Image Editing

Authors: Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson

Abstract: Attention mechanisms have become of crucial importance in deep learning in recent years. These non-local operations, which are similar to traditional patch-based methods in image processing, complement local convolutions. However, computing the full attention matrix is an expensive step with heavy memory and computational loads. These limitations curb network architectures and performances, in particular for the case of high resolution images. We propose an efficient attention layer based on the stochastic algorithm PatchMatch, which is used for determining approximate nearest neighbors. We refer to our proposed layer as a "Patch-based Stochastic Attention Layer" (PSAL). Furthermore, we propose different approaches, based on patch aggregation, to ensure the differentiability of PSAL, thus allowing end-to-end training of any network containing our layer. PSAL has a small memory footprint and can therefore scale to high resolution images. It maintains this footprint without sacrificing spatial precision and globality of the nearest neighbors, which means that it can be easily inserted in any level of a deep architecture, even in shallower levels. We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution. Our code is available at: https://github.com/ncherel/psal

Submitted 1 November, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Comments: 17 pages, 12 figures. Accepted version for publication in Computer Vision and Image Understanding (CVIU)

Journal ref: Computer Vision and Image Understanding, Volume 238, 2024, 103866

arXiv:2202.02183 [pdf, other]

Feature-Style Encoder for Style-Based GAN Inversion

Authors: Xu Yao, Alasdair Newson, Yann Gousseau, Pierre Hellier

Abstract: We propose a novel architecture for GAN inversion, which we call Feature-Style encoder. The style encoder is key for the manipulation of the obtained latent codes, while the feature encoder is crucial for optimal image reconstruction. Our model achieves accurate inversion of real images from the latent space of a pre-trained style-based GAN model, obtaining better perceptual quality and lower reconstruction error than existing methods. Thanks to its encoder structure, the model allows fast and accurate image editing. Additionally, we demonstrate that the proposed encoder is especially well-suited for inversion and editing on videos. We conduct extensive experiments for several style-based generators pre-trained on different data domains. Our proposed method yields state-of-the-art results for style-based GAN inversion, significantly outperforming competing approaches. Source code is available at https://github.com/InterDigitalInc/FeatureStyleEncoder

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2112.15367 [pdf, other]

doi 10.1007/s10994-021-06008-4

Weakly Supervised Change Detection Using Guided Anisotropic Difusion

Authors: Rodrigo Caye Daudt, Bertrand Le Saux, Alexandre Boulch, Yann Gousseau

Abstract: Large scale datasets created from crowdsourced labels or openly available data have become crucial to provide training data for large scale learning algorithms. While these datasets are easier to acquire, the data are frequently noisy and unreliable, which is motivating research on weakly supervised learning techniques. In this paper we propose original ideas that help us to leverage such datasets in the context of change detection. First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results using the input images as guides to perform edge preserving filtering. We then show its potential in two weakly-supervised learning strategies tailored for change detection. The first strategy is an iterative learning method that combines model optimisation and data cleansing using GAD to extract the useful information from a large scale change detection dataset generated from open vector data. The second one incorporates GAD within a novel spatial attention layer that increases the accuracy of weakly supervised networks trained to perform pixel-level predictions from image-level labels. Improvements with respect to state-of-the-art are demonstrated on 4 different public datasets.

Submitted 31 December, 2021; originally announced December 2021.

Comments: Machine Learning Journal 2021. arXiv admin note: substantial text overlap with arXiv:1904.08208

arXiv:2106.11895 [pdf, other]

A Latent Transformer for Disentangled Face Editing in Images and Videos

Authors: Xu Yao, Alasdair Newson, Yann Gousseau, Pierre Hellier

Abstract: High quality facial image editing is a challenging problem in the movie post-production industry, requiring a high degree of control and identity preservation. Previous works that attempt to tackle this problem may suffer from the entanglement of facial attributes and the loss of the person's identity. Furthermore, many algorithms are limited to a certain task. To tackle these limitations, we propose to edit facial attributes via the latent space of a StyleGAN generator, by training a dedicated latent transformation network and incorporating explicit disentanglement and identity preservation terms in the loss function. We further introduce a pipeline to generalize our face editing to videos. Our model achieves a disentangled, controllable, and identity-preserving facial attribute editing, even in the challenging case of real (i.e., non-synthetic) images and videos. We conduct extensive experiments on image and video datasets and show that our model outperforms other state-of-the-art methods in visual quality and quantitative evaluation. Source codes are available at https://github.com/InterDigitalInc/latent-transformer.

Submitted 17 August, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: Accepted by ICCV 2021. Source codes are available at https://github.com/InterDigitalInc/latent-transformer

arXiv:2011.02727 [pdf, other]

An analysis of the transfer learning of convolutional neural networks for artistic images

Authors: Nicolas Gonthier, Yann Gousseau, Saïd Ladjal

Abstract: Transfer learning from huge natural image datasets, fine-tuning of deep neural networks and the use of the corresponding pre-trained networks have become de facto the core of art analysis applications. Nevertheless, the effects of transfer learning are still poorly understood. In this paper, we first use techniques for visualizing the network internal representations in order to provide clues to the understanding of what the network has learned on artistic images. Then, we provide a quantitative analysis of the changes introduced by the learning process thanks to metrics in both the feature and parameter spaces, as well as metrics computed on the set of maximal activation images. These analyses are performed on several variations of the transfer learning procedure. In particular, we observed that the network could specialize some pre-trained filters to the new image modality and also that higher layers tend to concentrate classes. Finally, we have shown that a double fine-tuning involving a medium-size artistic dataset can improve the classification on smaller datasets, even when the task changes.

Submitted 24 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: Accepted at Workshop on Fine Art Pattern Extraction and Recognition (FAPER), ICPR, 2020

arXiv:2008.01808 [pdf, other]

High resolution neural texture synthesis with long range constraints

Authors: Nicolas Gonthier, Yann Gousseau, Saïd Ladjal

Abstract: The field of texture synthesis has witnessed important progress over the last years, most notably through the use of Convolutional Neural Networks. However, neural synthesis methods still struggle to reproduce large scale structures, especially with high resolution textures. To address this issue, we first introduce a simple multi-resolution framework that efficiently accounts for long-range dependency. Then, we show that additional statistical constraints further improve the reproduction of textures with strong regularity. This can be achieved by constraining both the Gram matrices of a neural network and the power spectrum of the image. Alternatively one may constrain only the autocorrelation of the features of the network and drop the Gram matrices constraints. In an experimental part, the proposed methods are then extensively tested and compared to alternative approaches, both in an unsupervised way and through a user study. Experiments show the interest of the multi-scale scheme for high resolution textures and the interest of combining it with additional constraints for regular textures.

Submitted 4 August, 2020; originally announced August 2020.

Comments: 25 pages, 18 figures. LOW RESOLUTION PDF: Images may show compression artifacts

arXiv:2008.01178 [pdf, other]

doi 10.1016/j.cviu.2021.103299

Multiple instance learning on deep features for weakly supervised object detection with extreme domain shifts

Authors: Nicolas Gonthier, Saïd Ladjal, Yann Gousseau

Abstract: Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Whereas this task is typically addressed with a domain-specific solution focused on natural images, we show that a simple multiple instance approach applied on pre-trained deep features yields excellent performance on non-photographic datasets, possibly including new classes. The approach does not include any fine-tuning or cross-domain learning and is therefore efficient and possibly applicable to arbitrary datasets and classes. We investigate several flavors of the proposed approach, some including multi-layer perceptron and polyhedral classifiers. Despite its simplicity, our method shows competitive results on a range of publicly available datasets, including paintings (People-Art, IconArt), watercolors, cliparts and comics, and allows unseen visual categories to be learned quickly.

Submitted 12 November, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: 26 pages, 12 figures

Report number: 103299

MSC Class: 68T10

Journal ref: Computer Vision and Image Understanding 2022

arXiv:2005.04410 [pdf, other]

High Resolution Face Age Editing

Authors: Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, Pierre Hellier

Abstract: Face age editing has become a crucial task in film post-production, and is also becoming popular for general purpose photography. Recently, adversarial training has produced some of the most visually impressive results for image manipulation, including the face aging/de-aging task. In spite of considerable progress, current methods often present visual artifacts and can only deal with low-resolution images. In order to achieve aging/de-aging with the high quality and robustness necessary for wider use, these problems need to be addressed. This is the goal of the present work. We present an encoder-decoder architecture for face age editing. The core idea of our network is to create both a latent space containing the face identity, and a feature modulation layer corresponding to the age of the individual. We then combine these two elements to produce an output image of the person with a desired target age. Our architecture is greatly simplified with respect to other approaches, and allows for continuous age editing on high resolution images in a single unified model.

Submitted 9 May, 2020; originally announced May 2020.

arXiv:1904.08208 [pdf, other]

Guided Anisotropic Diffusion and Iterative Learning for Weakly Supervised Change Detection

Authors: Rodrigo Caye Daudt, Bertrand Le Saux, Alexandre Boulch, Yann Gousseau

Abstract: Large scale datasets created from user labels or openly available data have become crucial to provide training data for large scale learning algorithms. While these datasets are easier to acquire, the data are frequently noisy and unreliable, which is motivating research on weakly supervised learning techniques. In this paper we propose an iterative learning method that extracts the useful information from a large scale change detection dataset generated from open vector data to train a fully convolutional network which surpasses the performance obtained by naive supervised learning. We also propose the guided anisotropic diffusion algorithm, which improves semantic segmentation results using the input images as guides to perform edge preserving filtering, and is used in conjunction with the iterative training method to improve results.

Submitted 17 April, 2019; originally announced April 2019.

Comments: Accepted at CVPR 2019 Workshops

arXiv:1904.07099 [pdf, other]

Processsing Simple Geometric Attributes with Autoencoders

Authors: Alasdair Newson, Andrés Almansa, Yann Gousseau, Saïd Ladjal

Abstract: Image synthesis is a core problem in modern deep learning, and many recent architectures such as autoencoders and Generative Adversarial networks produce spectacular results on highly complex data, such as images of faces or landscapes. While these results open up a wide range of new, advanced synthesis applications, there is also a severe lack of theoretical understanding of how these networks work. This results in a wide range of practical problems, such as difficulties in training, the tendency to sample images with little or no variability, and generalisation problems. In this paper, we propose to analyse the ability of the simplest generative network, the autoencoder, to encode and decode two simple geometric attributes: size and position. We believe that, in order to understand more complicated tasks, it is necessary to first understand how these networks process simple attributes. For the first property, we analyse the case of images of centred disks with variable radii. We explain how the autoencoder projects these images to and from a latent space of smallest possible dimension, a scalar. In particular, we describe a closed-form solution to the decoding training problem in a network without biases, and show that during training, the network indeed finds this solution. We then investigate the best regularisation approaches which yield networks that generalise well. For the second property, position, we look at the encoding and decoding of Dirac delta functions, also known as 'one-hot' vectors. We describe a hand-crafted filter that achieves encoding perfectly, and show that the network naturally finds this filter during training. We also show experimentally that the decoding can be achieved if the dataset is sampled in an appropriate manner.

Submitted 15 April, 2019; originally announced April 2019.

arXiv:1810.08468 [pdf, other]

Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks

Authors: Rodrigo Caye Daudt, Bertrand Le Saux, Alexandre Boulch, Yann Gousseau

Abstract: The Copernicus Sentinel-2 program now provides multispectral images at a global scale with a high revisit rate. In this paper we explore the usage of convolutional neural networks for urban change detection using such multispectral images. We first present the new change detection dataset that was used for training the proposed networks, which will be openly available to serve as a benchmark. The Onera Satellite Change Detection (OSCD) dataset is composed of pairs of multispectral aerial images, and the changes were manually annotated at pixel level. We then propose two architectures to detect changes, Siamese and Early Fusion, and compare the impact of using different numbers of spectral channels as inputs. These architectures are trained from scratch using the provided dataset.

Submitted 19 October, 2018; originally announced October 2018.

Comments: To appear in Proc. IGARSS 2018, July 22-27, 2018, Valencia, Spain

arXiv:1810.08452 [pdf, other]

Multitask Learning for Large-scale Semantic Change Detection

Authors: Rodrigo Caye Daudt, Bertrand Le Saux, Alexandre Boulch, Yann Gousseau

Abstract: Change detection is one of the main problems in remote sensing, and is essential to the accurate processing and understanding of the large scale Earth observation data available through programs such as Sentinel and Landsat. Most of the recently proposed change detection methods bring deep learning to this context, but openly available change detection datasets are still very scarce, which limits the methods that can be proposed and tested. In this paper we present the first large scale high resolution semantic change detection (HRSCD) dataset, which enables the usage of deep learning methods for semantic change detection. The dataset contains coregistered RGB image pairs, pixel-wise change information and land cover information. We then propose several methods using fully convolutional neural networks to perform semantic change detection. Most notably, we present a network architecture that performs change detection and land cover mapping simultaneously, while using the predicted land cover information to help to predict changes. We also describe a sequential training scheme that allows this network to be trained without setting a hyperparameter that balances different loss functions and achieves the best overall results.

Submitted 28 August, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

Comments: Preprint submitted to Computer Vision and Image Understanding

arXiv:1810.02569 [pdf, other]

doi 10.1007/978-3-030-11012-3_53

Weakly Supervised Object Detection in Artworks

Authors: Nicolas Gonthier, Yann Gousseau, Said Ladjal, Olivier Bonfait

Abstract: We propose a method for the weakly supervised detection of objects in paintings. At training time, only image-level annotations are needed. This, combined with the efficiency of our multiple-instance learning method, enables one to learn new classes on-the-fly from globally annotated databases, avoiding the tedious task of manually marking objects. We show on several databases that dropping the instance-level annotations only yields mild performance losses. We also introduce a new database, IconArt, on which we perform detection experiments on classes that could not be learned on photographs, such as Jesus Child or Saint Sebastian. To the best of our knowledge, these are the first experiments dealing with the automatic (and in our case weakly supervised) detection of iconographic elements in paintings. We believe that such a method is of great benefit for helping art historians to explore large digital databases.

Submitted 5 October, 2018; originally announced October 2018.

Comments: Accepted at ECCV 2018 Workshop Computer Vision for Art Analysis - VISART 2018. 14 pages, 5 figures

arXiv:1706.03261 [pdf, other]

doi 10.1109/TCI.2017.2704439

A Bayesian Hyperprior Approach for Joint Image Denoising and Interpolation, with an Application to HDR Imaging

Authors: Cecilia Aguerrebere, Andrés Almansa, Julie Delon, Yann Gousseau, Pablo Musé

Abstract: Recently, impressive denoising results have been achieved by Bayesian approaches which assume Gaussian models for the image patches. This improvement in performance can be attributed to the use of per-patch models. Unfortunately such an approach is particularly unstable for most inverse problems beyond denoising. In this work, we propose the use of a hyperprior to model image patches, in order to stabilize the estimation procedure. There are two main advantages to the proposed restoration scheme: Firstly it is adapted to diagonal degradation matrices, and in particular to missing data problems (e.g. inpainting of missing pixels or zooming). Secondly it can deal with signal dependent noise models, particularly suited to digital cameras. As such, the scheme is especially adapted to computational photography. In order to illustrate this point, we provide an application to high dynamic range imaging from a single image taken with a modified sensor, which shows the effectiveness of the proposed scheme.

Submitted 10 June, 2017; originally announced June 2017.

Comments: Some figures are reduced to comply with arXiv's size constraints. Full size images are available as HAL technical report hal-01107519v5, IEEE Transactions on Computational Imaging, 2017

MSC Class: 62H35; 68U10; 62F15; 68Q32

ACM Class: I.4.1, I.4.4, I.2.6

arXiv:1605.01141 [pdf, other]

Texture Synthesis Through Convolutional Neural Networks and Spectrum Constraints

Authors: Gang Liu, Yann Gousseau, Gui-Song Xia

Abstract: This paper presents a significant improvement for the synthesis of texture images using convolutional neural networks (CNNs), making use of constraints on the Fourier spectrum of the results. More precisely, the texture synthesis is regarded as a constrained optimization problem, with constraints conditioning both the Fourier spectrum and statistical features learned by CNNs. In contrast with existing methods, the presented method inherits from previous CNN approaches the ability to depict local structures and fine scale details, and at the same time yields coherent large scale structures, even in the case of quasi-periodic images. This is done at no extra computational cost. Synthesis experiments on various images show a clear improvement compared to a recent state-of-the-art method relying on CNN constraints only.

Submitted 19 May, 2016; v1 submitted 4 May, 2016; originally announced May 2016.

arXiv:1503.05528 [pdf, ps, other]

doi 10.1137/140954933

Video Inpainting of Complex Scenes

Authors: Alasdair Newson, Andrés Almansa, Matthieu Fradet, Yann Gousseau, Patrick Pérez

Abstract: We propose an automatic video inpainting algorithm which relies on the optimisation of a global, patch-based functional. Our algorithm is able to deal with a variety of challenging situations which naturally arise in video inpainting, such as the correct reconstruction of dynamic textures, multiple moving objects and moving background. Furthermore, we achieve this in an order of magnitude less execution time with respect to the state-of-the-art. We are also able to achieve good quality results on high definition videos. Finally, we provide specific algorithmic details to make implementation of our algorithm as easy as possible. The resulting algorithm requires no segmentation or manual input other than the definition of the inpainting mask, and can deal with a wider variety of situations than is handled by previous work. 1. Introduction. Advanced image and video editing techniques are increasingly common in the image processing and computer vision world, and are also starting to be used in media entertainment. One common and difficult task closely linked to the world of video editing is image and video "inpainting". Generally speaking, this is the task of replacing the content of an image or video with some other content which is visually pleasing. This subject has been extensively studied in the case of images, to such an extent that commercial image inpainting products destined for the general public are available, such as Photoshop's "Content Aware fill" [1].
However, while some impressive results have been obtained in the case of videos, the subject has been studied far less extensively than image inpainting. This relative lack of research can largely be attributed to high time complexity due to the added temporal dimension. Indeed, it has only very recently become possible to produce good quality inpainting results on high definition videos, and this only in a semi-automatic manner. Nevertheless, high-quality video inpainting has many important and useful applications such as film restoration, professional post-production in cinema and video editing for personal use. For this reason, we believe that an automatic, generic video inpainting algorithm would be extremely useful for both academic and professional communities.

Submitted 8 June, 2015; v1 submitted 18 March, 2015; originally announced March 2015.

Journal ref: SIAM Journal on Imaging Sciences, Society for Industrial and Applied Mathematics, 2014, 7 (4), pp.1993-2019

Showing 1–22 of 22 results for author: Gousseau, Y