
Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers

Authors: Fatemeh Nourilenjan Nokabadi, Jean-François Lalonde, Christian Gagné

Abstract: New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based, and 4 which leverage other architectures. These trackers are tested against 4 recent attack methods to assess their performance and robustness on VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating adversarial robustness of object trackers based on bounding box versus binary mask predictions, and attack methods at different levels of perturbations. Interestingly, our study found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable against perturbation level shifts. By applying a specific attack on all transformer trackers, we show that new transformer trackers having a stronger cross-attention modeling achieve a greater adversarial robustness on tracking datasets, such as VOT2022ST and GOT10k. Our results also indicate the necessity for new attack methods to effectively tackle the latest types of transformer trackers. The codes necessary to reproduce this study are available at https://github.com/fatemehN/ReproducibilityStudy.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Published in Transactions on Machine Learning Research (05/2024): https://openreview.net/forum?id=FEEKR0Vl9s

arXiv:2312.04334 [pdf, other]

Towards a Perceptual Evaluation Framework for Lighting Estimation

Authors: Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-François Lalonde

Abstract: Progress in lighting estimation is tracked by computing existing image quality assessment (IQA) metrics on images from standard datasets. While this may appear to be a reasonable approach, we demonstrate that doing so does not correlate to human preference when the estimated lighting is used to relight a virtual scene into a real photograph. To study this, we design a controlled psychophysical experiment where human observers must choose their preference amongst rendered scenes lit using a set of lighting estimation algorithms selected from the recent literature, and use it to analyse how these algorithms perform according to human perception. Then, we demonstrate that none of the most popular IQA metrics from the literature, taken individually, correctly represent human perception. Finally, we show that by learning a combination of existing IQA metrics, we can more accurately represent human preference. This provides a new perceptual framework to help evaluate future lighting estimation algorithms.

Submitted 20 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2305.05023 [pdf, other]

Domain Agnostic Image-to-image Translation using Low-Resolution Conditioning

Authors: Mohamed Abid, Arman Afrasiyabi, Ihsen Hedhli, Jean-François Lalonde, Christian Gagné

Abstract: Generally, image-to-image translation (i2i) methods aim at learning mappings across domains with the assumption that the images used for translation share content (e.g., pose) but have their own domain-specific information (a.k.a. style). Conditioned on a target image, such methods extract the target style and combine it with the source image content, keeping coherence between the domains. In our proposal, we depart from this traditional view and instead consider the scenario where the target domain is represented by a very low-resolution (LR) image, proposing a domain-agnostic i2i method for fine-grained problems, where the domains are related. More specifically, our domain-agnostic approach aims at generating an image that combines visual features from the source image with low-frequency information (e.g. pose, color) of the LR target image. To do so, we present a novel approach that relies on training the generative model to produce images that both share distinctive information of the associated source image and correctly match the LR target image when downscaled. We validate our method on the CelebA-HQ and AFHQ datasets by demonstrating improvements in terms of visual quality. Qualitative and quantitative results show that when dealing with intra-domain image translation, our method generates realistic samples compared to state-of-the-art methods such as StarGAN v2. Ablation studies also reveal that our method is robust to changes in color, it can be applied to out-of-distribution images, and it allows for manual control over the final results.

Submitted 10 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: 19 pages, 23 figures. arXiv admin note: substantial text overlap with arXiv:2107.11262. Under consideration in Computer Vision and Image Understanding

arXiv:2304.13207 [pdf, other]

EverLight: Indoor-Outdoor Editable HDR Lighting Estimation

Authors: Mohammad Reza Karimi Dastjerdi, Jonathan Eisenmann, Yannick Hold-Geoffroy, Jean-François Lalonde

Abstract: Because of the diversity in lighting environments, existing illumination estimation techniques have been designed explicitly on indoor or outdoor environments. Methods have focused specifically on capturing accurate energy (e.g., through parametric lighting models), which emphasizes shading and strong cast shadows; or producing plausible texture (e.g., with GANs), which prioritizes plausible reflections. Approaches which provide editable lighting capabilities have been proposed, but these tend to be with simplified lighting models, offering limited realism. In this work, we propose to bridge the gap between these recent trends in the literature, and propose a method which combines a parametric light model with 360° panoramas, ready to use as HDRI in rendering engines. We leverage recent advances in GAN-based LDR panorama extrapolation from a regular image, which we extend to HDR using parametric spherical gaussians. To achieve this, we introduce a novel lighting co-modulation method that injects lighting-related features throughout the generator, tightly coupling the original or edited scene illumination within the panorama generation process. In our representation, users can easily edit light direction, intensity, number, etc. to impact shading while providing rich, complex reflections while seamlessly blending with the edits. Furthermore, our method encompasses indoor and outdoor environments, demonstrating state-of-the-art results even when compared to domain-specific methods.

Submitted 21 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: ICCV 2023, https://lvsn.github.io/everlight/

arXiv:2304.12372 [pdf, other]

Beyond the Pixel: a Photometrically Calibrated HDR Dataset for Luminance and Color Prediction

Authors: Christophe Bolduc, Justine Giroux, Marc Hébert, Claude Demers, Jean-François Lalonde

Abstract: Light plays an important role in human well-being. However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we introduce the Laval Photometric Indoor HDR Dataset, the first large-scale photometrically calibrated dataset of high dynamic range 360° panoramas. Our key contribution is the calibration of an existing, uncalibrated HDR Dataset. We do so by accurately capturing RAW bracketed exposures simultaneously with a professional photometric measurement device (chroma meter) for multiple scenes across a variety of lighting conditions. Using the resulting measurements, we establish the calibration coefficients to be applied to the HDR images. The resulting dataset is a rich representation of indoor scenes which displays a wide range of illuminance and color, and varied types of light sources. We exploit the dataset to introduce three novel tasks, where: per-pixel luminance, per-pixel color and planar illuminance can be predicted from a single input image. Finally, we also capture another smaller photometric dataset with a commercial 360° camera, to experiment on generalization across cameras. We are optimistic that the release of our datasets and associated code will spark interest in physically accurate light estimation within the community. Dataset and code are available at https://lvsn.github.io/beyondthepixel/.

Submitted 13 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.09691 [pdf, other]

DarSwin: Distortion Aware Radial Swin Transformer

Authors: Akshaya Athwale, Ichrak Shili, Émile Bergeron, Arman Afrasiyabi, Justin Lagüe, Ola Ahmad, Jean-François Lalonde

Abstract: Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets with significant gains when trained on bounded levels of distortions (very low, low, medium, and high) and tested on all, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin, to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/

Submitted 7 January, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 18 pages, 12 figures

MSC Class: 68T01

arXiv:2302.06733 [pdf, other]

Robust Unsupervised StyleGAN Image Restoration

Authors: Yohan Poirier-Ginter, Jean-François Lalonde

Abstract: GAN-based image restoration inverts the generative process to repair images corrupted by known degradations. Existing unsupervised methods must be carefully tuned for each task and degradation level. In this work, we make StyleGAN image restoration robust: a single set of hyperparameters works across a wide range of degradation levels. This makes it possible to handle combinations of several degradations, without the need to retune. Our proposed approach relies on a 3-phase progressive latent space extension and a conservative optimizer, which avoids the need for any additional regularization terms. Extensive experiments demonstrate robustness on inpainting, upsampling, denoising, and deartifacting at varying degradations levels, outperforming other StyleGAN-based inversion techniques. Our approach also favorably compares to diffusion-based restoration by yielding much more realistic inversion results. Code is available at https://lvsn.github.io/RobustUnsupervised/.

Submitted 22 June, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 8 pages, accepted at CVPR 2023

arXiv:2302.00087 [pdf, other]

LM-GAN: A Photorealistic All-Weather Parametric Sky Model

Authors: Lucas Valença, Ian Maquignaz, Hadi Moazen, Rishikesh Madan, Yannick Hold-Geoffroy, Jean-François Lalonde

Abstract: We present LM-GAN, an HDR sky model that generates photorealistic environment maps with weathered skies. Our sky model retains the flexibility of traditional parametric models and enables the reproduction of photorealistic all-weather skies with visual diversity in cloud formations. This is achieved with flexible and intuitive user controls for parameters, including sun position, sky color, and atmospheric turbidity. Our method is trained directly from inputs fitted to real HDR skies, learning both to preserve the input's illumination and correlate it to the real reference's atmospheric components in an end-to-end manner. Our main contributions are a generative model trained on both sky appearance and scene rendering losses, as well as a novel sky-parameter fitting algorithm. We demonstrate that our fitting algorithm surpasses existing approaches in both accuracy and sky fidelity, and also provide quantitative and qualitative analyses, demonstrating LM-GAN's ability to match parametric input to photorealistic all-weather skies. The generated HDR environment maps are ready to use in 3D rendering engines and can be applied to a wide range of image-based lighting applications.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2212.04441 [pdf, other]

The Differentiable Lens: Compound Lens Search over Glass Surfaces and Materials for Object Detection

Authors: Geoffroi Côté, Fahim Mannan, Simon Thibault, Jean-François Lalonde, Felix Heide

Abstract: Most camera lens systems are designed in isolation, separately from downstream computer vision methods. Recently, joint optimization approaches that design lenses alongside other components of the image acquisition and processing pipeline -- notably, downstream neural networks -- have achieved improved imaging quality or better performance on vision tasks. However, these existing methods optimize only a subset of lens parameters and cannot optimize glass materials given their categorical nature. In this work, we develop a differentiable spherical lens simulation model that accurately captures geometrical aberrations. We propose an optimization strategy to address the challenges of lens design -- notorious for non-convex loss function landscapes and many manufacturing constraints -- that are exacerbated in joint optimization tasks. Specifically, we introduce quantized continuous glass variables to facilitate the optimization and selection of glass materials in an end-to-end design context, and couple this with carefully designed constraints to support manufacturability. In automotive object detection, we report improved detection performance over existing designs even when simplifying designs to two- or three-element lenses, despite significantly degrading the image quality.

Submitted 27 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: 15 pages, 12 figures, to appear in CVPR 2023 proceedings, updated to reflect camera-ready submission

arXiv:2211.03928 [pdf, other]

Editable Indoor Lighting Estimation

Authors: Henrique Weber, Mathieu Garon, Jean-François Lalonde

Abstract: We present a method for estimating lighting from a single perspective image of an indoor scene. Previous methods for predicting indoor illumination usually focus on either simple, parametric lighting that lack realism, or on richer representations that are difficult or even impossible to understand or modify after prediction. We propose a pipeline that estimates a parametric light that is easy to edit and allows renderings with strong shadows, alongside with a non-parametric texture with high-frequency information necessary for realistic rendering of specular objects. Once estimated, the predictions obtained with our model are interpretable and can easily be modified by an artist/user with a few mouse clicks. Quantitative and qualitative results show that our approach makes indoor lighting estimation easier to handle by a casual user, while still producing competitive results.

Submitted 9 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: ECCV 2022

arXiv:2208.12300 [pdf, other]

A Deep Perceptual Measure for Lens and Camera Calibration

Authors: Yannick Hold-Geoffroy, Dominique Piché-Meunier, Kalyan Sunkavalli, Jean-Charles Bazin, François Rameau, Jean-François Lalonde

Abstract: Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer the camera calibration parameters such as pitch, roll, field of view, and lens distortion directly from a single image using a deep convolutional neural network. We train this network using automatically generated samples from a large-scale panorama dataset, yielding competitive accuracy in terms of standard $\ell_2$ error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study where we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image based calibration methods both on standard metrics as well as on this novel perceptual measure. Finally, we demonstrate the use of our calibration network for several applications, including virtual object insertion, image retrieval, and compositing. A demonstration of our approach is available at https://lvsn.github.io/deepcalib.

Submitted 26 July, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: 12 pages, 12 figures, project page (including live demo) available at https://lvsn.github.io/deepcalib. arXiv admin note: text overlap with arXiv:1712.01259

arXiv:2208.07903 [pdf, other]

Casual Indoor HDR Radiance Capture from Omnidirectional Images

Authors: Pulkit Gera, Mohammad Reza Karimi Dastjerdi, Charles Renaud, P. J. Narayanan, Jean-François Lalonde

Abstract: We present PanoHDR-NeRF, a neural representation of the full HDR radiance field of an indoor scene, and a pipeline to capture it casually, without elaborate setups or complex capture protocols. First, a user captures a low dynamic range (LDR) omnidirectional video of the scene by freely waving an off-the-shelf camera around the scene. Then, an LDR2HDR network uplifts the captured LDR frames to HDR, which are used to train a tailored NeRF++ model. The resulting PanoHDR-NeRF can render full HDR images from any location of the scene. Through experiments on a novel test dataset of real scenes with the ground truth HDR radiance captured at locations not seen during training, we show that PanoHDR-NeRF predicts plausible HDR radiance from any scene point. We also show that the predicted radiance can synthesize correct lighting effects, enabling the augmentation of indoor scenes with synthetic objects that are lit correctly. Datasets and code are available at https://lvsn.github.io/PanoHDR-NeRF/.

Submitted 19 October, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: BMVC 2022

arXiv:2208.00921 [pdf, other]

AdaWCT: Adaptive Whitening and Coloring Style Injection

Authors: Antoine Dufour, Yohan Poirier-Ginter, Alexandre Lessard, Ryan Smith, Michael Lockyer, Jean-Francois Lalonde

Abstract: Adaptive instance normalization (AdaIN) has become the standard method for style injection: by re-normalizing features through scale-and-shift operations, it has found widespread use in style transfer, image generation, and image-to-image translation. In this work, we present a generalization of AdaIN which relies on the whitening and coloring transformation (WCT) which we dub AdaWCT, that we apply for style injection in large GANs. We show, through experiments on the StarGANv2 architecture, that this generalization, albeit conceptually simple, results in significant improvements in the quality of the generated images.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 4 pages + refs
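
For readers unfamiliar with the two operations named in the AdaWCT abstract above, the following is a minimal NumPy sketch of the textbook style-transfer forms of AdaIN (per-channel scale-and-shift) and WCT (whitening followed by coloring with a full channel covariance). It is an illustration under stated assumptions, not the authors' implementation: the function names `adain` and `wct`, the `(C, H, W)` feature layout, the `eps` regularizer, and the choice to compute style statistics from a second feature map (rather than predicting them from a latent code, as a GAN would) are all assumptions made here.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Textbook AdaIN on feature maps of shape (C, H, W).

    Each content channel is normalized to zero mean and unit variance,
    then scaled and shifted to match the style channel statistics.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean


def wct(content, style, eps=1e-5):
    """Textbook whitening and coloring transform on (C, H, W) features.

    Unlike AdaIN, which only matches per-channel mean and variance,
    this matches the full C x C channel covariance of the style.
    """
    C, H, W = content.shape
    x = content.reshape(C, -1)
    y = style.reshape(C, -1)
    y_mean = y.mean(axis=1, keepdims=True)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y_mean

    # Regularized channel covariances (eps keeps them positive definite).
    cov_x = xc @ xc.T / (xc.shape[1] - 1) + eps * np.eye(C)
    cov_y = yc @ yc.T / (yc.shape[1] - 1) + eps * np.eye(C)

    # Symmetric matrix square roots via eigendecomposition.
    ex, vx = np.linalg.eigh(cov_x)
    ey, vy = np.linalg.eigh(cov_y)
    whiten = vx @ np.diag(ex ** -0.5) @ vx.T   # cov_x^(-1/2)
    color = vy @ np.diag(ey ** 0.5) @ vy.T     # cov_y^(1/2)

    out = color @ (whiten @ xc) + y_mean
    return out.reshape(C, H, W)
```

In this sketch, `adain` corresponds to the special case of `wct` in which both covariance matrices are restricted to their diagonals, which is one way to read the abstract's description of AdaWCT as a generalization of AdaIN.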

arXiv:2207.11643 [pdf, other]

Robust Scene Inference under Noise-Blur Dual Corruptions

Authors: Bhavya Goyal, Jean-François Lalonde, Yin Li, Mohit Gupta

Abstract: Scene inference under low-light is a challenging problem due to severe noise in the captured images. One way to reduce noise is to use longer exposure during the capture. However, in the presence of motion (scene or camera motion), longer exposures lead to motion blur, resulting in loss of image information. This creates a trade-off between these two kinds of image degradations: motion blur (due to long exposure) vs. noise (due to short exposure), also referred as a dual image corruption pair in this paper. With the rise of cameras capable of capturing multiple exposures of the same scene simultaneously, it is possible to overcome this trade-off. Our key observation is that although the amount and nature of degradation varies for these different image captures, the semantic content remains the same across all images. To this end, we propose a method to leverage these multi exposure captures for robust inference under low-light and motion. Our method builds on a feature consistency loss to encourage similar results from these individual captures, and uses the ensemble of their final predictions for robust visual recognition. We demonstrate the effectiveness of our approach on simulated images as well as real captures with multiple exposures, and across the tasks of object detection and image classification.

Submitted 23 July, 2022; originally announced July 2022.

Comments: ICCP 2022 Camera Ready

arXiv:2205.06304 [pdf, other]

Overparameterization Improves StyleGAN Inversion

Authors: Yohan Poirier-Ginter, Alexandre Lessard, Ryan Smith, Jean-François Lalonde

Abstract: Deep generative models like StyleGAN hold the promise of semantic image editing: modifying images by their content, rather than their pixel values. Unfortunately, working with arbitrary images requires inverting the StyleGAN generator, which has remained challenging so far. Existing inversion approaches obtain promising yet imperfect results, having to trade-off between reconstruction quality and downstream editability. To improve quality, these approaches must resort to various techniques that extend the model latent space after training. Taking a step back, we observe that these methods essentially all propose, in one way or another, to increase the number of free parameters. This suggests that inversion might be difficult because it is underconstrained. In this work, we address this directly and dramatically overparameterize the latent space, before training, with simple changes to the original StyleGAN architecture. Our overparameterization increases the available degrees of freedom, which in turn facilitates inversion. We show that this allows us to obtain near-perfect image reconstruction without the need for encoders nor for altering the latent space after training. Our approach also retains editability, which we demonstrate by realistically interpolating between images.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 6 pages, accepted for publication at AI for Content Creation Workshop (CVPR 2022)

arXiv:2204.07286 [pdf, other]

doi 10.1109/3DV57658.2022.00059

Guided Co-Modulated GAN for 360° Field of View Extrapolation

Authors: Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Jonathan Eisenmann, Siavash Khodadadeh, Jean-François Lalonde

Abstract: We propose a method to extrapolate a 360° field of view from a single image that allows for user-controlled synthesis of the out-painted content. To do so, we propose improvements to an existing GAN-based in-painting architecture for out-painting panoramic image representation. Our method obtains state-of-the-art results and outperforms previous methods on standard image quality metrics. To allow controlled synthesis of out-painting, we introduce a novel guided co-modulation framework, which drives the image generation process with a common pretrained discriminative model. Doing so maintains the high visual quality of generated panoramas while enabling user-controlled semantic content in the extrapolated field of view. We demonstrate the state-of-the-art results of our method on field of view extrapolation both qualitatively and quantitatively, providing thorough analysis of our novel editing capabilities. Finally, we demonstrate that our approach benefits the photorealistic virtual insertion of highly glossy objects in photographs.

Submitted 22 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 8 pages, 9 figures

arXiv:2204.00949 [pdf, other]

Matching Feature Sets for Few-Shot Image Classification

Authors: Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde, Christian Gagné

Abstract: In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets -- namely miniImageNet, tieredImageNet, and CUB -- in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art.

Submitted 2 April, 2022; originally announced April 2022.

Comments: International Conference on Computer Vision and Pattern Recognition (CVPR), 2022

arXiv:2203.12033 [pdf, other]

doi 10.1038/s43246-022-00319-2

Bioplastic Design using Multitask Deep Neural Networks

Authors: Christopher Kuenneth, Jessica Lalonde, Babetta L. Marrone, Carl N. Iverson, Rampi Ramprasad, Ghanshyam Pilania

Abstract: Non-degradable plastic waste stays for decades on land and in water, jeopardizing our environment; yet our modern lifestyle and current technologies are impossible to sustain without plastics. Bio-synthesized and biodegradable alternatives such as the polymer family of polyhydroxyalkanoates (PHAs) have the potential to replace large portions of the world's plastic supply with cradle-to-cradle materials, but their chemical complexity and diversity limit traditional resource-intensive experimentation. In this work, we develop multitask deep neural network property predictors using available experimental data for a diverse set of nearly 23000 homo- and copolymer chemistries. Using the predictors, we identify 14 PHA-based bioplastics from a search space of almost 1.4 million candidates which could serve as potential replacements for seven petroleum-based commodity plastics that account for 75% of the world's yearly plastic production. We discuss possible synthesis routes for these identified promising materials. The developed multitask polymer property predictors are made available as a part of the Polymer Genome project at https://PolymerGenome.org.

Submitted 22 March, 2022; originally announced March 2022.

Journal ref: Commun Mater 3, 96 (2022)

arXiv:2111.13681 [pdf, other]

ManiFest: Manifold Deformation for Few-shot Image Translation

Authors: Fabio Pizzati, Jean-François Lalonde, Raoul de Charette

Abstract: Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. To enforce feature consistency, our framework learns a style manifold between source and proxy anchor domains (assumed to be composed of large numbers of images). The learned manifold is interpolated and deformed towards the few-shot target domain via patch-based adversarial and feature statistics alignment losses. All of these components are trained simultaneously during a single end-to-end loop. In addition to the general few-shot translation task, our approach can alternatively be conditioned on a single exemplar image to reproduce its specific style. Extensive experiments demonstrate the efficacy of ManiFest on multiple tasks, outperforming the state-of-the-art on all metrics and in both the general- and exemplar-based scenarios. Our code is available at https://github.com/cv-rits/Manifest .

Submitted 20 July, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: ECCV 2022

arXiv:2107.11262 [pdf, other]

Image-to-Image Translation with Low Resolution Conditioning

Authors: Mohamed Abderrahmen Abid, Ihsen Hedhli, Jean-François Lalonde, Christian Gagne

Abstract: Most image-to-image translation methods focus on learning mappings across domains with the assumption that images share content (e.g., pose) but have their own domain-specific information known as style. When conditioned on a target image, such methods aim to extract the style of the target and combine it with the content of the source image. In this work, we consider the scenario where the target image has a very low resolution. More specifically, our approach aims at transferring fine details from a high resolution (HR) source image to fit a coarse, low resolution (LR) image representation of the target. We therefore generate HR images that share features from both HR and LR inputs. This differs from previous methods that focus on translating a given image style into a target content, our translation approach being able to simultaneously imitate the style and merge the structural information of the LR target. Our approach relies on training the generative model to produce HR target images that both 1) share distinctive information of the associated source image; 2) correctly match the LR target image when downscaled. We validate our method on the CelebA-HQ and AFHQ datasets by demonstrating improvements in terms of visual quality, diversity and coverage. Qualitative and quantitative results show that when dealing with intra-domain image translation, our method generates more realistic samples compared to state-of-the-art methods such as Stargan-v2.

Submitted 23 July, 2021; originally announced July 2021.

arXiv:2011.11872 [pdf, other]

Mixture-based Feature Space Learning for Few-shot Image Classification

Authors: Arman Afrasiyabi, Jean-François Lalonde, Christian Gagné

Abstract: We introduce Mixture-based Feature Space Learning (MixtFSL) for obtaining a rich and robust feature representation in the context of few-shot image classification. Previous works have proposed to model each base class either with a single point or with a mixture model by relying on offline clustering algorithms. In contrast, we propose to model base classes with mixture models by simultaneously training the feature extractor and learning the mixture model parameters in an online manner. This results in a richer and more discriminative feature space which can be employed to classify novel examples from very few samples. Two main stages are proposed to train the MixtFSL model. First, the multimodal mixtures for each base class and the feature extractor parameters are learned using a combination of two loss functions. Second, the resulting network and mixture models are progressively refined through a leader-follower learning procedure, which uses the current estimate as a "target" network. This target network is used to make a consistent assignment of instances to mixture components, which increases performance and stabilizes training. The effectiveness of our end-to-end feature space learning approach is demonstrated with extensive experiments on four standard datasets and four backbones. Notably, we demonstrate that when we combine our robust representation with recent alignment-based approaches, we achieve new state-of-the-art results in the inductive setting, with an absolute accuracy for 5-shot classification of 82.45 on miniImageNet, 88.20 with tieredImageNet, and 60.70 in FC100 using the ResNet-12 backbone.

Submitted 17 August, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

arXiv:2010.04143 [pdf, other]

doi 10.1109/3DV50981.2020.00126

Deep SVBRDF Estimation on Real Materials

Authors: Louis-Philippe Asselin, Denis Laurendeau, Jean-François Lalonde

Abstract: Recent work has demonstrated that deep learning approaches can successfully be used to recover accurate estimates of the spatially-varying BRDF (SVBRDF) of a surface from as little as a single image. Closer inspection reveals, however, that most approaches in the literature are trained purely on synthetic data, which, while diverse and realistic, is often not representative of the richness of the real world. In this paper, we show that training such networks exclusively on synthetic data is insufficient to achieve adequate results when tested on real data. Our analysis leverages a new dataset of real materials obtained with a novel portable multi-light capture apparatus. Through an extensive series of experiments and with the use of a novel deep learning architecture, we explore two strategies for improving results on real data: finetuning, and a per-material optimization procedure. We show that adapting network weights to real data is of critical importance, resulting in an approach which significantly outperforms previous methods for SVBRDF estimation on real materials. Dataset and code are available at https://lvsn.github.io/real-svbrdf

Submitted 8 October, 2020; originally announced October 2020.

Comments: Accepted submission to 3DV 2020. Project page https://lvsn.github.io/real-svbrdf

arXiv:2009.03683 [pdf, other]

doi 10.1007/s11263-020-01366-3

Rain rendering for evaluating and improving robustness to bad weather

Authors: Maxime Tremblay, Shirsendu Sukanta Halder, Raoul de Charette, Jean-François Lalonde

Abstract: Rain fills the atmosphere with water particles, which breaks the common assumption that light travels unaltered from the scene to the camera. While it is well-known that rain affects computer vision algorithms, quantifying its impact is difficult. In this context, we present a rain rendering pipeline that enables the systematic evaluation of common computer vision algorithms to controlled amounts of rain. We present three different ways to add synthetic rain to existing image datasets: completely physics-based; completely data-driven; and a combination of both. The physics-based rain augmentation combines a physical particle simulator and accurate rain photometric modeling. We validate our rendering methods with a user study, demonstrating our rain is judged as much as 73% more realistic than the state-of-the-art. Using our generated rain-augmented KITTI, Cityscapes, and nuScenes datasets, we conduct a thorough evaluation of object detection, semantic segmentation, and depth estimation algorithms and show that their performance decreases in degraded weather, on the order of 15% for object detection, 60% for semantic segmentation, and 6-fold increase in depth estimation error. Finetuning on our augmented synthetic data results in improvements of 21% on object detection, 37% on semantic segmentation, and 8% on depth estimation.

Submitted 6 September, 2020; originally announced September 2020.

Comments: 19 pages, 19 figures, IJCV 2020 preprint. arXiv admin note: text overlap with arXiv:1908.10335

arXiv:2006.05011 [pdf, other]

RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking

Authors: Etienne Dubeau, Mathieu Garon, Benoit Debaque, Raoul de Charette, Jean-François Lalonde

Abstract: Augmented reality devices require multiple sensors to perform various tasks such as localization and tracking. Currently, popular cameras are mostly frame-based (e.g. RGB and Depth) which impose a high data bandwidth and power usage. With the necessity for low power and more responsive augmented reality systems, using solely frame-based sensors imposes limits to the various algorithms that need high frequency data from the environment. As such, event-based sensors have become increasingly popular due to their low power, bandwidth and latency, as well as their very high frequency data acquisition capabilities. In this paper, we propose, for the first time, to use an event-based camera to increase the speed of 3D object tracking in 6 degrees of freedom. This application requires handling very high object speed to convey compelling AR experiences. To this end, we propose a new system which combines a recent RGB-D sensor (Kinect Azure) with an event camera (DAVIS346). We develop a deep learning approach, which combines an existing RGB-D network along with a novel event-based network in a cascade fashion, and demonstrate that our approach significantly improves the robustness of a state-of-the-art frame-based 6-DOF object tracker using our RGB-D-E pipeline.

Submitted 5 August, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 9 pages, 9 figures

arXiv:2002.02852 [pdf, other]

Input Dropout for Spatially Aligned Modalities

Authors: Sébastien de Blois, Mathieu Garon, Christian Gagné, Jean-François Lalonde

Abstract: Computer vision datasets containing multiple modalities such as color, depth, and thermal properties are now commonly accessible and useful for solving a wide array of challenging tasks. However, deploying multi-sensor heads is not possible in many scenarios. As such many practical solutions tend to be based on simpler sensors, mostly for cost, simplicity and robustness considerations. In this work, we propose a training methodology to take advantage of these additional modalities available in datasets, even if they are not available at test time. By assuming that the modalities have a strong spatial correlation, we propose Input Dropout, a simple technique that consists in stochastic hiding of one or many input modalities at training time, while using only the canonical (e.g. RGB) modalities at test time. We demonstrate that Input Dropout trivially combines with existing deep convolutional architectures, and improves their performance on a wide range of computer vision tasks such as dehazing, 6-DOF object tracking, pedestrian detection and object classification.

Submitted 21 May, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

Comments: Accepted in ICIP 2020. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:1912.05094 [pdf, other]

Associative Alignment for Few-shot Image Classification

Authors: Arman Afrasiyabi, Jean-François Lalonde, Christian Gagné

Abstract: Few-shot image classification aims at training a model from only a few examples for each of the "novel" classes. This paper proposes the idea of associative alignment for leveraging part of the base data by aligning the novel training instances to the closely related ones in the base training set. This expands the size of the effective novel training set by adding extra "related base" instances to the few novel ones, thereby allowing a constructive fine-tuning. We propose two associative alignment strategies: 1) a metric-learning loss for minimizing the distance between related base samples and the centroid of novel instances in the feature space, and 2) a conditional adversarial alignment loss based on the Wasserstein distance. Experiments on four standard datasets and three backbones demonstrate that combining our centroid-based alignment loss results in absolute accuracy improvements of 4.4%, 1.2%, and 6.2% in 5-shot learning over the state of the art for object recognition, fine-grained classification, and cross-domain adaptation, respectively.

Submitted 4 August, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

arXiv:1911.11822 [pdf, other]

Deep Template-based Object Instance Detection

Authors: Jean-Philippe Mercier, Mathieu Garon, Philippe Giguère, Jean-François Lalonde

Abstract: Much of the focus in the object detection literature has been on the problem of identifying the bounding box of a particular class of object in an image. Yet, in contexts such as robotics and augmented reality, it is often necessary to find a specific object instance---a unique toy or a custom industrial part for example---rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring fast turnaround which affords little-to-no training time. What is more, gathering a dataset and training a model for every new object instance to be detected can be an expensive and time-consuming process. In this context, we propose a generic 2D object instance detection approach that uses example viewpoints of the target object at test time to retrieve its 2D location in RGB images, without requiring any additional training (i.e. fine-tuning) step. To this end, we present an end-to-end architecture that extracts global and local information of the object from its viewpoints. The global information is used to tune early filters in the backbone while local viewpoints are correlated with the input image. Our method offers an improvement of almost 30 mAP over the previous template matching methods on the challenging Occluded Linemod dataset (overall mAP of 50.7). Our experiments also show that our single generic model (not trained on any of the test objects) yields detection results that are on par with approaches that are trained specifically on the target objects.

Submitted 14 November, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

arXiv:1910.08812 [pdf, other]

Deep Parametric Indoor Lighting Estimation

Authors: Marc-André Gardner, Yannick Hold-Geoffroy, Kalyan Sunkavalli, Christian Gagné, Jean-François Lalonde

Abstract: We present a method to estimate lighting from a single image of an indoor scene. Previous work has used an environment map representation that does not account for the localized nature of indoor lighting. Instead, we represent lighting as a set of discrete 3D lights with geometric and photometric parameters. We train a deep neural network to regress these parameters from a single image, on a dataset of environment maps annotated with depth. We propose a differentiable layer to convert these parameters to an environment map to compute our loss; this bypasses the challenge of establishing correspondences between estimated and ground truth lights. We demonstrate, via quantitative and qualitative evaluations, that our representation and training scheme lead to more accurate results compared to previous work, while allowing for more realistic 3D object compositing with spatially-varying lighting.

Submitted 19 October, 2019; originally announced October 2019.

arXiv:1908.10335 [pdf, other]

Physics-Based Rendering for Improving Robustness to Rain

Authors: Shirsendu Sukanta Halder, Jean-François Lalonde, Raoul de Charette

Abstract: To improve the robustness to rain, we present a physically-based rain rendering pipeline for realistically inserting rain into clear weather images. Our rendering relies on a physical particle simulator, an estimation of the scene lighting and an accurate rain photometric modeling to augment images with arbitrary amount of realistic rain or fog. We validate our rendering with a user study, proving our rain is judged 40% more realistic than the state-of-the-art. Using our generated weather augmented Kitti and Cityscapes dataset, we conduct a thorough evaluation of deep object detection and semantic segmentation algorithms and show that their performance decreases in degraded weather, on the order of 15% for object detection and 60% for semantic segmentation. Furthermore, we show refining existing networks with our augmented images improves the robustness of both object detection and semantic segmentation algorithms. We experiment on nuScenes and measure an improvement of 15% for object detection and 35% for semantic segmentation compared to original rainy performance. Augmented databases and code are available on the project page.

Submitted 27 August, 2019; originally announced August 2019.

Comments: ICCV 2019. Supplementary pdf / videos available on project page

arXiv:1906.04909 [pdf, other]

All-Weather Deep Outdoor Lighting Estimation

Authors: Jinsong Zhang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Sunil Hadap, Jonathan Eisenmann, Jean-François Lalonde

Abstract: We present a neural network that predicts HDR outdoor illumination from a single LDR image. At the heart of our work is a method to accurately learn HDR lighting from LDR panoramas under any weather condition. We achieve this by training another CNN (on a combination of synthetic and real images) to take as input an LDR panorama, and regress the parameters of the Lalonde-Matthews outdoor illumination model. This model is trained such that it a) reconstructs the appearance of the sky, and b) renders the appearance of objects lit by this illumination. We use this network to label a large-scale dataset of LDR panoramas with lighting parameters and use them to train our single image outdoor lighting estimation network. We demonstrate, via extensive experiments, that both our panorama and single image networks outperform the state of the art, and unlike prior work, are able to handle weather conditions ranging from fully sunny to overcast skies.

Submitted 11 June, 2019; originally announced June 2019.

Comments: 8 pages, CVPR 19. Project page: http://lvsn.github.io/allweather

arXiv:1906.03799 [pdf, other]

Fast Spatially-Varying Indoor Lighting Estimation

Authors: Mathieu Garon, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Jean-François Lalonde

Abstract: We propose a real-time method to estimate spatially-varying indoor lighting from a single RGB image. Given an image and a 2D location in that image, our CNN estimates a 5th order spherical harmonic representation of the lighting at the given location in less than 20ms on a laptop mobile graphics card. While existing approaches estimate a single, global lighting representation or require depth as input, our method reasons about local lighting without requiring any geometry information. We demonstrate, through quantitative experiments including a user study, that our results achieve lower lighting estimation errors and are preferred by users over the state-of-the-art. Our approach can be used directly for augmented reality applications, where a virtual object is relit realistically at any position in the scene in real-time.

Submitted 10 June, 2019; originally announced June 2019.

Comments: CVPR19

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6908-6917

arXiv:1906.03355 [pdf, other]

Learning Physics-guided Face Relighting under Directional Light

Authors: Thomas Nestmeyer, Jean-François Lalonde, Iain Matthews, Andreas M. Lehrmann

Abstract: Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera.

Submitted 19 April, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: CVPR 2020 (Oral)

arXiv:1905.03897 [pdf, other]

Deep Sky Modeling for Single Image Outdoor Lighting Estimation

Authors: Yannick Hold-Geoffroy, Akshaya Athawale, Jean-François Lalonde

Abstract: We propose a data-driven learned sky model, which we use for outdoor lighting estimation from a single image. As no large-scale dataset of images and their corresponding ground truth illumination is readily available, we use complementary datasets to train our approach, combining the vast diversity of illumination conditions of SUN360 with the radiometrically calibrated and physically accurate Laval HDR sky database. Our key contribution is to provide a holistic view of both lighting modeling and estimation, solving both problems end-to-end. From a test image, our method can directly estimate an HDR environment map of the lighting without relying on analytical lighting models. We demonstrate the versatility and expressivity of our learned sky model and show that it can be used to recover plausible illumination, leading to visually pleasant virtual object insertions. To further evaluate our method, we capture a dataset of HDR 360° panoramas and show through extensive validation that we significantly outperform previous state-of-the-art.

Submitted 9 May, 2019; originally announced May 2019.

Comments: CVPR'19 paper

arXiv:1810.06327 [pdf, other]

Deep Photovoltaic Nowcasting

Authors: Jinsong Zhang, Rodrigo Verschae, Shohei Nobuhara, Jean-François Lalonde

Abstract: Predicting the short-term power output of a photovoltaic panel is an important task for the efficient management of smart grids. Short-term forecasting at the minute scale, also known as nowcasting, can benefit from sky images captured by regular cameras installed close to the solar panel. However, estimating the weather conditions from these images (sun intensity, cloud appearance and movement, etc.) is a very challenging task that the community has yet to solve with traditional computer vision techniques. In this work, we propose to learn the relationship between sky appearance and the future photovoltaic power output using deep learning. We train several variants of convolutional neural networks which take historical photovoltaic power values and sky images as input and estimate photovoltaic power in the very short-term future. In particular, we compare three different architectures based on: a multi-layer perceptron (MLP), a convolutional neural network (CNN), and a long short-term memory (LSTM) module. We evaluate our approach quantitatively on a dataset of photovoltaic power values and corresponding images gathered in Kyoto, Japan. Our experiments reveal that the MLP network, already used similarly in previous work, achieves an RMSE skill score of 7% over the commonly-used persistence baseline on the 1-minute future photovoltaic power prediction task. Our CNN-based network improves upon this with a 12% skill score. In contrast, our LSTM-based model, which can learn the temporal dependencies in the data, achieves a 21% RMSE skill score, thus outperforming all other approaches.

Submitted 15 October, 2018; originally announced October 2018.

Comments: 28 pages, 10 figures, 4 tables, preprint accepted to Solar Energy

arXiv:1806.03994 [pdf, other]

Learning to Estimate Indoor Lighting from 3D Objects

Authors: Henrique Weber, Donald Prévost, Jean-François Lalonde

Abstract: In this work, we propose a step towards a more accurate prediction of the environment light given a single picture of a known object. To achieve this, we developed a deep learning method that is able to encode the latent space of indoor lighting using few parameters and that is trained on a database of environment maps. This latent space is then used to generate predictions of the light that are both more realistic and accurate than previous methods. To achieve this, our first contribution is a deep autoencoder which is capable of learning the feature space that compactly models lighting. Our second contribution is a convolutional neural network that predicts the light from a single image of a known object. To train these networks, our third contribution is a novel dataset that contains 21,000 HDR indoor environment maps. The results indicate that the predictor can generate plausible lighting estimations even from diffuse objects.

Submitted 13 August, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: 3DV 2018 - International Conference on 3D Vision

arXiv:1804.10094 [pdf, other]

Domain Adaptation through Synthesis for Unsupervised Person Re-identification

Authors: Slawomir Bak, Peter Carr, Jean-Francois Lalonde

Abstract: Drastic variations in illumination across surveillance cameras make the person re-identification problem extremely challenging. Current large scale re-identification datasets have a significant number of training subjects, but lack diversity in lighting conditions. As a result, a trained model requires fine-tuning to become effective under an unseen illumination condition. To alleviate this problem, we introduce a new synthetic dataset that contains hundreds of illumination conditions. Specifically, we use 100 virtual humans illuminated with multiple HDR environment maps which accurately model realistic indoor and outdoor lighting. To achieve better accuracy in unseen illumination conditions we propose a novel domain adaptation technique that takes advantage of our synthetic data and performs fine-tuning in a completely unsupervised way. Our approach yields significantly higher accuracy than semi-supervised and unsupervised state-of-the-art methods, and is very competitive with supervised techniques.

Submitted 26 April, 2018; originally announced April 2018.

arXiv:1803.10850 [pdf, other]

doi 10.1109/TPAMI.2019.2962693

Single Day Outdoor Photometric Stereo

Authors: Yannick Hold-Geoffroy, Paulo F. U. Gotardo, Jean-François Lalonde

Abstract: Photometric Stereo (PS) under outdoor illumination remains a challenging, ill-posed problem due to insufficient variability in illumination. Months-long capture sessions are typically used in this setup, with little success on shorter, single-day time intervals. In this paper, we investigate the solution of outdoor PS over a single day, under different weather conditions. First, we investigate the relationship between weather and surface reconstructability in order to understand when natural lighting allows existing PS algorithms to work. Our analysis reveals that partially cloudy days improve the conditioning of the outdoor PS problem while sunny days do not allow the unambiguous recovery of surface normals from photometric cues alone. We demonstrate that calibrated PS algorithms can thus be employed to reconstruct Lambertian surfaces accurately under partially cloudy days. Second, we solve the ambiguity arising in clear days by combining photometric cues with prior knowledge on material properties, local surface geometry and the natural variations in outdoor lighting through a CNN-based, weakly-calibrated PS technique. Given a sequence of outdoor images captured during a single sunny day, our method robustly estimates the scene surface normals with unprecedented quality for the considered scenario. Our approach does not require precise geolocation and significantly outperforms several state-of-the-art methods on images with real lighting, showing that our CNN can combine efficiently learned priors and photometric cues available during a single sunny day.

Submitted 2 January, 2020; v1 submitted 28 March, 2018; originally announced March 2018.

Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence 2019, 0162-8828

arXiv:1803.10075 [pdf, other]

A Framework for Evaluating 6-DOF Object Trackers

Authors: Mathieu Garon, Denis Laurendeau, Jean-François Lalonde

Abstract: We present a challenging and realistic novel dataset for evaluating 6-DOF object tracking algorithms. Existing datasets show serious limitations (notably, unrealistic synthetic data, or real data with large fiducial markers), preventing the community from obtaining an accurate picture of the state-of-the-art. Using a data acquisition pipeline based on a commercial motion capture system for acquiring accurate ground truth poses of real objects with respect to a Kinect V2 camera, we build a dataset which contains a total of 297 calibrated sequences. They are acquired in three different scenarios to evaluate the performance of trackers: stability, robustness to occlusion and accuracy during challenging interactions between a person and the object. We conduct an extensive study of a deep 6-DOF tracking architecture and determine a set of optimal parameters. We enhance the architecture and the training methodology to train a 6-DOF tracker that can robustly generalize to objects never seen during training, and demonstrate favorable performance compared to previous approaches trained specifically on the objects to track.

Submitted 6 September, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: Project website: http://vision.gel.ulaval.ca/~jflalonde/projects/6dofObjectTracking/index.html

arXiv:1712.01259 [pdf, other]

A Perceptual Measure for Deep Single Image Camera Calibration

Authors: Yannick Hold-Geoffroy, Kalyan Sunkavalli, Jonathan Eisenmann, Matt Fisher, Emiliano Gambaretto, Sunil Hadap, Jean-François Lalonde

Abstract: Most current single image camera calibration methods rely on specific image features or user input, and cannot be applied to natural images captured in uncontrolled settings. We propose directly inferring camera calibration parameters from a single image using a deep convolutional neural network. This network is trained using automatically generated samples from a large-scale panorama dataset, and considerably outperforms other methods, including recent deep learning-based approaches, in terms of standard L2 error. However, we argue that in many cases it is more important to consider how humans perceive errors in camera estimation. To this end, we conduct a large-scale human perception study where we ask users to judge the realism of 3D objects composited with and without ground truth camera calibration. Based on this study, we develop a new perceptual measure for camera calibration, and demonstrate that our deep calibration network outperforms other methods on this measure. Finally, we demonstrate the use of our calibration network for a number of applications including virtual object insertion, image retrieval and compositing.

Submitted 22 April, 2018; v1 submitted 1 December, 2017; originally announced December 2017.

Comments: Published at CVPR'18

arXiv:1704.00090 [pdf, other]

Learning to Predict Indoor Illumination from a Single Image

Authors: Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, Jean-François Lalonde

Abstract: We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene. In contrast to previous work that relies on specialized image capture, user input, and/or simple scene models, we train an end-to-end deep neural network that directly regresses a limited field-of-view photo to HDR illumination, without strong assumptions on scene geometry, material properties, or lighting. We show that this can be accomplished in a three-step process: 1) we train a robust lighting classifier to automatically annotate the location of light sources in a large dataset of LDR environment maps, 2) we use these annotations to train a deep neural network that predicts the location of lights in a scene from a single limited field-of-view photo, and 3) we fine-tune this network using a small dataset of HDR environment maps to predict light intensities. This allows us to automatically recover high-quality HDR illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications like 3D object insertion, we can achieve results that are photo-realistic, which is validated via a perceptual user study.

Submitted 21 November, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

arXiv:1703.10200 [pdf, other]

Learning High Dynamic Range from Outdoor Panoramas

Authors: Jinsong Zhang, Jean-François Lalonde

Abstract: Outdoor lighting has extremely high dynamic range. This makes the process of capturing outdoor environment maps notoriously challenging since special equipment must be used. In this work, we propose an alternative approach. We first capture lighting with a regular, LDR omnidirectional camera, and aim to recover the HDR after the fact via a novel, learning-based inverse tonemapping method. We propose a deep autoencoder framework which regresses linear, high dynamic range data from non-linear, saturated, low dynamic range panoramas. We validate our method through a wide set of experiments on synthetic data, as well as on a novel dataset of real photographs with ground truth. Our approach finds applications in a variety of settings, ranging from outdoor light capture to image matching.

Submitted 7 November, 2017; v1 submitted 29 March, 2017; originally announced March 2017.

Comments: 8 pages + 2 pages of citations, 10 figures. Accepted as an oral paper at ICCV 2017

arXiv:1703.09771 [pdf, other]

doi 10.1109/TVCG.2017.2734599

Deep 6-DOF Tracking

Authors: Mathieu Garon, Jean-François Lalonde

Abstract: We present a temporal 6-DOF tracking method which leverages deep learning to achieve state-of-the-art performance on challenging datasets of real world capture. Our method is both more accurate and more robust to occlusions than the existing best performing approaches while maintaining real-time performance. To assess its efficacy, we evaluate our approach on several challenging RGBD sequences of real objects in a variety of conditions. Notably, we systematically evaluate robustness to occlusions through a series of sequences where the object to be tracked is increasingly occluded. Finally, our approach is purely data-driven and does not require any hand-designed features: robust tracking is automatically learned from data.

Submitted 15 August, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

Comments: 9 pages, 9 figures, ISMAR 2017, TVCG special edition Website: http://vision.gel.ulaval.ca/~jflalonde/projects/deepTracking/index.html

Journal ref: IEEE Transactions on Visualization and Computer Graphics 2017

arXiv:1611.06403 [pdf, other]

Deep Outdoor Illumination Estimation

Authors: Yannick Hold-Geoffroy, Kalyan Sunkavalli, Sunil Hadap, Emiliano Gambaretto, Jean-François Lalonde

Abstract: We present a CNN-based technique to estimate high-dynamic range outdoor illumination from a single low dynamic range image. To train the CNN, we leverage a large dataset of outdoor panoramas. We fit a low-dimensional physically-based outdoor illumination model to the skies in these panoramas, giving us a compact set of parameters (including sun position, atmospheric conditions, and camera parameters). We extract limited field-of-view images from the panoramas, and train a CNN with this large set of pairs of input images and output lighting parameters. Given a test image, this network can be used to infer illumination parameters that can, in turn, be used to reconstruct an outdoor illumination environment map. We demonstrate that our approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image. An extensive evaluation on both the panorama dataset and captured HDR environment maps shows that our technique significantly outperforms previous solutions to this problem.

Submitted 11 April, 2018; v1 submitted 19 November, 2016; originally announced November 2016.

Comments: CVPR'17 preprint, 8 pages + 2 pages of citations, 12 figures

Showing 1–43 of 43 results for author: Lalonde, J