Showing 1–50 of 51 results for author: Lathuilière, S

  1. arXiv:2405.15633

    cs.CV cs.AI

    Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning

    Authors: Thomas De Min, Massimiliano Mancini, Stéphane Lathuilière, Subhankar Roy, Elisa Ricci

    Abstract: Prompt tuning has emerged as an effective rehearsal-free technique for class-incremental learning (CIL) that learns a tiny set of task-specific parameters (or prompts) to instruct a pre-trained transformer to learn on a sequence of tasks. Albeit effective, prompt tuning methods do not lend themselves well to the multi-label class incremental learning (MLCIL) scenario (where an image contains multiple foregro…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Published at 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024

  2. arXiv:2402.15206

    cs.CV

    Source-Guided Similarity Preservation for Online Person Re-Identification

    Authors: Hamza Rami, Jhony H. Giraldo, Nicolas Winckler, Stéphane Lathuilière

    Abstract: Online Unsupervised Domain Adaptation (OUDA) for person Re-Identification (Re-ID) is the task of continuously adapting a model trained on a well-annotated source domain dataset to a target domain observed as a data stream. In OUDA, person Re-ID models face two main challenges: catastrophic forgetting and domain shift. In this work, we propose a new Source-guided Similarity Preservation (S2P) frame…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: WACV 2024

  3. arXiv:2312.09788

    cs.CV cs.AI cs.LG

    Collaborating Foundation Models for Domain Generalized Semantic Segmentation

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Domain Generalized Semantic Segmentation (DGSS) deals with training a model on a labeled source domain with the aim of generalizing to unseen domains during inference. Existing DGSS methods typically effectuate robust features by means of Domain Randomization (DR). Such an approach is often limited as it can only account for style diversification and not content. In this work, we take an orthogona…

    Submitted 29 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: https://github.com/yasserben/CLOUDS ; Accepted to CVPR 2024

  4. arXiv:2312.08977

    cs.LG cs.AI cs.CV

    Weighted Ensemble Models Are Strong Continual Learners

    Authors: Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière

    Abstract: In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stab…

    Submitted 21 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Code: https://github.com/IemProg/CoFiMA

  5. arXiv:2311.03873

    cs.CV cs.AI

    Mini but Mighty: Finetuning ViTs with Mini Adapters

    Authors: Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière

    Abstract: Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimen…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: WACV2024

  6. arXiv:2310.11482

    cs.CV cs.AI

    Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation

    Authors: Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione, Stéphane Lathuilière

    Abstract: Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks without forgetting previously learned information. The advent of large pre-trained models (PTMs) has fast-tracked the progress in CIL due to the highly transferable PTM representations, where tuning a small set of parameters leads to state-of-the-art performance when comp…

    Submitted 14 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures

  7. arXiv:2310.10325

    cs.CV eess.IV

    Towards image compression with perfect realism at ultra-low bitrates

    Authors: Marlène Careil, Matthew J. Muckley, Jakob Verbeek, Stéphane Lathuilière

    Abstract: Image codecs are typically optimized to trade off bitrate vs. distortion metrics. At low bitrates, this leads to compression artefacts which are easily perceptible, even when training with perceptual or adversarial losses. To improve image quality and remove dependency on the bitrate, we propose to decode with iterative diffusion models. We condition the decoding process on a vector-quantized imag…

    Submitted 19 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  8. arXiv:2309.11321

    cs.CV

    Face Aging via Diffusion-based Editing

    Authors: Xiangyi Chen, Stéphane Lathuilière

    Abstract: In this paper, we address the problem of face aging: generating past or future facial images by incorporating age-related changes to the given face. Previous aging methods rely solely on human facial image datasets and are thus constrained by their inherent scale and bias. This restricts their application to a limited generatable age range and the inability to handle large age gaps. We propose FAD…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: accepted at BMVC 2023

  9. arXiv:2308.09139

    cs.CV

    The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

    Authors: Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci

    Abstract: The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists in adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset, without accessing the actual source data. Previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this wo…

    Submitted 22 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV2023, 14 pages, 7 figures, code is available at https://github.com/giaczara/dallv

  10. arXiv:2307.04187

    cs.CV cs.MM

    Predictive Coding For Animation-Based Video Compression

    Authors: Goluck Konuko, Stéphane Lathuilière, Giuseppe Valenzise

    Abstract: We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrate by representing face motions with a compact set of sparse keypoints. However, these methods encode video in a frame-by-frame fashion, i.e. each frame is reconstructed from a reference fr…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: Accepted paper: ICIP 2023

  11. arXiv:2306.13754

    cs.CV

    Zero-shot spatial layout conditioning for text-to-image diffusion models

    Authors: Guillaume Couairon, Marlène Careil, Matthieu Cord, Stéphane Lathuilière, Jakob Verbeek

    Abstract: Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling and allow for an intuitive and powerful user interface to drive the image generation process. Expressing spatial constraints, e.g. to position specific objects in particular locations, is cumbersome using text; and current text-based image generation models are not able to accu…

    Submitted 23 June, 2023; originally announced June 2023.

  12. arXiv:2306.09098

    cs.CV

    Contrast, Stylize and Adapt: Unsupervised Contrastive Learning Framework for Domain Adaptive Semantic Segmentation

    Authors: Tianyu Li, Subhankar Roy, Huayi Zhou, Hongtao Lu, Stephane Lathuiliere

    Abstract: To overcome the domain gap between synthetic and real-world datasets, unsupervised domain adaptation methods have been proposed for semantic segmentation. The majority of previous approaches have attempted to reduce the gap either at the pixel or feature level, disregarding the fact that the two components interact positively. To address this, we present CONtrastive FEaTure and pIxel alignment (CO…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPRW 2023

  13. arXiv:2304.03766

    cs.CV

    Test your samples jointly: Pseudo-reference for image quality evaluation

    Authors: Marcelin Tworski, Stéphane Lathuilière

    Abstract: In this paper, we address the well-known image quality assessment problem but, in contrast to existing approaches that predict image quality independently for every image, we propose to jointly model different images depicting the same content to improve the precision of quality estimation. This proposal is motivated by the idea that multiple distorted images can provide information to disambigu…

    Submitted 7 April, 2023; originally announced April 2023.

  14. arXiv:2304.02321

    cs.CV

    Few-shot Semantic Image Synthesis with Class Affinity Transfer

    Authors: Marlène Careil, Jakob Verbeek, Stéphane Lathuilière

    Abstract: Semantic image synthesis aims to generate photo realistic images given a semantic segmentation map. Despite much recent progress, training them still requires large datasets of images annotated with per-pixel label maps that are extremely tedious to obtain. To alleviate the high annotation cost, we propose a transfer method that leverages a model trained on a large source dataset to improve the le…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  15. arXiv:2303.18080

    cs.CV

    One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models

    Authors: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

    Abstract: Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the ap…

    Submitted 16 June, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition- Workshop on Generative Models for Computer Vision (CVPR-W 2023)

  16. arXiv:2303.13472

    cs.CV cs.AI

    Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

    Authors: Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

    Abstract: Neural video game simulators emerged as powerful tools to generate and edit videos. Their idea is to represent games as the evolution of an environment's state driven by the actions of its agents. While such a paradigm enables users to play a game action-by-action, its rigidity precludes more semantic forms of control. To overcome this limitation, we augment game models with prompts specified as a…

    Submitted 21 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ACM Transactions on Graphics. © Copyright is held by the owner/author(s) 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, http://dx.doi.org/10.1145/3635705

  17. arXiv:2211.14105

    cs.CV

    Unifying conditional and unconditional semantic image synthesis with OCO-GAN

    Authors: Marlène Careil, Stéphane Lathuilière, Camille Couprie, Jakob Verbeek

    Abstract: Generative image models have been extensively studied in recent years. In the unconditional setting, they model the marginal distribution from unlabelled images. To allow for more control, image synthesis can be conditioned on semantic segmentation maps that tell the generator the position of objects in the image. While these two tasks are intimately related, they are generally studied in isol…

    Submitted 25 November, 2022; originally announced November 2022.

  18. arXiv:2211.00987

    cs.CV

    Autoregressive GAN for Semantic Unconditional Head Motion Generation

    Authors: Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, Dominique Vaufreydaz

    Abstract: In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose. Different from traditional audio-conditioned talking head generation that seldom puts emphasis on realistic head motions, we devise a GAN-based architecture that learns to synthesize rich head motion sequences over long duration wh…

    Submitted 17 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  19. arXiv:2210.01578

    cs.CV

    Cooperative Self-Training for Multi-Target Adaptive Semantic Segmentation

    Authors: Yangsong Zhang, Subhankar Roy, Hongtao Lu, Elisa Ricci, Stéphane Lathuilière

    Abstract: In this work we address multi-target domain adaptation (MTDA) in semantic segmentation, which consists in adapting a single model from an annotated source dataset to multiple unannotated target datasets that differ in their underlying data distributions. To address MTDA, we propose a self-training strategy that employs pseudo-labels to induce cooperation among multiple domain-specific classifiers.…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at WACV 2023

  20. arXiv:2207.13530

    cs.MM eess.IV

    A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing

    Authors: Goluck Konuko, Stéphane Lathuilière, Giuseppe Valenzise

    Abstract: Deep generative models, and particularly facial animation schemes, can be used in video conferencing applications to efficiently compress a video through a sparse set of keypoints, without the need to transmit dense motion vectors. While these schemes bring significant coding gains over conventional video codecs at low bitrates, their performance saturates quickly when the available bandwidth incr…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Preprint paper. Accepted for publication at ICIP 2022

  21. arXiv:2207.11025

    cs.CV cs.LG

    Custom Structure Preservation in Face Aging

    Authors: Guillermo Gomez-Trenado, Stéphane Lathuilière, Pablo Mesejo, Óscar Cordón

    Abstract: In this work, we propose a novel architecture for face age editing that can produce structural modifications while maintaining relevant details present in the original image. We disentangle the style and content of the input image and propose a new decoder network that adopts a style-based strategy to combine the style and content representations of the input image while conditioning the output on…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: 36 pages, 21 figures

  22. arXiv:2207.09763

    cs.CV cs.AI cs.LG

    GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation

    Authors: Cristiano Saltori, Evgeny Krivosheev, Stéphane Lathuilière, Nicu Sebe, Fabio Galasso, Giuseppe Fiameni, Elisa Ricci, Fabio Poiesi

    Abstract: 3D point cloud semantic segmentation is fundamental for autonomous driving. Most approaches in the literature neglect an important aspect, i.e., how to deal with domain shift when handling dynamic scenes. This can significantly hinder the navigation capabilities of self-driving vehicles. This paper advances the state of the art in this research field. Our first contribution consists in analysing a…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022

  23. arXiv:2205.04383

    cs.CV

    Online Unsupervised Domain Adaptation for Person Re-identification

    Authors: Hamza Rami, Matthieu Ospici, Stéphane Lathuilière

    Abstract: Unsupervised domain adaptation for person re-identification (Person Re-ID) is the task of transferring the learned knowledge on the labeled source domain to the unlabeled target domain. Most of the recent papers that address this problem adopt an offline training setting. More precisely, the training of the Re-ID model is done assuming that we have access to the complete training target domain dat…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: To appear in the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPR-W) on Continual Learning in Computer Vision (CLVision) 2022

  24. arXiv:2203.01914

    cs.CV cs.AI

    Playable Environments: Video Manipulation in Space and Time

    Authors: Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci

    Abstract: We present Playable Environments - a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a video by providing a sequence of desired actions. The actions are learnt in an unsupervised manner. The camera can be controlled to get the desired viewpoint.…

    Submitted 15 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  25. arXiv:2108.08815

    cs.CV cs.AI

    Click to Move: Controlling Video Generation with Sparse Motion

    Authors: Pierfrancesco Ardino, Marco De Nadai, Bruno Lepri, Elisa Ricci, Stéphane Lathuilière

    Abstract: This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: Accepted by International Conference on Computer Vision (ICCV 2021)

  26. arXiv:2108.08536

    cs.CV cs.LG

    A Unified Objective for Novel Class Discovery

    Authors: Enrico Fini, Enver Sangineto, Stéphane Lathuilière, Zhun Zhong, Moin Nabi, Elisa Ricci

    Abstract: In this paper, we study the problem of Novel Class Discovery (NCD). NCD aims at inferring novel object categories in an unlabeled set by leveraging prior knowledge of a labeled set containing different, but related classes. Existing approaches tackle this problem by considering multiple objective functions, usually involving specialized loss terms for the labeled and the unlabeled samples res…

    Submitted 29 September, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 (Oral)

  27. HEMP: High-order Entropy Minimization for neural network comPression

    Authors: Enzo Tartaglione, Stéphane Lathuilière, Attilio Fiandrotti, Marco Cagnazzo, Marco Grangetto

    Abstract: We formulate the entropy of a quantized artificial neural network as a differentiable function that can be plugged as a regularization term into the cost function minimized by gradient descent. Our formulation scales efficiently beyond the first order and is agnostic of the quantization scheme. The network can then be trained to minimize the entropy of the quantized parameters, so that they can be…

    Submitted 12 July, 2021; originally announced July 2021.

  28. arXiv:2101.12195

    cs.CV cs.AI

    Playable Video Generation

    Authors: Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

    Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a nov…

    Submitted 28 January, 2021; originally announced January 2021.

  29. arXiv:2101.03806

    cs.CV

    Multi-Domain Image-to-Image Translation with Adaptive Inference Graph

    Authors: The-Phuc Nguyen, Stéphane Lathuilière, Elisa Ricci

    Abstract: In this work, we address the problem of multi-domain image-to-image translation with particular attention paid to computational cost. In particular, current state of the art models require a large and deep model in order to handle the visual diversity of multiple domains. In a context of limited computational resources, increasing the network size may not be possible. Therefore, we propose to incr…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted at ICPR 2020

  30. arXiv:2012.00346

    cs.CV cs.MM

    Ultra-low bitrate video conferencing using deep image animation

    Authors: Goluck Konuko, Giuseppe Valenzise, Stéphane Lathuilière

    Abstract: In this work we propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications. To address the shortcomings of current video compression paradigms when the available bandwidth is extremely limited, we adopt a model-based approach that employs deep neural networks to encode motion information as keypoint displacement and reconstruct the video sign…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 5 pages

  31. arXiv:2010.08243

    cs.CV

    SF-UDA$^{3D}$: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection

    Authors: Cristiano Saltori, Stéphane Lathuiliére, Nicu Sebe, Elisa Ricci, Fabio Galasso

    Abstract: 3D object detectors based only on LiDAR point clouds hold the state-of-the-art on modern street-view benchmarks. However, LiDAR-based detectors poorly generalize across domains due to domain shift. In the case of LiDAR, in fact, domain shift is not only due to changes in the environment and in the object appearances, as for visual data from RGB cameras, but is also related to the geometry of the p…

    Submitted 19 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: Accepted paper at 3DV 2020

  32. arXiv:2009.11204

    cs.CV

    Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

    Authors: Sylvain Guy, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud

    Abstract: Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used…

    Submitted 16 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: International Conference on Pattern Recognition, Milan, Italy, January 2021

  33. arXiv:2009.09981

    cs.CV

    DR2S : Deep Regression with Region Selection for Camera Quality Evaluation

    Authors: Marcelin Tworski, Stéphane Lathuilière, Salim Belkarfa, Attilio Fiandrotti, Marco Cagnazzo

    Abstract: In this work, we tackle the problem of estimating a camera's capability to preserve fine texture details at a given lighting condition. Importantly, our texture preservation measurement should coincide with human perception. Consequently, we formulate our problem as a regression one and we introduce a deep convolutional network to estimate texture quality score. At training time, we use ground-truth…

    Submitted 21 September, 2020; originally announced September 2020.

  34. arXiv:2008.04646

    cs.CV cs.LG

    Learning to Cluster under Domain Shift

    Authors: Willi Menapace, Stéphane Lathuilière, Elisa Ricci

    Abstract: While unsupervised domain adaptation methods based on deep architectures have achieved remarkable success in many computer vision tasks, they rely on a strong assumption, i.e. labeled source data must be available. In this work we overcome this assumption and we address the problem of transferring knowledge from a source to a target domain when both source and target data have no annotations. Insp…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  35. arXiv:2008.01510

    cs.CV cs.LG

    Online Continual Learning under Extreme Memory Constraints

    Authors: Enrico Fini, Stéphane Lathuilière, Enver Sangineto, Moin Nabi, Elisa Ricci

    Abstract: Continual Learning (CL) aims to develop agents emulating the human ability to sequentially learn new tasks while being able to retain knowledge obtained from past experiences. In this paper, we introduce the novel problem of Memory-Constrained Online Continual Learning (MC-OCL) which imposes strict constraints on the memory overhead that a possible algorithm can use to avoid catastrophic forgettin…

    Submitted 12 January, 2022; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  36. arXiv:2004.03234

    cs.CV

    Motion-supervised Co-Part Segmentation

    Authors: Aliaksandr Siarohin, Subhankar Roy, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: Recent co-part segmentation methods mostly operate in a supervised learning setting, which requires a large amount of annotated data for training. To overcome this limitation, we propose a self-supervised deep learning method for co-part segmentation. Differently from previous works, our approach develops the idea that motion information inferred from videos can be leveraged to discover meaningful…

    Submitted 15 April, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Journal ref: ICPR 2021

  37. arXiv:2003.00196

    cs.CV cs.AI

    First Order Motion Model for Image Animation

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to…

    Submitted 1 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: NeurIPS 2019

  38. arXiv:1909.07667

    cs.CV

    Progressive Fusion for Unsupervised Binocular Depth Estimation using Cycled Networks

    Authors: Andrea Pilzer, Stéphane Lathuilière, Dan Xu, Mihai Marian Puscas, Elisa Ricci, Nicu Sebe

    Abstract: Recent deep monocular depth estimation approaches based on supervised regression have achieved remarkable performance. However, they require costly ground truth annotations during training. To cope with this issue, in this paper we present a novel unsupervised deep learning approach for predicting depth maps. We introduce a new network architecture, named Progressive Fusion Network (PFN), that is…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: Accepted to TPAMI (SI RGB-D Vision), code https://github.com/andrea-pilzer/PFN-depth

  39. Budget-Aware Adapters for Multi-Domain Learning

    Authors: Rodrigo Berriel, Stéphane Lathuilière, Moin Nabi, Tassilo Klein, Thiago Oliveira-Santos, Nicu Sebe, Elisa Ricci

    Abstract: Multi-Domain Learning (MDL) refers to the problem of learning a set of models derived from a common deep architecture, each one specialized to perform a task in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL with a particular interest in obtaining domain-specific models with an adjustable budget in terms of the number of network parameters and computational complexity…

    Submitted 8 December, 2020; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: ICCV 2019

  40. arXiv:1905.02655

    cs.CV

    Attention-based Fusion for Multi-source Human Image Generation

    Authors: Stéphane Lathuilière, Enver Sangineto, Aliaksandr Siarohin, Nicu Sebe

    Abstract: We present a generalization of the person-image generation task, in which a human image is generated conditioned on a target pose and a set X of source appearance images. In this way, we can exploit multiple, possibly complementary images of the same person which are usually available at training and at testing time. The solution we propose is mainly based on a local attention mechanism which sele…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: 10 pages

  41. arXiv:1905.00007

    cs.CV

    Appearance and Pose-Conditioned Human Image Generation using Deformable GANs

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, Nicu Sebe

    Abstract: In this paper, we address the problem of generating person images conditioned on both pose and appearance information. Specifically, given an image x_a of a person and a target pose P(x_b), extracted from a different image x_b, we synthesize a new image of that person in pose P(x_b), while preserving the visual details in x_a. In order to deal with pixel-to-pixel misalignments caused by the pose differ…

    Submitted 14 October, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

    Comments: To appear on IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1801.00055

  42. arXiv:1904.08462

    cs.CV

    Online Adaptation through Meta-Learning for Stereo Depth Estimation

    Authors: Zhenyu Zhang, Stéphane Lathuilière, Andrea Pilzer, Nicu Sebe, Elisa Ricci, Jian Yang

    Abstract: In this work, we tackle the problem of online adaptation for stereo depth estimation, which consists in continuously adapting a deep network to a target video recorded in an environment different from that of the source training set. To address this problem, we propose a novel Online Meta-Learning model with Adaption (OMLA). Our proposal is based on two main contributions. First, to reduce the domain…

    Submitted 17 April, 2019; originally announced April 2019.

    Comments: 12 pages

  43. arXiv:1904.01308  [pdf, other]

    cs.CV

    CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification

    Authors: Guillaume Delorme, Yihong Xu, Stephane Lathuilière, Radu Horaud, Xavier Alameda-Pineda

    Abstract: Unsupervised person re-ID is the task of identifying people on a target data set for which the ID labels are unavailable during training. In this paper, we propose to unify two trends in unsupervised person re-ID: clustering & fine-tuning and adversarial learning. On one side, clustering groups training images into pseudo-ID labels, and uses them to fine-tune the feature extractor. On the other si…

    Submitted 28 April, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

  44. arXiv:1903.04202  [pdf, other]

    cs.CV

    Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation

    Authors: Andrea Pilzer, Stéphane Lathuilière, Nicu Sebe, Elisa Ricci

    Abstract: Nowadays, the majority of state of the art monocular depth estimation techniques are based on supervised deep learning models. However, collecting RGB images with associated depth maps is a very time consuming procedure. Therefore, recent works have proposed deep architectures for addressing the monocular depth prediction task as a reconstruction problem, thus avoiding the need of collecting groun…

    Submitted 20 April, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted at CVPR2019

  45. arXiv:1902.10953  [pdf, other]

    cs.CV

    Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View

    Authors: Benoit Massé, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud

    Abstract: In this paper we address the problems of detecting objects of interest in a video and of estimating their locations, solely from the gaze directions of people present in the video. Objects can be indistinctly located inside or outside the camera field of view. We refer to this problem as extended gaze following. The contributions of the paper are the followings. First, we propose a novel spatial r…

    Submitted 28 February, 2019; originally announced February 2019.

    Comments: FG 2019

  46. arXiv:1812.08861  [pdf, other]

    cs.GR cs.CV cs.LG stat.ML

    Animating Arbitrary Objects via Deep Motion Transfer

    Authors: Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe

    Abstract: This paper introduces a novel deep learning framework for image animation. Given an input image with a target object and a driving video sequence depicting a moving object, our framework generates a video in which the target object is animated according to the driving sequence. This is achieved through a deep architecture that decouples appearance and motion information. Our framework consists of…

    Submitted 30 August, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: CVPR-2019 (oral)

  47. arXiv:1808.09211  [pdf, other]

    cs.CV

    DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model

    Authors: Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

    Abstract: In this paper, we address the problem of how to robustly train a ConvNet for regression, or deep robust regression. Traditionally, deep regression employs the L2 loss function, known to be sensitive to outliers, i.e. samples that either lie at an abnormal distance away from the majority of the training samples, or that correspond to wrongly annotated targets. This means that, during back-propagati…

    Submitted 28 August, 2018; originally announced August 2018.

    Comments: accepted at ECCV 2018

  48. A Comprehensive Analysis of Deep Regression

    Authors: Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

    Abstract: Deep learning revolutionized data science, and recently its popularity has grown exponentially, as did the amount of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape from this trend. There is a large number of deep models, where small changes in the network architecture, or in the data pre-processing, together with the stochastic nature of the optimizatio…

    Submitted 24 September, 2020; v1 submitted 22 March, 2018; originally announced March 2018.

    Comments: Published in IEEE TPAMI

    Journal ref: IEEE TPAMI Volume: 42, Issue: 9, Sept. 1 2020

  49. arXiv:1801.00055  [pdf, other]

    cs.CV

    Deformable GANs for Pose-based Human Image Generation

    Authors: Aliaksandr Siarohin, Enver Sangineto, Stephane Lathuiliere, Nicu Sebe

    Abstract: In this paper we address the problem of generating person images conditioned on a given pose. Specifically, given an image of a person and a target pose, we synthesize a new image of that person in the novel pose. In order to deal with pixel-to-pixel misalignments caused by the pose differences, we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreov…

    Submitted 6 April, 2018; v1 submitted 29 December, 2017; originally announced January 2018.

    Comments: CVPR 2018 version

  50. Neural Network Based Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction

    Authors: Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud

    Abstract: This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction neither with the use of external sensors nor with human supervision. The robot learns to focus its attention onto groups of people from its own audio-visual experiences, independently of t…

    Submitted 23 April, 2018; v1 submitted 18 November, 2017; originally announced November 2017.

    Comments: Paper submitted to Pattern Recognition Letters

    Journal ref: Pattern Recognition Letters, vol. 118, 2019, 61-71