Showing 1–35 of 35 results for author: Larlus, D

  1. arXiv:2402.17420

    cs.CV cs.AI

    PANDAS: Prototype-based Novel Class Discovery and Detection

    Authors: Tyler L. Hayes, César R. de Souza, Namil Kim, Jiwon Kim, Riccardo Volpi, Diane Larlus

    Abstract: Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its re…

    Submitted 30 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to the Conference on Lifelong Learning Agents (CoLLAs 2024)

  2. arXiv:2402.11305

    cs.CV

    On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models

    Authors: Juliette Marrie, Michael Arbel, Julien Mairal, Diane Larlus

    Abstract: Large pretrained visual models exhibit remarkable generalization across diverse recognition tasks. Yet, real-world applications often demand compact models tailored to specific problems. Variants of knowledge distillation have been devised for such a purpose, enabling task-specific compact models (the students) to learn from a generic large pretrained one (the teacher). In this paper, we show that…

    Submitted 7 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2024

  3. arXiv:2402.09237

    cs.CV

    Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

    Authors: Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S. Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

    Abstract: State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles when facing varying conditions, due to e.g. weather or time of day, with dramatic consequences on the visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes w…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted at ICLR 2024. Project Page: https://europe.naverlabs.com/ret4loc

  4. arXiv:2306.09998

    cs.CV cs.LG

    SLACK: Stable Learning of Augmentations with Cold-start and KL regularization

    Authors: Juliette Marrie, Michael Arbel, Diane Larlus, Julien Mairal

    Abstract: Data augmentation is known to improve the generalization capabilities of neural networks, provided that the set of transformations is chosen with care, a selection often performed manually. Automatic data augmentation aims at automating this process. However, most recent approaches still rely on some prior information; they start from a small pool of manually-selected default transformations that…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to CVPR 2023

  5. arXiv:2306.08731

    cs.CV

    EPIC Fields: Marrying 3D Geometry and Video Understanding

    Authors: Vadim Tschernezki, Ahmad Darkhalil, Zhifan Zhu, David Fouhey, Iro Laina, Diane Larlus, Dima Damen, Andrea Vedaldi

    Abstract: Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the c…

    Submitted 1 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023. 24 pages, 15 figures. Project Webpage: http://epic-kitchens.github.io/epic-fields

  6. arXiv:2305.19879

    cs.CV

    RaSP: Relation-aware Semantic Prior for Weakly Supervised Incremental Segmentation

    Authors: Subhankar Roy, Riccardo Volpi, Gabriela Csurka, Diane Larlus

    Abstract: Class-incremental semantic image segmentation assumes multiple model updates, each enriching the model to segment new categories. This is typically carried out by providing expensive pixel-level annotations to the training algorithm for all new objects, limiting the adoption of such methods in practical applications. Approaches that solely require image-level labels offer an attractive alternative…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted to CoLLAs 2023

  7. arXiv:2212.08420

    cs.CV cs.LG

    Fake it till you make it: Learning transferable representations from synthetic ImageNet clones

    Authors: Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis

    Abstract: Recent image generation models such as Stable Diffusion have exhibited an impressive ability to generate fairly realistic images starting from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by investigating the need for real images when training models for ImageNet classification.…

    Submitted 28 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023

  8. arXiv:2210.02254

    cs.CV

    Granularity-aware Adaptation for Image Retrieval over Multiple Tasks

    Authors: Jon Almazán, Byungsoo Ko, Geonmo Gu, Diane Larlus, Yannis Kalantidis

    Abstract: Strong image search models can be learned for a specific domain, i.e., a set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even if those cover very different specialized domains. Additionally, it should be able to benefit from even unlabeled images from th…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  9. arXiv:2209.03494

    cs.CV cs.GR

    Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations

    Authors: Vadim Tschernezki, Iro Laina, Diane Larlus, Andrea Vedaldi

    Abstract: We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image feature extractors when the latter are applied to the analysis of multiple images reconstructible as a 3D scene. Given an image feature extractor, for example pre-trained using self-supervision, N3F uses it as a teacher to learn a student network defined in 3D space. The 3D student network is similar to a neural r…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: 3DV 2022, Oral. Project page: https://www.robots.ox.ac.uk/~vadim/n3f/

  10. arXiv:2206.15369

    cs.CV cs.LG

    No Reason for No Supervision: Improved Generalization in Supervised Models

    Authors: Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane Larlus

    Abstract: We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at both the training task as well as at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model's generalization and maintaining its performance on the original task. Models trained with self-supervi…

    Submitted 10 March, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted to ICLR 2023 (spotlight)

  11. arXiv:2203.16195

    cs.CV

    On the Road to Online Adaptation for Semantic Image Segmentation

    Authors: Riccardo Volpi, Pau de Jorge, Diane Larlus, Gabriela Csurka

    Abstract: We propose a new problem formulation and a corresponding evaluation framework to advance research on unsupervised domain adaptation for semantic image segmentation. The overall goal is fostering the development of adaptive learning systems that will continuously learn, without supervision, in ever-changing environments. Typical protocols that study adaptation algorithms for segmentation models are…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022 (camera ready)

  12. arXiv:2203.08101

    cs.CV cs.IR

    ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity

    Authors: Ginger Delmas, Rafael Sampaio de Rezende, Gabriela Csurka, Diane Larlus

    Abstract: An intuitive way to search for images is to use queries composed of an example image and a complementary text. While the first provides rich and implicit context for the search, the latter explicitly calls for new traits, or specifies how some elements of the example image should be changed to retrieve the desired target image. Current approaches typically combine the features of each of the two e…

    Submitted 16 May, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Published in ICLR 2022

  13. arXiv:2201.13182

    cs.CV

    Learning Super-Features for Image Retrieval

    Authors: Philippe Weinzaepfel, Thomas Lucas, Diane Larlus, Yannis Kalantidis

    Abstract: Methods that combine local and global features have recently shown excellent performance on multiple challenging deep image retrieval benchmarks, but their use of local features raises at least two issues. First, these local features simply boil down to the localized map activations of a neural network, and hence can be extremely redundant. Second, they are typically trained with a global loss tha…

    Submitted 31 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  14. arXiv:2110.12812

    cs.CV cs.LG

    Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval

    Authors: Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

    Abstract: Given a gallery of uncaptioned video sequences, this paper considers the task of retrieving videos based on their relevance to an unseen text query. To compensate for the lack of annotations, we rely instead on a related video gallery composed of video-caption pairs, termed the source gallery, albeit with a domain gap between its videos and those in the target gallery. We thus introduce the proble…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 15 pages

  15. arXiv:2110.09936

    cs.CV cs.GR

    NeuralDiff: Segmenting 3D objects that move in egocentric videos

    Authors: Vadim Tschernezki, Diane Larlus, Andrea Vedaldi

    Abstract: Given a raw video sequence taken from a freely-moving camera, we study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground containing the objects that move in the video sequence. This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large appar…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 3DV 2021. Project page: https://www.robots.ox.ac.uk/~vadim/neuraldiff/

  16. arXiv:2110.09455

    cs.CV cs.AI cs.LG

    TLDR: Twin Learning for Dimensionality Reduction

    Authors: Yannis Kalantidis, Carlos Lassance, Jon Almazan, Diane Larlus

    Abstract: Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved. Such methods usually require propagation on large k-NN graphs or complicated optimization solvers. On the other hand, self-supervised learning approaches, typically used to learn representations from scrat…

    Submitted 15 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR). Code available at: https://github.com/naver/tldr

  17. arXiv:2101.05068

    cs.CV

    Probabilistic Embeddings for Cross-Modal Retrieval

    Authors: Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus

    Abstract: Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally make sense. In this paper, w…

    Submitted 14 June, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: Accepted to CVPR 2021; Code is available at https://github.com/naver-ai/pcme

  18. arXiv:2012.05649

    cs.CV cs.LG

    Concept Generalization in Visual Representation Learning

    Authors: Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek Alahari

    Abstract: Measuring concept generalization, i.e., the extent to which models trained on a set of (seen) visual concepts can be leveraged to recognize a new set of (unseen) concepts, is a popular way of evaluating visual representations, especially in a self-supervised learning framework. Nonetheless, the choice of unseen concepts for such an evaluation is usually made arbitrarily, and independently from the…

    Submitted 10 September, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Accepted to ICCV 2021. See our project website: https://europe.naverlabs.com/cog-benchmark for code and ImageNet-CoG level files

  19. arXiv:2012.04329

    cs.CV

    StacMR: Scene-Text Aware Cross-Modal Retrieval

    Authors: Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas

    Abstract: Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions to mention a few. This has resulted in an improved matching between the visual representation of an image and the textual representation of its caption. Yet, current visual representations overlook a key aspect: the text appearing in imag…

    Submitted 8 December, 2020; originally announced December 2020.

  20. arXiv:2012.04324

    cs.CV cs.AI cs.LG

    Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning

    Authors: Riccardo Volpi, Diane Larlus, Grégory Rogez

    Abstract: Most standard learning approaches lead to fragile models which are prone to drift when sequentially trained on samples of a different nature - the well-known "catastrophic forgetting" issue. In particular, when a model consecutively learns from different visual domains, it tends to forget the past domains in favor of the most recent ones. In this context, we show that one way to learn models that…

    Submitted 8 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR 2021

  21. arXiv:2010.01028

    cs.CV cs.LG

    Hard Negative Mixing for Contrastive Learning

    Authors: Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, Diane Larlus

    Abstract: Contrastive learning has become a key component of self-supervised learning approaches for computer vision. By learning to embed two augmented versions of the same image close to each other and to push the embeddings of different images apart, one can train highly transferable visual representations. As revealed by recent studies, heavy data augmentation and large sets of negatives are both crucia…

    Submitted 4 December, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS 2020. Project page with pretrained models: https://europe.naverlabs.com/mochi

  22. arXiv:2008.01392

    cs.CV

    Learning Visual Representations with Caption Annotations

    Authors: Mert Bulent Sariyildiz, Julien Perez, Diane Larlus

    Abstract: Pretraining general-purpose visual features has become a crucial part of tackling many computer vision tasks. While one can learn such features on the extensively-annotated ImageNet dataset, recent approaches have looked at ways to allow for noisy, fewer, or even no annotations to perform such pretraining. Starting from the observation that captioned images are easily crawlable, we argue that this…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted to the 2020 European Conference on Computer Vision

  23. arXiv:1908.03477

    cs.CV

    Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

    Authors: Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

    Abstract: We address the problem of cross-modal fine-grained action retrieval between text and video. Cross-modal retrieval is commonly achieved through learning a shared embedding space, that can indifferently embed modalities. In this paper, we propose to enrich the embedding by disentangling parts-of-speech (PoS) in the accompanying captions. We build a separate multi-modal embedding space for each PoS t…

    Submitted 9 August, 2019; originally announced August 2019.

    Comments: Accepted for presentation at ICCV. Project Page: https://mwray.github.io/FGAR

  24. arXiv:1807.10712

    cs.CV

    Semi-convolutional Operators for Instance Segmentation

    Authors: David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi

    Abstract: Object detection and instance segmentation are dominated by region-based methods such as Mask RCNN. However, there is a growing interest in reducing these problems to pixel labeling tasks, as the latter could be more efficient, could be integrated seamlessly in image-to-image network architectures as used in many other tasks, and could be more accurate for objects that are not well approximated by…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: Accepted as a conference paper at ECCV 2018

  25. arXiv:1804.01552

    cs.CV

    Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection

    Authors: David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi

    Abstract: Self-supervision can dramatically cut back the amount of manually-labelled data required to train deep neural networks. While self-supervision has usually been considered for tasks such as image classification, in this paper we aim at extending it to geometry-oriented tasks such as semantic matching and part detection. We do so by building on several recent ideas in unsupervised landmark detection…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

  26. arXiv:1801.05339

    cs.CV

    Re-ID done right: towards good practices for person re-identification

    Authors: Jon Almazan, Bojana Gajic, Naila Murray, Diane Larlus

    Abstract: Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image regions. In this paper we adopt a different approa…

    Submitted 16 January, 2018; originally announced January 2018.

  27. arXiv:1705.03951

    cs.CV

    Learning 3D Object Categories by Looking Around Them

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: Traditional approaches for learning 3D object categories use either synthetic data or manual supervision. In this paper, we propose a method which does not require manual annotations and is instead cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly c…

    Submitted 2 December, 2021; v1 submitted 10 May, 2017; originally announced May 2017.

    Comments: Proceedings of the International Conference on Computer Vision, 2017

  28. arXiv:1704.04749

    cs.CV

    AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we p…

    Submitted 16 April, 2017; originally announced April 2017.

    Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017

  29. arXiv:1610.07940

    cs.CV

    End-to-end Learning of Deep Visual Representations for Image Retrieval

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: i) noisy training data, ii) inappropriate deep architecture, and iii) suboptimal trai…

    Submitted 5 May, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: Accepted for publication at the International Journal of Computer Vision (IJCV). Extended version of our ECCV 2016 paper "Deep Image Retrieval: Learning global representations for image search"

  30. arXiv:1607.01205

    cs.CV

    Learning the semantic structure of objects from Web supervision

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important. Recognizing object parts and attributes has been extensively studied before, yet learning a large space of such concepts remains elusive due to the high cost of providing detailed object annotations for supervision. The key contribution of th…

    Submitted 2 December, 2021; v1 submitted 5 July, 2016; originally announced July 2016.

  31. arXiv:1604.01325

    cs.CV

    Deep Image Retrieval: Learning global representations for image search

    Authors: Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus

    Abstract: We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is tw…

    Submitted 28 July, 2016; v1 submitted 5 April, 2016; originally announced April 2016.

    Comments: ECCV 2016 version + additional results

  32. arXiv:1603.01076

    cs.CV

    What is the right way to represent document images?

    Authors: Gabriela Csurka, Diane Larlus, Albert Gordo, Jon Almazan

    Abstract: In this article we study the problem of document image representation based on visual features. We propose a comprehensive experimental study that compares three types of visual document image representations: (1) traditional so-called shallow features, such as the RunLength and the Fisher-Vector descriptors, (2) deep features based on Convolutional Neural Networks, and (3) features extracted from…

    Submitted 2 December, 2016; v1 submitted 3 March, 2016; originally announced March 2016.

  33. arXiv:1504.04763

    cs.CV

    Understanding the Fisher Vector: a multimodal part model

    Authors: David Novotný, Diane Larlus, Florent Perronnin, Andrea Vedaldi

    Abstract: Fisher Vectors and related orderless visual statistics have demonstrated excellent performance in object detection, sometimes superior to established approaches such as the Deformable Part Models. However, it remains unclear how these models can capture complex appearance variations using visual codebooks of limited sizes and coarse geometric information. In this work, we propose to interpret Fish…

    Submitted 18 April, 2015; originally announced April 2015.

  34. arXiv:1408.4325

    cs.CV

    What makes an Image Iconic? A Fine-Grained Case Study

    Authors: Yangmuzi Zhang, Diane Larlus, Florent Perronnin

    Abstract: A natural approach to teaching a visual concept, e.g. a bird species, is to show relevant images. However, not all relevant images represent a concept equally well. In other words, they are not necessarily iconic. This observation raises three questions. Is iconicity a subjective property? If not, can we predict iconicity? And what exactly makes an image iconic? We provide answers to these questio…

    Submitted 19 August, 2014; originally announced August 2014.

  35. arXiv:1406.6147

    cs.CV

    Incorporating Near-Infrared Information into Semantic Image Segmentation

    Authors: Neda Salamati, Diane Larlus, Gabriela Csurka, Sabine Süsstrunk

    Abstract: Recent progress in computational photography has shown that we can acquire near-infrared (NIR) information in addition to the normal visible (RGB) band, with only slight modifications to standard digital cameras. Due to the proximity of the NIR band to visible radiation, NIR images share many properties with visible images. However, as a result of the material dependent reflection in the NIR part…

    Submitted 24 June, 2014; originally announced June 2014.