Showing 1–15 of 15 results for author: Siméoni, O

  1. arXiv:2406.08113  [pdf, other]

    cs.CV cs.RO

    Valeo4Cast: A Modular Approach to End-to-End Forecasting

    Authors: Yihong Xu, Éloi Zablocki, Alexandre Boulch, Gilles Puy, Mickael Chen, Florent Bartoccioni, Nermin Samet, Oriane Siméoni, Spyros Gidaris, Tuan-Hung Vu, Andrei Bursuc, Eduardo Valle, Renaud Marlet, Matthieu Cord

    Abstract: Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the curren…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Winning solution of the Argoverse 2 "Unified Detection, Tracking, and Forecasting" challenge, held at CVPR 2024 WAD

  2. arXiv:2401.09413  [pdf, other]

    cs.CV

    POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

    Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: We describe an approach to predict open-vocabulary 3D semantic voxel occupancy map from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: accepted to NeurIPS 2023

  3. arXiv:2312.12359  [pdf, other]

    cs.CV

    CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation

    Authors: Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez

    Abstract: The popular CLIP model displays impressive zero-shot capabilities thanks to its seamless interaction with arbitrary text prompts. However, its lack of spatial awareness makes it unsuitable for dense computer vision tasks, e.g., semantic segmentation, without an additional fine-tuning step that often uses annotations and can potentially suppress its original open-vocabulary properties. Meanwhile, s…

    Submitted 27 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  4. arXiv:2310.17504  [pdf, other]

    cs.CV

    Three Pillars improving Vision Foundation Model Distillation for Lidar

    Authors: Gilles Puy, Spyros Gidaris, Alexandre Boulch, Oriane Siméoni, Corentin Sautier, Patrick Pérez, Andrei Bursuc, Renaud Marlet

    Abstract: Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show prom…

    Submitted 19 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: The code is available at https://github.com/valeoai/ScaLR

  5. arXiv:2310.12904  [pdf, other]

    cs.CV

    Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey

    Authors: Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, Patrick Pérez

    Abstract: The recent enthusiasm for open-world vision systems show the high interest of the community to perform perception tasks outside of the closed-vocabulary benchmark setups which have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how to find objects without knowing anything about the…

    Submitted 19 October, 2023; originally announced October 2023.

  6. arXiv:2309.14289  [pdf, other]

    cs.CV

    CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free

    Authors: Monika Wysoczańska, Michaël Ramamonjisoa, Tomasz Trzciński, Oriane Siméoni

    Abstract: The emergence of CLIP has opened the way for open-world image perception. The zero-shot classification capabilities of the model are impressive but are harder to use for dense tasks such as image segmentation. Several methods have proposed different modifications and learning schemes to produce dense output. Instead, we propose in this work an open-vocabulary semantic segmentation method, dubbed C…

    Submitted 28 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted to WACV 2024

  7. arXiv:2307.09361  [pdf, other]

    cs.CV cs.AI cs.LG

    MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

    Authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez

    Abstract: Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose…

    Submitted 18 July, 2023; originally announced July 2023.

  8. arXiv:2304.11762  [pdf, other]

    cs.CV

    You Never Get a Second Chance To Make a Good First Impression: Seeding Active Learning for 3D Semantic Segmentation

    Authors: Nermin Samet, Oriane Siméoni, Gilles Puy, Georgy Ponimatkin, Renaud Marlet, Vincent Lepetit

    Abstract: We propose SeedAL, a method to seed active learning for efficient annotation of 3D point clouds for semantic segmentation. Active Learning (AL) iteratively selects relevant data fractions to annotate within a given budget, but requires a first fraction of the dataset (a 'seed') to be already annotated to estimate the benefit of annotating other data fractions. We first show that the choice of the…

    Submitted 19 September, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: ICCV 2023

  9. arXiv:2212.07834  [pdf, other]

    cs.CV

    Unsupervised Object Localization: Observing the Background to Discover Objects

    Authors: Oriane Siméoni, Chloé Sekkat, Gilles Puy, Antonin Vobecky, Éloi Zablocki, Patrick Pérez

    Abstract: Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task; what are the desired objects, when to separate them into parts, how many are there, and of what classes? The answers to these questions depen…

    Submitted 29 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  10. arXiv:2207.12112  [pdf, other]

    cs.CV

    Active Learning Strategies for Weakly-supervised Object Detection

    Authors: Huy V. Vo, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Jean Ponce

    Abstract: Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2022. Contains 27 pages, 9 tables and 6 figures

  11. arXiv:2203.11160  [pdf, other]

    cs.CV

    Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

    Authors: Antonin Vobecky, David Hurych, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars which, equipped with cameras and LiDAR sensors, drive around a city. Our contributions are threefold. First, we propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronize…

    Submitted 21 February, 2024; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: v2: improved quality of images. See the project webpage https://vobecant.github.io/DriveAndSegment/ for the code and more

  12. arXiv:2109.14279  [pdf, other]

    cs.CV

    Localizing Objects with Self-Supervised Transformers and no Labels

    Authors: Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce

    Abstract: Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image.…

    Submitted 29 September, 2021; originally announced September 2021.

    Journal ref: BMVC 2021

  13. arXiv:1911.08177  [pdf, ps, other]

    cs.CV cs.LG

    Rethinking deep active learning: Using unlabeled data at model training

    Authors: Oriane Siméoni, Mateusz Budnik, Yannis Avrithis, Guillaume Gravier

    Abstract: Active learning typically focuses on training a model on few labeled examples alone, while unlabeled ones are only used for acquisition. In this work we depart from this setting by using both labeled and unlabeled data during model training across active learning cycles. We do so by using unsupervised feature learning at the beginning of the active learning pipeline and semi-supervised learning at…

    Submitted 19 November, 2019; originally announced November 2019.

  14. arXiv:1905.06358  [pdf, other]

    cs.CV

    Local Features and Visual Words Emerge in Activations

    Authors: Oriane Siméoni, Yannis Avrithis, Ondrej Chum

    Abstract: We propose a novel method of deep spatial matching (DSM) for image retrieval. Initial ranking is based on image descriptors extracted from convolutional neural network activations by global pooling, as in recent state-of-the-art work. However, the same sparse 3D activation tensor is also approximated by a collection of local features. These local features are then robustly matched to approximate t…

    Submitted 15 May, 2019; originally announced May 2019.

    Journal ref: CVPR 2019

  15. arXiv:1709.04725  [pdf, other]

    cs.CV

    Unsupervised object discovery for instance recognition

    Authors: Oriane Siméoni, Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum

    Abstract: Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, that are popular due to their memory and search efficiency, are especially prone to corruption by such a clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift due to actually r…

    Submitted 24 January, 2018; v1 submitted 14 September, 2017; originally announced September 2017.