Showing 1–26 of 26 results for author: Gidaris, S

  1. arXiv:2406.08113

    cs.CV cs.RO

    Valeo4Cast: A Modular Approach to End-to-End Forecasting

    Authors: Yihong Xu, Éloi Zablocki, Alexandre Boulch, Gilles Puy, Mickael Chen, Florent Bartoccioni, Nermin Samet, Oriane Siméoni, Spyros Gidaris, Tuan-Hung Vu, Andrei Bursuc, Eduardo Valle, Renaud Marlet, Matthieu Cord

    Abstract: Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect from sensor data (cameras or LiDARs) the position and past trajectories of the different elements of the scene and predict their future location. We depart from the curren…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Winning solution of the Argoverse 2 "Unified Detection, Tracking, and Forecasting" challenge, held at CVPR 2024 WAD

  2. arXiv:2404.14027

    cs.CV cs.LG

    OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

    Authors: Sophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonin Vobecky, Patrick Pérez, Renaud Marlet

    Abstract: We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to th…

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

  3. arXiv:2401.09413

    cs.CV

    POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

    Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: We describe an approach to predict open-vocabulary 3D semantic voxel occupancy map from input 2D images with the objective of enabling 3D grounding, segmentation and retrieval of free-form language queries. This is a challenging problem because of the 2D-3D ambiguity and the open-vocabulary nature of the target tasks, where obtaining annotated training data in 3D is difficult. The contributions of…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: accepted to NeurIPS 2023

  4. arXiv:2312.00648

    cs.CV

    SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

    Authors: Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos, Nikos Komodakis

    Abstract: Unsupervised object-centric learning aims to decompose scenes into interpretable object entities, termed slots. Slot-based auto-encoders stand out as a prominent method for this task. Within them, crucial aspects include guiding the encoder to generate object-specific slots and ensuring the decoder utilizes them during reconstruction. This work introduces two novel techniques, (i) an attention-bas…

    Submitted 5 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 (Highlight). Code: https://github.com/gkakogeorgiou/spot

  5. arXiv:2310.17504

    cs.CV

    Three Pillars improving Vision Foundation Model Distillation for Lidar

    Authors: Gilles Puy, Spyros Gidaris, Alexandre Boulch, Oriane Siméoni, Corentin Sautier, Patrick Pérez, Andrei Bursuc, Renaud Marlet

    Abstract: Self-supervised image backbones can be used to address complex 2D tasks (e.g., semantic segmentation, object discovery) very efficiently and with little or no downstream supervision. Ideally, 3D backbones for lidar should be able to inherit these properties after distillation of these powerful 2D features. The most recent methods for image-to-lidar distillation on autonomous driving data show prom…

    Submitted 19 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: The code is available at https://github.com/valeoai/ScaLR

  6. arXiv:2310.12904

    cs.CV

    Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey

    Authors: Oriane Siméoni, Éloi Zablocki, Spyros Gidaris, Gilles Puy, Patrick Pérez

    Abstract: The recent enthusiasm for open-world vision systems shows the high interest of the community to perform perception tasks outside of the closed-vocabulary benchmark setups which have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how to find objects without knowing anything about the…

    Submitted 19 October, 2023; originally announced October 2023.

  7. arXiv:2307.09361

    cs.CV cs.AI cs.LG

    MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

    Authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez

    Abstract: Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose…

    Submitted 18 July, 2023; originally announced July 2023.

  8. arXiv:2301.10222

    cs.CV cs.AI cs.LG

    RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

    Authors: Angelika Ando, Spyros Gidaris, Andrei Bursuc, Gilles Puy, Alexandre Boulch, Renaud Marlet

    Abstract: Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques which use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs but recent advances in c…

    Submitted 25 April, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: CVPR 2023. Code at https://github.com/valeoai/rangevit

  9. arXiv:2207.12112

    cs.CV

    Active Learning Strategies for Weakly-supervised Object Detection

    Authors: Huy V. Vo, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Jean Ponce

    Abstract: Object detectors trained with weak annotations are affordable alternatives to fully-supervised counterparts. However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using "box-in-box" (BiB), a novel active learning…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2022. Contains 27 pages, 9 tables and 6 figures

  10. arXiv:2203.16258

    cs.CV cs.LG

    Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

    Authors: Corentin Sautier, Gilles Puy, Spyros Gidaris, Alexandre Boulch, Andrei Bursuc, Renaud Marlet

    Abstract: Segmenting or detecting objects in sparse Lidar point clouds are two important tasks in autonomous driving to allow a vehicle to act safely in its 3D environment. The best performing methods in 3D semantic segmentation or object detection rely on a large amount of annotated data. Yet annotating 3D Lidar data for these tasks is tedious and costly. In this context, we propose a self-supervised pre-t…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR2022

  11. What to Hide from Your Students: Attention-Guided Masked Image Modeling

    Authors: Ioannis Kakogeorgiou, Spyros Gidaris, Bill Psomas, Yannis Avrithis, Andrei Bursuc, Konstantinos Karantzalos, Nikos Komodakis

    Abstract: Transformers and masked language modeling are quickly being adopted and explored in computer vision as vision transformers and masked image modeling (MIM). In this work, we argue that image token masking differs from token masking in text, due to the amount and correlation of tokens in an image. In particular, to generate a challenging pretext task for MIM, we advocate a shift from random masking…

    Submitted 22 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: ECCV 2022. Codes and models are available at https://github.com/gkakogeorgiou/attmask

    Journal ref: European Conference on Computer Vision (2022)

  12. arXiv:2203.11160

    cs.CV

    Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

    Authors: Antonin Vobecky, David Hurych, Oriane Siméoni, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

    Abstract: This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars which, equipped with cameras and LiDAR sensors, drive around a city. Our contributions are threefold. First, we propose a novel method for cross-modal unsupervised learning of semantic image segmentation by leveraging synchronize…

    Submitted 21 February, 2024; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: v2: improved quality of images. See the project webpage https://vobecant.github.io/DriveAndSegment/ for the code and more

  13. arXiv:2109.14279

    cs.CV

    Localizing Objects with Self-Supervised Transformers and no Labels

    Authors: Oriane Siméoni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Renaud Marlet, Jean Ponce

    Abstract: Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem, that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposal nor any exploration of the image collection; it operates on a single image.…

    Submitted 29 September, 2021; originally announced September 2021.

    Journal ref: BMVC 2021

  14. arXiv:2012.11552

    cs.CV cs.LG

    OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning

    Authors: Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, Patrick Pérez

    Abstract: Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there…

    Submitted 29 October, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: Accepted to CVPR2021. Code at https://github.com/valeoai/obow

  15. arXiv:2002.12247

    cs.CV cs.LG

    Learning Representations by Predicting Bags of Visual Words

    Authors: Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

    Abstract: Self-supervised representation learning targets to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: Accepted to CVPR2020

  16. arXiv:1912.01540

    cs.CV cs.LG

    QUEST: Quantized embedding space for transferring knowledge

    Authors: Himalaya Jain, Spyros Gidaris, Nikos Komodakis, Patrick Pérez, Matthieu Cord

    Abstract: Knowledge distillation refers to the process of training a compact student network to achieve better accuracy by learning from a high capacity teacher network. Most of the existing knowledge distillation methods direct the student to follow the teacher by matching the teacher's output, feature maps or their distribution. In this work, we propose a novel way to achieve this goal: by distilling the…

    Submitted 17 July, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted at ECCV 2020

  17. arXiv:1908.10254

    cs.CV

    Large-Scale Historical Watermark Recognition: dataset and a new consistency-based approach

    Authors: Xi Shen, Ilaria Pastrolin, Oumayma Bounou, Spyros Gidaris, Marc Smith, Olivier Poncet, Mathieu Aubry

    Abstract: Historical watermark recognition is a highly practical, yet unsolved challenge for archivists and historians. With a large number of well-defined classes, cluttered and noisy samples, different types of representations, both subtle differences between classes and high intra-class variation, historical watermarks are also challenging for pattern recognition. In this paper, overcoming the difficulty…

    Submitted 27 August, 2019; originally announced August 2019.

  18. arXiv:1906.05186

    cs.CV cs.LG

    Boosting Few-Shot Visual Learning with Self-Supervision

    Authors: Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

    Abstract: Few-shot learning and self-supervised learning address different facets of the same problem: how to train a model with little or no labeled data. Few-shot learning aims for optimization methods and models that can learn efficiently to recognize patterns in the low data regime. Self-supervised learning focuses instead on unlabeled data and looks into it for the supervisory signal to feed high capac…

    Submitted 12 June, 2019; originally announced June 2019.

  19. arXiv:1905.01102

    cs.CV cs.LG

    Generating Classification Weights with GNN Denoising Autoencoders for Few-Shot Learning

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: Given an initial recognition model already trained on a set of base classes, the goal of this work is to develop a meta-model for few-shot learning. The meta-model, given as input some novel classes with few training examples per class, must properly adapt the existing recognition model into a new model that can correctly classify in a unified way both the novel and the base classes. To accomplish…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: Oral presentation at CVPR 2019. The code and models of our paper will be published on: https://github.com/gidariss/wDAE_GNN_FewShot

  20. arXiv:1804.09458

    cs.CV cs.LG

    Dynamic Few-Shot Visual Learning without Forgetting

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior on machine learning vision systems is an interesting and very challenging research problem with many practical advantages on real world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: Accepted at CVPR 2018. Code and models will be published on: https://github.com/gidariss/FewShotWithoutForgetting

  21. arXiv:1803.08225

    cs.CV

    PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

    Authors: George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy

    Abstract: We present a box-free bottom-up approach for the tasks of pose estimation and instance segmentation of people in multi-person images using an efficient single-shot model. The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling. Our model employs a convolutional network which learns to detect individual keypoints and predict their re…

    Submitted 22 March, 2018; originally announced March 2018.

    Comments: Person detection and pose estimation, segmentation and grouping

  22. arXiv:1803.07728

    cs.CV cs.LG

    Unsupervised Representation Learning by Predicting Image Rotations

    Authors: Spyros Gidaris, Praveer Singh, Nikos Komodakis

    Abstract: Over the last years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised semantic fe…

    Submitted 20 March, 2018; originally announced March 2018.

    Comments: Accepted at ICLR2018. Code and models will be published on: https://github.com/gidariss/FeatureLearningRotNet

  23. arXiv:1612.04770

    cs.CV cs.LG

    Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: Pixel wise image labeling is an interesting and challenging problem with great significance in the computer vision community. In order for a dense labeling algorithm to be able to achieve accurate and precise results, it has to consider the dependencies that exist in the joint space of both the input and the output variables. An implicit approach for modeling those dependencies is by training a de…

    Submitted 14 December, 2016; originally announced December 2016.

  24. arXiv:1606.04446

    cs.CV

    Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: The problem of computing category agnostic bounding box proposals is utilized as a core component in many computer vision tasks and thus has lately attracted a lot of attention. In this work we propose a new approach to tackle this problem that is based on an active strategy for generating box proposals that starts from a set of seed boxes, which are uniformly distributed on the image, and then pr…

    Submitted 14 June, 2016; originally announced June 2016.

    Comments: Technical report. Code as well as box proposals computed for several datasets are available at: https://github.com/gidariss/AttractioNet

  25. arXiv:1511.07763

    cs.CV cs.LG cs.NE

    LocNet: Improving Localization Accuracy for Object Detection

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: We propose a novel object localization methodology with the purpose of boosting the localization accuracy of state-of-the-art object detection systems. Our model, given a search region, aims at returning the bounding box of an object of interest inside this region. To accomplish its goal, it relies on assigning conditional probabilities to each row and column of this region, where these probabilit…

    Submitted 7 April, 2016; v1 submitted 24 November, 2015; originally announced November 2015.

    Comments: Extended technical report -- short version to appear as oral paper on CVPR 2016. Code: https://github.com/gidariss/LocNet/

  26. arXiv:1505.01749

    cs.CV cs.LG cs.NE

    Object detection via a multi-region & semantic segmentation-aware CNN model

    Authors: Spyros Gidaris, Nikos Komodakis

    Abstract: We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization. We exploit the above properties of our…

    Submitted 23 September, 2015; v1 submitted 7 May, 2015; originally announced May 2015.

    Comments: Extended technical report -- short version to appear at ICCV 2015