
Showing 1–32 of 32 results for author: Ošep, A

  1. arXiv:2403.13129

    cs.CV cs.RO

    Better Call SAL: Towards Learning to Segment Anything in Lidar

    Authors: Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé

    Abstract: We propose $\texttt{SAL}$ ($\texttt{S}$egment $\texttt{A}$nything in $\texttt{L}$idar) method consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision. While the established paradigm for $\textit{Lidar Panoptic Segmentation}$ (LPS) relies on manual supervision for a ha…

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2402.19463

    cs.CV

    SeMoLi: What Moves Together Belongs Together

    Authors: Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni, Simon Lucey, Laura Leal-Taixé

    Abstract: We tackle semi-supervised object detection based on motion cues. Recent results suggest that heuristic-based clustering methods in conjunction with object trackers can be used to pseudo-label instances of moving objects and use these as supervisory signals to train 3D object detectors in Lidar data without manual supervision. We re-think this approach and suggest that both, object detection, as we…

    Submitted 25 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024!

  3. arXiv:2310.12464

    cs.CV cs.RO

    Lidar Panoptic Segmentation and Tracking without Bells and Whistles

    Authors: Abhinav Agarwalla, Xuhua Huang, Jason Ziglar, Francesco Ferroni, Laura Leal-Taixé, James Hays, Aljoša Ošep, Deva Ramanan

    Abstract: State-of-the-art lidar panoptic segmentation (LPS) methods follow bottom-up segmentation-centric fashion wherein they build upon semantic segmentation networks by utilizing clustering to obtain object instances. In this paper, we re-think this approach and propose a surprisingly simple yet effective detection-centric network for both LPS and tracking. Our network is modular by design and optimized…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: IROS 2023. Code at https://github.com/abhinavagarwalla/most-lps

  4. arXiv:2304.11705

    cs.CV cs.AI cs.LG

    Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation

    Authors: Cristiano Saltori, Aljoša Ošep, Elisa Ricci, Laura Leal-Taixé

    Abstract: The ability to deploy robots that can operate safely in diverse environments is crucial for developing embodied intelligent agents. As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semant…

    Submitted 29 August, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  5. arXiv:2301.04224

    cs.CV cs.LG

    Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images

    Authors: Xindi Wu, KwunFung Lau, Francesco Ferroni, Aljoša Ošep, Deva Ramanan

    Abstract: Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that thi…

    Submitted 9 April, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: 12 pages, 8 figures

  6. arXiv:2210.10774

    cs.CV

    Learning to Discover and Detect Objects

    Authors: Vladimir Fomenko, Ismail Elezi, Deva Ramanan, Laura Leal-Taixé, Aljoša Ošep

    Abstract: We tackle the problem of novel class discovery and localization (NCDL). In this setting, we assume a source dataset with supervision for only some object classes. Instances of other classes need to be discovered, classified, and localized automatically based on visual similarity without any human supervision. To tackle NCDL, we propose a two-stage object detection network Region-based NCDL (RNCDL)…

    Submitted 30 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022, Homepage: https://vlfom.github.io/RNCDL/

  7. arXiv:2210.07681

    cs.CV

    Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?

    Authors: Patrick Dendorfer, Vladimir Yugay, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Recent developments in monocular multi-object tracking have been very successful in tracking visible objects and bridging short occlusion gaps, mainly relying on data-driven appearance models. While we have significantly advanced short-term tracking performance, bridging longer occlusion gaps remains elusive: state-of-the-art object trackers only bridge less than 10% of occlusions longer than thre…

    Submitted 25 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022; fixed small typo

  8. arXiv:2209.14965

    cs.CV

    DirectTracker: 3D Multi-Object Tracking Using Direct Image Alignment and Photometric Bundle Adjustment

    Authors: Mariia Gladkova, Nikita Korobov, Nikolaus Demmel, Aljoša Ošep, Laura Leal-Taixé, Daniel Cremers

    Abstract: Direct methods have shown excellent performance in the applications of visual odometry and SLAM. In this work we propose to leverage their effectiveness for the task of 3D multi-object tracking. To this end, we propose DirectTracker, a framework that effectively combines direct image alignment for the short-term tracking and sliding-window photometric bundle adjustment for 3D object detection. Obj…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), 2022

  9. arXiv:2208.01957

    cs.CV cs.LG

    PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?

    Authors: Aleksandr Kim, Guillem Brasó, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by only encoding geometric relationships between objects in 3D space as cues for data-driven data association. We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates o…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: ECCV 2022, 17 pages, 5 pages of supplementary, 3 figures

  10. arXiv:2203.16297

    cs.CV cs.LG cs.RO

    Forecasting from LiDAR via Future Object Detection

    Authors: Neehar Peri, Jonathon Luiten, Mengtian Li, Aljoša Ošep, Laura Leal-Taixé, Deva Ramanan

    Abstract: Object detection and forecasting are fundamental components of embodied perception. These two problems, however, are largely studied in isolation by the community. In this paper, we propose an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Instead of predicting the current frame locations and forecasting forward in time,…

    Submitted 31 March, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: This work has been accepted to Computer Vision and Pattern Recognition (CVPR) 2022

  11. arXiv:2203.15125

    cs.CV cs.CL cs.LG

    Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

    Authors: Manuel Kolmet, Qunjie Zhou, Aljosa Osep, Laura Leal-Taixe

    Abstract: Natural language-based communication with mobile devices and home appliances is becoming increasingly popular and has the potential to become natural for communicating with mobile robots in the future. Towards this goal, we investigate cross-modal text-to-point-cloud localization that will allow us to specify, for example, a vehicle pick-up or goods delivery location. In particular, we propose Tex…

    Submitted 5 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR2022 Camera Ready Version

  12. arXiv:2203.12979

    cs.CV

    Is Geometry Enough for Matching in Visual Localization?

    Authors: Qunjie Zhou, Sérgio Agostinho, Aljosa Osep, Laura Leal-Taixé

    Abstract: In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. While matching keypoints via visual descriptors makes localization highly accurate, it has significant storage demands, raises privacy concerns and requires update to the descriptors in the long-term. To elegantly ad…

    Submitted 30 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: ECCV2022 Camera Ready

  13. arXiv:2108.09518

    cs.CV

    MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

    Authors: Matteo Fabbri, Guillem Braso, Gianluca Maugeri, Orcun Cetintas, Riccardo Gasparini, Aljosa Osep, Simone Calderara, Laura Leal-Taixe, Rita Cucchiara

    Abstract: Deep learning-based methods for video pedestrian detection and tracking require large volumes of training data to achieve good performance. However, data acquisition in crowded public environments raises data privacy concerns -- we are not allowed to simply record and store data without the explicit consent of all participants. Furthermore, the annotation of such data for computer vision applicati…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 camera-ready version

  14. arXiv:2108.03257

    cs.CV

    (Just) A Spoonful of Refinements Helps the Registration Error Go Down

    Authors: Sérgio Agostinho, Aljoša Ošep, Alessio Del Bue, Laura Leal-Taixé

    Abstract: We tackle data-driven 3D point cloud registration. Given point correspondences, the standard Kabsch algorithm provides an optimal rotation estimate. This allows to train registration models in an end-to-end manner by differentiating the SVD operation. However, given the initial rotation estimate supplied by Kabsch, we show we can improve point correspondence learning during model training by exten…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 (Oral)

  15. arXiv:2104.14682

    cs.CV cs.RO

    EagerMOT: 3D Multi-Object Tracking via Sensor Fusion

    Authors: Aleksandr Kim, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time. Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal. On the other hand, cameras provide a dense and rich visual signal that…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: To be published at ICRA 2021. Source code available at https://github.com/aleksandrkim61/EagerMOT

  16. arXiv:2104.11221

    cs.CV

    Opening up Open-World Tracking

    Authors: Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Tracking and detecting any object, including ones never-seen-before during model training, is a crucial but elusive capability of autonomous systems. An autonomous agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world - and yet this is how almost all current systems work. One of the main obstacles towards advancing tracking any object is that this…

    Submitted 28 March, 2022; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2022 (Oral). https://openworldtracking.github.io/

  17. arXiv:2102.12472

    cs.CV cs.RO

    4D Panoptic LiDAR Segmentation

    Authors: Mehmet Aygün, Aljoša Ošep, Mark Weber, Maxim Maximov, Cyrill Stachniss, Jens Behley, Laura Leal-Taixé

    Abstract: Temporal semantic scene understanding is critical for self-driving cars or robots operating in dynamic environments. In this paper, we propose 4D panoptic LiDAR segmentation to assign a semantic class and a temporally-consistent instance ID to a sequence of 3D points. To this end, we present an approach and a point-centric evaluation metric. Our approach determines a semantic class for every point…

    Submitted 7 April, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: CVPR 2021

  18. arXiv:2102.11859

    cs.CV

    STEP: Segmenting and Tracking Every Pixel

    Authors: Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljoša Ošep, Laura Leal-Taixé, Liang-Chieh Chen

    Abstract: The task of assigning semantic classes and track identities to every pixel in a video is called video panoptic segmentation. Our work is the first that targets this task in a real-world setting requiring dense interpretation in both spatial and temporal domains. As the ground-truth for this task is difficult and expensive to obtain, existing datasets are either constructed synthetically or only sp…

    Submitted 7 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021 Track on Datasets and Benchmarks. Code: https://github.com/google-research/deeplab2

  19. arXiv:2010.07548

    cs.CV

    MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

    Authors: Patrick Dendorfer, Aljoša Ošep, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth, Laura Leal-Taixé

    Abstract: Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched…

    Submitted 8 December, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at IJCV

  20. arXiv:2010.01114

    cs.CV

    Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation

    Authors: Patrick Dendorfer, Aljoša Ošep, Laura Leal-Taixé

    Abstract: In this paper, we present Goal-GAN, an interpretable and end-to-end trainable model for human trajectory prediction. Inspired by human navigation, we model the task of trajectory prediction as an intuitive two-stage process: (i) goal estimation, which predicts the most likely target positions of the agent, followed by a (ii) routing module which estimates a set of plausible trajectories that route…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Oral presentation at ACCV 2020

  21. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking

    Authors: Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixe, Bastian Leibe

    Abstract: Multi-Object Tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, HOTA (Higher Order Tracking Accuracy), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing track…

    Submitted 29 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Pre-print. Accepted for Publication in the International Journal of Computer Vision, 19 August 2020. Code is available at https://github.com/JonathonLuiten/HOTA-metrics

    Journal ref: International Journal of Computer Vision (2020)

  22. arXiv:2008.11516

    cs.CV

    Making a Case for 3D Convolutions for Object Segmentation in Videos

    Authors: Sabarinath Mahadevan, Ali Athar, Aljoša Ošep, Sebastian Hennen, Laura Leal-Taixé, Bastian Leibe

    Abstract: The task of object segmentation in videos is usually accomplished by processing appearance and motion information separately using standard 2D convolutional networks, followed by a learned fusion of the two sources of information. On the other hand, 3D convolutional networks have been successfully applied for video classification tasks, but have not been leveraged as effectively to problems involv…

    Submitted 1 September, 2023; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: BMVC '20

  23. STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

    Authors: Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

    Abstract: Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in individual frames, and then associate these detections over time. Hence, these methods are often non-end-to-end trainable and highly tailored to specific tasks. In…

    Submitted 1 September, 2023; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 28 pages, 6 figures

    MSC Class: 68T45; 68T10; 62H30 ACM Class: I.2.10; I.4.6; I.4.8; I.5.3

  24. arXiv:1910.04668

    cs.CV cs.RO

    AlignNet-3D: Fast Point Cloud Registration of Partially Observed Objects

    Authors: Johannes Groß, Aljosa Osep, Bastian Leibe

    Abstract: Methods tackling multi-object tracking need to estimate the number of targets in the sensing area as well as to estimate their continuous state. While the majority of existing methods focus on data association, precise state (3D pose) estimation is often only coarsely estimated by approximating targets with centroids or (3D) bounding boxes. However, in automotive scenarios, motion perception of su…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Presented at 3DV'19

  25. arXiv:1906.06618

    cs.CV

    How To Train Your Deep Multi-Object Tracker

    Authors: Yihong Xu, Aljosa Osep, Yutong Ban, Radu Horaud, Laura Leal-Taixe, Xavier Alameda-Pineda

    Abstract: The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However, existing methods train only certain sub-modules using loss functions that often do not correlate with established tracking evaluation measures such as Multi-Object Tracking Accuracy (MOTA) and Precision (MOTP).…

    Submitted 23 April, 2020; v1 submitted 15 June, 2019; originally announced June 2019.

    Comments: 14 pages, 9 figures, 6 tables

  26. arXiv:1903.00362

    cs.CV

    Large-Scale Object Mining for Object Discovery from Unlabeled Video

    Authors: Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe

    Abstract: This paper addresses the problem of object discovery from unlabeled driving videos captured in a realistic automotive setting. Identifying recurring object categories in such raw video streams is a very challenging problem. Not only do object candidates first have to be localized in the input images, but many interesting object categories occur relatively infrequently. Object discovery will theref…

    Submitted 29 April, 2019; v1 submitted 28 February, 2019; originally announced March 2019.

    Comments: Updated version of ICRA'19 paper (additional qualitative results); arXiv admin note: text overlap with arXiv:1712.08832

  27. arXiv:1902.03604

    cs.CV

    MOTS: Multi-Object Tracking and Segmentation

    Authors: Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, Bastian Leibe

    Abstract: This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend exis…

    Submitted 8 April, 2019; v1 submitted 10 February, 2019; originally announced February 2019.

    Comments: CVPR 2019 camera-ready version

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019

  28. arXiv:1901.09260

    cs.CV cs.RO

    4D Generic Video Object Proposals

    Authors: Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, Bastian Leibe

    Abstract: Many high-level video understanding methods require input in the form of object proposals. Currently, such proposals are predominantly generated with the help of networks that were trained for detecting and segmenting a set of known object classes, which limits their applicability to cases where all objects of interest are represented in the training set. This is a restriction for automotive scena…

    Submitted 20 May, 2020; v1 submitted 26 January, 2019; originally announced January 2019.

    Comments: ICRA 2020

  29. arXiv:1809.07357

    cs.CV

    Combined Image- and World-Space Tracking in Traffic Scenes

    Authors: Aljosa Osep, Wolfgang Mehner, Markus Mathias, Bastian Leibe

    Abstract: Tracking in urban street scenes plays a central role in autonomous systems such as self-driving cars. Most of the current vision-based tracking methods perform tracking in the image domain. Other approaches, e.g., based on LIDAR and radar, track purely in 3D. While some vision-based tracking methods invoke 3D information in parts of their pipeline, and some 3D-based methods utilize image-based inform…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 8 pages, 7 figures, 2 tables. ICRA 2017 paper

  30. arXiv:1809.07316

    cs.CV

    Towards Large-Scale Video Object Mining

    Authors: Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe

    Abstract: We propose to leverage a generic object tracker in order to perform object mining in large-scale unlabeled videos, captured in a realistic automotive setting. We present a dataset of more than 360'000 automatically mined object tracks from 10+ hours of video data (560'000 frames) and propose a method for automated novel category discovery and detector learning. In addition, we show preliminary res…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 4 pages, 3 figures, 1 table. ECCV 2018 Workshop on Interactive and Adaptive Learning in an Open World

  31. arXiv:1712.08832

    cs.CV

    Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video

    Authors: Aljoša Ošep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe

    Abstract: We explore object discovery and detector adaptation based on unlabeled video sequences captured from a mobile platform. We propose a fully automatic approach for object mining from video which builds upon a generic object tracking approach. By applying this method to three large video datasets from autonomous driving and mobile robotics scenarios, we demonstrate its robustness and generality. Base…

    Submitted 23 December, 2017; originally announced December 2017.

    Comments: CVPR'18 submission

  32. arXiv:1712.07920

    cs.CV

    Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking

    Authors: Aljoša Ošep, Wolfgang Mehner, Paul Voigtlaender, Bastian Leibe

    Abstract: The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this pape…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: ICRA'18 submission