Showing 1–50 of 85 results for author: Leal-Taixe, L

Searching in archive cs.
  1. arXiv:2404.11426

    cs.CV

    SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow

    Authors: Orcun Cetintas, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

    Abstract: Increasing the annotation efficiency of trajectory annotations from videos has the potential to enable the next generation of data-hungry tracking algorithms to thrive on large-scale datasets. Despite the importance of this task, there are currently very few works exploring how to efficiently label tracking datasets comprehensively. In this work, we introduce SPAM, a tracking data engine that prov…

    Submitted 17 April, 2024; originally announced April 2024.

  2. arXiv:2403.16605

    cs.CV

    SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

    Authors: Aysim Toker, Marvin Eisenberger, Daniel Cremers, Laura Leal-Taixé

    Abstract: In recent years, semantic segmentation has become a pivotal tool in processing and interpreting satellite imagery. Yet, a prevalent limitation of supervised learning techniques remains the need for extensive manual annotations by experts. In this work, we explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks. The main idea is to le…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  3. arXiv:2403.13129

    cs.CV cs.RO

    Better Call SAL: Towards Learning to Segment Anything in Lidar

    Authors: Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé

    Abstract: We propose the SAL (Segment Anything in Lidar) method, consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision. While the established paradigm for Lidar Panoptic Segmentation (LPS) relies on manual supervision for a ha…

    Submitted 19 March, 2024; originally announced March 2024.

  4. arXiv:2403.09577

    cs.CV

    The NeRFect Match: Exploring NeRF Features for Visual Localization

    Authors: Qunjie Zhou, Maxim Maximov, Or Litany, Laura Leal-Taixé

    Abstract: In this work, we propose the use of Neural Radiance Fields (NeRF) as a scene representation for visual localization. Recently, NeRF has been employed to enhance pose regression and scene coordinate regression models by augmenting the training database, providing auxiliary supervision through rendered images, or serving as an iterative refinement module. We extend its recognized advantages -- its a…

    Submitted 14 March, 2024; originally announced March 2024.

  5. arXiv:2402.19463

    cs.CV

    SeMoLi: What Moves Together Belongs Together

    Authors: Jenny Seidenschwarz, Aljoša Ošep, Francesco Ferroni, Simon Lucey, Laura Leal-Taixé

    Abstract: We tackle semi-supervised object detection based on motion cues. Recent results suggest that heuristic-based clustering methods in conjunction with object trackers can be used to pseudo-label instances of moving objects and use these as supervisory signals to train 3D object detectors in Lidar data without manual supervision. We re-think this approach and suggest that both, object detection, as we…

    Submitted 25 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024!

  6. arXiv:2310.12464

    cs.CV cs.RO

    Lidar Panoptic Segmentation and Tracking without Bells and Whistles

    Authors: Abhinav Agarwalla, Xuhua Huang, Jason Ziglar, Francesco Ferroni, Laura Leal-Taixé, James Hays, Aljoša Ošep, Deva Ramanan

    Abstract: State-of-the-art lidar panoptic segmentation (LPS) methods follow bottom-up segmentation-centric fashion wherein they build upon semantic segmentation networks by utilizing clustering to obtain object instances. In this paper, we re-think this approach and propose a surprisingly simple yet effective detection-centric network for both LPS and tracking. Our network is modular by design and optimized…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: IROS 2023. Code at https://github.com/abhinavagarwalla/most-lps

  7. arXiv:2309.08947

    cs.CV

    Staged Contact-Aware Global Human Motion Forecasting

    Authors: Luca Scofano, Alessio Sampieri, Elisabeth Schiele, Edoardo De Matteis, Laura Leal-Taixé, Fabio Galasso

    Abstract: Scene-aware global human motion forecasting is critical for manifold applications, including virtual reality, robotics, and sports. The task combines human trajectory and pose forecasting within the provided scene context, which represents a significant challenge. So far, only Mao et al. NeurIPS'22 have addressed scene-aware global motion, cascading the prediction of future scene contact points…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 15 pages, 7 figures, BMVC23 oral

  8. arXiv:2308.15266

    cs.CV

    NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

    Authors: Tim Meinhardt, Matt Feiszli, Yuchen Fan, Laura Leal-Taixe, Rakesh Ranjan

    Abstract: Until recently, the Video Instance Segmentation (VIS) community operated under the common belief that offline methods are generally superior to a frame by frame online processing. However, the recent success of online methods questions this belief, in particular, for challenging and long video sequences. We understand this work as a rebuttal of those recent observations and an appeal to the commun…

    Submitted 18 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  9. arXiv:2306.11710

    cs.CV

    Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis

    Authors: Maxim Maximov, Tim Meinhardt, Ismail Elezi, Zoe Papakipos, Caner Hazirbas, Cristian Canton Ferrer, Laura Leal-Taixé

    Abstract: The advent of data-driven technology solutions is accompanied by an increasing concern with data privacy. This is of particular importance for human-centered image recognition tasks, such as pedestrian detection, re-identification, and tracking. To highlight the importance of privacy issues and motivate future research, we motivate and introduce the Pedestrian Dataset De-Identification (PDI) task.…

    Submitted 22 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  10. arXiv:2304.11705

    cs.CV cs.AI cs.LG

    Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation

    Authors: Cristiano Saltori, Aljoša Ošep, Elisa Ricci, Laura Leal-Taixé

    Abstract: The ability to deploy robots that can operate safely in diverse environments is crucial for developing embodied intelligent agents. As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semant…

    Submitted 29 August, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  11. arXiv:2212.03038

    cs.CV

    Unifying Short and Long-Term Tracking with Graph Hierarchies

    Authors: Orcun Cetintas, Guillem Brasó, Laura Leal-Taixé

    Abstract: Tracking objects over long videos effectively means solving a spectrum of problems, from short-term association for un-occluded objects to long-term association for objects that are occluded and then reappear in the scene. Methods tackling these two tasks are often disjoint and crafted for specific scenarios, and top-performing approaches are often a mix of techniques, which yields engineering-hea…

    Submitted 30 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  12. arXiv:2212.02910

    cs.CV

    G-MSM: Unsupervised Multi-Shape Matching with Graph-based Affinity Priors

    Authors: Marvin Eisenberger, Aysim Toker, Laura Leal-Taixé, Daniel Cremers

    Abstract: We present G-MSM (Graph-based Multi-Shape Matching), a novel unsupervised learning approach for non-rigid shape correspondence. Rather than treating a collection of input poses as an unordered set of samples, we explicitly model the underlying shape data manifold. To this end, we propose an adaptive multi-shape matching architecture that constructs an affinity graph on a given set of training shap…

    Submitted 6 December, 2022; originally announced December 2022.

  13. arXiv:2211.04625

    cs.CV

    Soft Augmentation for Image Classification

    Authors: Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

    Abstract: Modern neural networks are over-parameterized and thus rely on strong regularization such as data augmentation and weight decay to reduce overfitting and improve generalization. The dominant form of data augmentation applies invariant transforms, where the learning target of a sample is invariant to the transform applied to that sample. We draw inspiration from human visual classification studies…

    Submitted 23 January, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 16241-16250)

  14. arXiv:2210.10774

    cs.CV

    Learning to Discover and Detect Objects

    Authors: Vladimir Fomenko, Ismail Elezi, Deva Ramanan, Laura Leal-Taixé, Aljoša Ošep

    Abstract: We tackle the problem of novel class discovery and localization (NCDL). In this setting, we assume a source dataset with supervision for only some object classes. Instances of other classes need to be discovered, classified, and localized automatically based on visual similarity without any human supervision. To tackle NCDL, we propose a two-stage object detection network Region-based NCDL (RNCDL)…

    Submitted 30 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022, Homepage: https://vlfom.github.io/RNCDL/

  15. arXiv:2210.07681

    cs.CV

    Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?

    Authors: Patrick Dendorfer, Vladimir Yugay, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Recent developments in monocular multi-object tracking have been very successful in tracking visible objects and bridging short occlusion gaps, mainly relying on data-driven appearance models. While we have significantly advanced short-term tracking performance, bridging longer occlusion gaps remains elusive: state-of-the-art object trackers only bridge less than 10% of occlusions longer than thre…

    Submitted 25 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022; fixed small typo

  16. arXiv:2210.05657

    cs.CV cs.AI

    The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

    Authors: Peter Kocsis, Peter Súkeník, Guillem Brasó, Matthias Nießner, Laura Leal-Taixé, Ismail Elezi

    Abstract: Convolutional neural networks were the standard for solving many computer vision tasks until recently, when Transformers or MLP-based architectures have started to show competitive performance. These architectures typically have a vast number of weights and need to be trained on massive datasets; hence, they are not suitable for their use in low-data regimes. In this work, we propose a simple yet…

    Submitted 13 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022, Homepage: https://peter-kocsis.github.io/LowDataGeneralization/; 24 pages, 14 figures, 12 tables

    ACM Class: I.2.10; I.5.1; I.4.8

  17. arXiv:2209.14965

    cs.CV

    DirectTracker: 3D Multi-Object Tracking Using Direct Image Alignment and Photometric Bundle Adjustment

    Authors: Mariia Gladkova, Nikita Korobov, Nikolaus Demmel, Aljoša Ošep, Laura Leal-Taixé, Daniel Cremers

    Abstract: Direct methods have shown excellent performance in the applications of visual odometry and SLAM. In this work we propose to leverage their effectiveness for the task of 3D multi-object tracking. To this end, we propose DirectTracker, a framework that effectively combines direct image alignment for the short-term tracking and sliding-window photometric bundle adjustment for 3D object detection. Obj…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), 2022

  18. arXiv:2208.01957

    cs.CV cs.LG

    PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?

    Authors: Aleksandr Kim, Guillem Brasó, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by only encoding geometric relationships between objects in 3D space as cues for data-driven data association. We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates o…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: ECCV 2022, 17 pages, 5 pages of supplementary, 3 figures

  19. arXiv:2207.11103

    cs.CV cs.LG cs.RO

    DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

    Authors: Adrià Caelles, Tim Meinhardt, Guillem Brasó, Laura Leal-Taixé

    Abstract: Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out on a joint solution. Transformers recently allowed to cast the entire VIS task as a single set-prediction problem. Nevertheless, the quadratic complexity of exi…

    Submitted 22 July, 2022; originally announced July 2022.

  20. arXiv:2207.07454

    cs.CV

    Multi-Object Tracking and Segmentation via Neural Message Passing

    Authors: Guillem Braso, Orcun Cetintas, Laura Leal-Taixe

    Abstract: Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and Multiple Object Tracking and Segmentation (MOTS) within the tracking-by-detection paradigm. However, they also introduce a major challenge for learning methods, as defining a model that can operate on such structured domain is not trivial. In this work, we exploit the classical network flow formulation of MOT to define a fu…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:1912.07515

  21. arXiv:2206.04656

    cs.CV

    Simple Cues Lead to a Strong Multi-Object Tracker

    Authors: Jenny Seidenschwarz, Guillem Brasó, Victor Castro Serrano, Ismail Elezi, Laura Leal-Taixé

    Abstract: For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent approaches based on attention propose to learn the cues in a data-driven manner, showing impressive results. In this…

    Submitted 26 April, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted to CVPR2023!

  22. arXiv:2205.06688

    cs.CV

    A Unified Framework for Implicit Sinkhorn Differentiation

    Authors: Marvin Eisenberger, Aysim Toker, Laura Leal-Taixé, Florian Bernard, Daniel Cremers

    Abstract: The Sinkhorn operator has recently experienced a surge of popularity in computer vision and related fields. One major reason is its ease of integration into deep learning frameworks. To allow for an efficient training of respective neural networks, we propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. In comparison to prior work, our framework…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: To appear at CVPR 2022

  23. arXiv:2204.01509

    cs.CV cs.LG

    The Group Loss++: A deeper look into group loss for deep metric learning

    Authors: Ismail Elezi, Jenny Seidenschwarz, Laurin Wagner, Sebastiano Vascon, Alessandro Torcinovich, Marcello Pelillo, Laura Leal-Taixe

    Abstract: Deep metric learning has yielded impressive results in tasks such as clustering and image retrieval by leveraging neural networks to obtain highly discriminative feature embeddings, which can be used to group samples into different classes. Much research has been devoted to the design of smart loss functions or data mining strategies for training such networks. Most methods consider only pairs or…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 2022. Includes supplementary material

  24. arXiv:2203.16297

    cs.CV cs.LG cs.RO

    Forecasting from LiDAR via Future Object Detection

    Authors: Neehar Peri, Jonathon Luiten, Mengtian Li, Aljoša Ošep, Laura Leal-Taixé, Deva Ramanan

    Abstract: Object detection and forecasting are fundamental components of embodied perception. These two problems, however, are largely studied in isolation by the community. In this paper, we propose an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Instead of predicting the current frame locations and forecasting forward in time,…

    Submitted 31 March, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: This work has been accepted to Computer Vision and Pattern Recognition (CVPR) 2022

  25. arXiv:2203.15125

    cs.CV cs.CL cs.LG

    Text2Pos: Text-to-Point-Cloud Cross-Modal Localization

    Authors: Manuel Kolmet, Qunjie Zhou, Aljosa Osep, Laura Leal-Taixe

    Abstract: Natural language-based communication with mobile devices and home appliances is becoming increasingly popular and has the potential to become natural for communicating with mobile robots in the future. Towards this goal, we investigate cross-modal text-to-point-cloud localization that will allow us to specify, for example, a vehicle pick-up or goods delivery location. In particular, we propose Tex…

    Submitted 5 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: CVPR2022 Camera Ready Version

  26. arXiv:2203.12979

    cs.CV

    Is Geometry Enough for Matching in Visual Localization?

    Authors: Qunjie Zhou, Sérgio Agostinho, Aljosa Osep, Laura Leal-Taixé

    Abstract: In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. While matching keypoints via visual descriptors makes localization highly accurate, it has significant storage demands, raises privacy concerns and requires update to the descriptors in the long-term. To elegantly ad…

    Submitted 30 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: ECCV2022 Camera Ready

  27. arXiv:2203.12560

    cs.CV

    DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation

    Authors: Aysim Toker, Lukas Kondmann, Mark Weber, Marvin Eisenberger, Andrés Camero, Jingliang Hu, Ariadna Pregel Hoderlein, Çağlar Şenaras, Timothy Davis, Daniel Cremers, Giovanni Marchisio, Xiao Xiang Zhu, Laura Leal-Taixé

    Abstract: Earth observation is a fundamental tool for monitoring the evolution of land use in specific areas of interest. Observing and precisely defining change, in this context, requires both time-series data and pixel-wise segmentations. To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the g…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022, evaluation webpage: https://codalab.lisn.upsaclay.fr/competitions/2882

  28. arXiv:2110.05132

    cs.CV

    The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation

    Authors: Guillem Brasó, Nikita Kister, Laura Leal-Taixé

    Abstract: We introduce CenterGroup, an attention-based framework to estimate human poses from a set of identity-agnostic keypoints and person center predictions in an image. Our approach uses a transformer to obtain context-aware embeddings for all detected keypoints and centers and then applies multi-head attention to directly group joints into their corresponding person centers. While most bottom-up metho…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ICCV 2021; reports improved multi-scale results

  29. Spatial Context Awareness for Unsupervised Change Detection in Optical Satellite Images

    Authors: Lukas Kondmann, Aysim Toker, Sudipan Saha, Bernhard Schölkopf, Laura Leal-Taixé, Xiao Xiang Zhu

    Abstract: Detecting changes on the ground in multitemporal Earth observation data is one of the key problems in remote sensing. In this paper, we introduce Sibling Regression for Optical Change detection (SiROC), an unsupervised method for change detection in optical satellite images with medium and high resolution. SiROC is a spatial context-based method that models a pixel as a linear combination of its d…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: Submitted to IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS)

  30. arXiv:2108.09518

    cs.CV

    MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?

    Authors: Matteo Fabbri, Guillem Braso, Gianluca Maugeri, Orcun Cetintas, Riccardo Gasparini, Aljosa Osep, Simone Calderara, Laura Leal-Taixe, Rita Cucchiara

    Abstract: Deep learning-based methods for video pedestrian detection and tracking require large volumes of training data to achieve good performance. However, data acquisition in crowded public environments raises data privacy concerns -- we are not allowed to simply record and store data without the explicit consent of all participants. Furthermore, the annotation of such data for computer vision applicati…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 camera-ready version

  31. arXiv:2108.09274

    cs.CV

    MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction

    Authors: Patrick Dendorfer, Sven Elflein, Laura Leal-Taixé

    Abstract: Pedestrian trajectory prediction is challenging due to its uncertain and multimodal nature. While generative adversarial networks can learn a distribution over future trajectories, they tend to predict out-of-distribution samples when the distribution of future trajectories is a mixture of multiple, possibly disconnected modes. To address this issue, we propose a multi-generator model for pedestri…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: Accepted at ICCV 2021; Code available: https://github.com/selflein/MG-GAN

  32. arXiv:2108.03257

    cs.CV

    (Just) A Spoonful of Refinements Helps the Registration Error Go Down

    Authors: Sérgio Agostinho, Aljoša Ošep, Alessio Del Bue, Laura Leal-Taixé

    Abstract: We tackle data-driven 3D point cloud registration. Given point correspondences, the standard Kabsch algorithm provides an optimal rotation estimate. This allows registration models to be trained in an end-to-end manner by differentiating the SVD operation. However, given the initial rotation estimate supplied by Kabsch, we show we can improve point correspondence learning during model training by exten…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 (Oral)

  33. arXiv:2106.11921

    cs.CV cs.LG

    Not All Labels Are Equal: Rationalizing The Labeling Costs for Training Object Detection

    Authors: Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixe, Jose M. Alvarez

    Abstract: Deep neural networks have reached high accuracy on object detection but their success hinges on large amounts of labeled data. To reduce the dependency on labels, various active learning strategies have been proposed, typically based on the confidence of the detector. However, these methods are biased towards high-performing classes and can lead to acquired datasets that are not good representatives…

    Submitted 29 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Includes supplementary material

  34. arXiv:2106.09748

    cs.CV

    DeepLab2: A TensorFlow Library for Deep Labeling

    Authors: Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

    Abstract: DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the sta…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 4-page technical report. The first three authors contributed equally to this work

  35. arXiv:2106.09672

    cs.CV

    The 2021 Image Similarity Dataset and Challenge

    Authors: Matthijs Douze, Giorgos Tolias, Ed Pizzi, Zoë Papakipos, Lowik Chanussot, Filip Radenovic, Tomas Jenicek, Maxim Maximov, Laura Leal-Taixé, Ismail Elezi, Ondřej Chum, Cristian Canton Ferrer

    Abstract: This paper introduces a new benchmark for large-scale image similarity detection. This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021). The goal is to determine whether a query image is a modified copy of any image in a reference corpus of size 1 million. The benchmark features a variety of image transformations such as automated transformations, hand-crafted image edi…

    Submitted 21 February, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

  36. arXiv:2104.14682

    cs.CV cs.RO

    EagerMOT: 3D Multi-Object Tracking via Sensor Fusion

    Authors: Aleksandr Kim, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time. Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal. On the other hand, cameras provide a dense and rich visual signal that…

    Submitted 29 April, 2021; originally announced April 2021.

    Comments: To be published at ICRA 2021. Source code available at https://github.com/aleksandrkim61/EagerMOT

  37. arXiv:2104.11221

    cs.CV

    Opening up Open-World Tracking

    Authors: Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Tracking and detecting any object, including ones never-seen-before during model training, is a crucial but elusive capability of autonomous systems. An autonomous agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world - and yet this is how almost all current systems work. One of the main obstacles towards advancing tracking any object is that this…

    Submitted 28 March, 2022; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2022 (Oral). https://openworldtracking.github.io/

  38. arXiv:2103.06818

    cs.CV

    Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization

    Authors: Aysim Toker, Qunjie Zhou, Maxim Maximov, Laura Leal-Taixé

    Abstract: The goal of cross-view image based geo-localization is to determine the location of a given street view image by matching it against a collection of geo-tagged satellite images. This task is notoriously challenging due to the drastic viewpoint and appearance differences between the two domains. We show that we can address this discrepancy explicitly by learning to synthesize realistic street views…

    Submitted 11 March, 2021; originally announced March 2021.

  39. arXiv:2103.04727

    cs.LG cs.CV cs.RO

    Vision-Based Mobile Robotics Obstacle Avoidance With Deep Reinforcement Learning

    Authors: Patrick Wenzel, Torsten Schön, Laura Leal-Taixé, Daniel Cremers

    Abstract: Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots. In this paper, we consider the problem of obstacle avoidance in simple 3D environments where the robot has to solely rely on a single monocular camera. In particular, we are interested in solving this problem without relying on localization, mapping, or planning techniques. Most of the existing…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: Accepted at 2021 IEEE International Conference on Robotics and Automation (ICRA)

  40. arXiv:2102.12472

    cs.CV cs.RO

    4D Panoptic LiDAR Segmentation

    Authors: Mehmet Aygün, Aljoša Ošep, Mark Weber, Maxim Maximov, Cyrill Stachniss, Jens Behley, Laura Leal-Taixé

    Abstract: Temporal semantic scene understanding is critical for self-driving cars or robots operating in dynamic environments. In this paper, we propose 4D panoptic LiDAR segmentation to assign a semantic class and a temporally-consistent instance ID to a sequence of 3D points. To this end, we present an approach and a point-centric evaluation metric. Our approach determines a semantic class for every point…

    Submitted 7 April, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: CVPR 2021

  41. arXiv:2102.11859

    cs.CV

    STEP: Segmenting and Tracking Every Pixel

    Authors: Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljoša Ošep, Laura Leal-Taixé, Liang-Chieh Chen

    Abstract: The task of assigning semantic classes and track identities to every pixel in a video is called video panoptic segmentation. Our work is the first that targets this task in a real-world setting requiring dense interpretation in both spatial and temporal domains. As the ground-truth for this task is difficult and expensive to obtain, existing datasets are either constructed synthetically or only sp…

    Submitted 7 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021 Track on Datasets and Benchmarks. Code: https://github.com/google-research/deeplab2

  42. arXiv:2102.07753  [pdf, other]

    cs.CV

    Learning Intra-Batch Connections for Deep Metric Learning

    Authors: Jenny Seidenschwarz, Ismail Elezi, Laura Leal-Taixé

    Abstract: The goal of metric learning is to learn a function that maps samples to a lower-dimensional space where similar samples lie closer than dissimilar ones. Particularly, deep metric learning utilizes neural networks to learn such a mapping. Most approaches rely on losses that only take the relations between pairs or triplets of samples into account, which either belong to the same class or two differ…

    Submitted 11 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2021, includes non-archival supplementary material

  43. arXiv:2101.02702  [pdf, other]

    cs.CV

    TrackFormer: Multi-Object Tracking with Transformers

    Authors: Tim Meinhardt, Alexander Kirillov, Laura Leal-Taixe, Christoph Feichtenhofer

    Abstract: The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture. Our model achieves data association between frames via atten…

    Submitted 29 April, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

  44. arXiv:2012.01909  [pdf, other]

    cs.CV

    Patch2Pix: Epipolar-Guided Pixel-Level Correspondences

    Authors: Qunjie Zhou, Torsten Sattler, Laura Leal-Taixe

    Abstract: The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective t…

    Submitted 26 March, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: CVPR2021 Camera Ready Version

  45. arXiv:2012.01866  [pdf, other]

    cs.CV cs.LG cs.RO

    Make One-Shot Video Object Segmentation Efficient Again

    Authors: Tim Meinhardt, Laura Leal-Taixe

    Abstract: Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video. In the semi-supervised setting, the first mask of each object is provided at test time. Following the one-shot principle, fine-tuning VOS methods train a segmentation model separately on each given object mask. However, recently the VOS community has deemed such a test time optimization and…

    Submitted 3 December, 2020; originally announced December 2020.

  46. arXiv:2010.15261  [pdf, other]

    cs.CV

    Deep Shells: Unsupervised Shape Correspondence with Optimal Transport

    Authors: Marvin Eisenberger, Aysim Toker, Laura Leal-Taixé, Daniel Cremers

    Abstract: We propose a novel unsupervised learning approach to 3D shape correspondence that builds a multiscale matching pipeline into a deep neural network. This approach is based on smooth shells, the current state-of-the-art axiomatic correspondence method, which requires an a priori stochastic search over the space of initial poses. Our goal is to replace this costly preprocessing step by directly learn…

    Submitted 28 October, 2020; originally announced October 2020.

  47. arXiv:2010.14300  [pdf, other]

    cs.CV cs.AI

    Ice Monitoring in Swiss Lakes from Optical Satellites and Webcams using Machine Learning

    Authors: Manu Tom, Rajanie Prabha, Tianyu Wu, Emmanuel Baltsavias, Laura Leal-Taixe, Konrad Schindler

    Abstract: Continuous observation of climate indicators, such as trends in lake freezing, is important to understand the dynamics of the local and global climate system. Consequently, lake ice has been included among the Essential Climate Variables (ECVs) of the Global Climate Observing System (GCOS), and there is a need to set up operational monitoring capabilities. Multi-temporal satellite images and publi…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in MDPI Remote Sensing Journal

  48. arXiv:2010.07548  [pdf, other]

    cs.CV

    MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

    Authors: Patrick Dendorfer, Aljoša Ošep, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth, Laura Leal-Taixé

    Abstract: Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched…

    Submitted 8 December, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at IJCV

  49. arXiv:2010.01114  [pdf, other]

    cs.CV

    Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation

    Authors: Patrick Dendorfer, Aljoša Ošep, Laura Leal-Taixé

    Abstract: In this paper, we present Goal-GAN, an interpretable and end-to-end trainable model for human trajectory prediction. Inspired by human navigation, we model the task of trajectory prediction as an intuitive two-stage process: (i) goal estimation, which predicts the most likely target positions of the agent, followed by a (ii) routing module which estimates a set of plausible trajectories that route…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Oral presentation at ACCV 2020

  50. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking

    Authors: Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixe, Bastian Leibe

    Abstract: Multi-Object Tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, HOTA (Higher Order Tracking Accuracy), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing track…

    Submitted 29 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Pre-print. Accepted for Publication in the International Journal of Computer Vision, 19 August 2020. Code is available at https://github.com/JonathonLuiten/HOTA-metrics

    Journal ref: International Journal of Computer Vision (2020)