Showing 1–50 of 79 results for author: Leibe, B

  1. arXiv:2406.11835  [pdf, other]

    cs.CV

    OoDIS: Anomaly Instance Segmentation Benchmark

    Authors: Alexey Nekrasov, Rui Zhou, Miriam Ackermann, Alexander Hermans, Bastian Leibe, Matthias Rottmann

    Abstract: Autonomous vehicles require a precise understanding of their environment to navigate safely. Reliable identification of unknown objects, especially those that are absent during training, such as wild animals, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been driven by the availability of out-of-distribution (OOD) benc…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at the VAND 2.0 Workshop at CVPR 2024. Project page: https://vision.rwth-aachen.de/oodis

  2. arXiv:2403.16428  [pdf, other]

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2402.05917  [pdf, other]

    cs.CV

    Point-VOS: Pointing Up Video Object Segmentation

    Authors: Idil Esen Zulfikar, Sabarinath Mahadevan, Paul Voigtlaender, Bastian Leibe

    Abstract: Current state-of-the-art Video Object Segmentation (VOS) methods rely on dense per-object mask annotations both during training and testing. This requires time-consuming and costly video annotation mechanisms. We propose a novel Point-VOS task with a spatio-temporally sparse point-wise annotation scheme that substantially reduces the annotation effort. We apply our annotation scheme to two large-s…

    Submitted 10 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024!

  4. arXiv:2402.05685  [pdf, ps, other]

    cs.CV

    An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs

    Authors: Patrick Wienholt, Alexander Hermans, Firas Khader, Behrus Puladi, Bastian Leibe, Christiane Kuhl, Sven Nebelung, Daniel Truhn

    Abstract: This study investigates the application of ordinal regression methods for categorizing disease severity in chest radiographs. We propose a framework that divides the ordinal regression problem into three parts: a model, a target function, and a classification function. Different encoding methods, including one-hot, Gaussian, progress-bar, and our soft-progress-bar, are applied using ResNet50 and V…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 17 pages, 3 figures, the code is available at: https://github.com/paddyOnGithub/ordinal_regression

  5. Cyto R-CNN and CytoNuke Dataset: Towards reliable whole-cell segmentation in bright-field histological images

    Authors: Johannes Raufeisen, Kunpeng Xie, Fabian Hörst, Till Braunschweig, Jianning Li, Jens Kleesiek, Rainer Röhrig, Jan Egger, Bastian Leibe, Frank Hölzle, Alexander Hermans, Behrus Puladi

    Abstract: Background: Cell segmentation in bright-field histological slides is a crucial topic in medical image analysis. Having access to accurate segmentation allows researchers to examine the relationship between cellular morphology and clinical observations. Unfortunately, most segmentation methods known today are limited to nuclei and cannot segmentate the cytoplasm. Material & Methods: We present a…

    Submitted 4 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  6. arXiv:2312.05208  [pdf, other]

    cs.CV

    ControlRoom3D: Room Generation using Semantic Proxy Rooms

    Authors: Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou

    Abstract: Manually creating 3D environments for AR/VR applications is a complex process requiring expert knowledge in 3D modeling software. Pioneering works facilitate this process by generating room meshes conditioned on textual style descriptions. Yet, many of these automatically generated 3D meshes do not adhere to typical room layouts, compromising their plausibility, e.g., by placing several beds in on…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Project Page: https://jonasschult.github.io/ControlRoom3D/

  7. arXiv:2311.06391  [pdf, ps, other]

    cs.RO

    BUSSARD -- Better Understanding Social Situations for Autonomous Robot Decision-Making

    Authors: Stefan Schiffer, Astrid Rosenthal-von der Pütten, Bastian Leibe

    Abstract: We report on our effort to create a corpus dataset of different social context situations in an office setting for further disciplinary and interdisciplinary research in computer vision, psychology, and human-robot-interaction. For social robots to be able to behave appropriately, they need to be aware of the social context they act in. Consider, for example, a robot with the task to deliver a per…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: In SCRITA 2023 Workshop Proceedings (arXiv:2311.05401) held in conjunction with 32nd IEEE International Conference on Robot & Human Interactive Communication, 28/08 - 31/08 2023, Busan (Korea)

    Report number: SCRITA/2023/07

  8. arXiv:2309.16133  [pdf, other]

    cs.CV

    Mask4Former: Mask Transformer for 4D Panoptic Segmentation

    Authors: Kadir Yilmaz, Jonas Schult, Alexey Nekrasov, Bastian Leibe

    Abstract: Accurately perceiving and tracking instances over time is essential for the decision-making processes of autonomous agents interacting safely in dynamic environments. With this intention, we propose Mask4Former for the challenging task of 4D panoptic segmentation of LiDAR point clouds. Mask4Former is the first transformer-based approach unifying semantic instance segmentation and tracking of spars…

    Submitted 10 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Renamed from MASK4D to Mask4Former. ICRA 2024. Project page: https://vision.rwth-aachen.de/Mask4Former

  9. arXiv:2308.09713  [pdf, other]

    cs.CV

    Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis

    Authors: Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan

    Abstract: We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we all…

    Submitted 18 August, 2023; originally announced August 2023.

  10. arXiv:2308.02046  [pdf, other]

    cs.CV

    UGainS: Uncertainty Guided Anomaly Instance Segmentation

    Authors: Alexey Nekrasov, Alexander Hermans, Lars Kuhnert, Bastian Leibe

    Abstract: A single unexpected object on the road can cause an accident or may lead to injuries. To prevent this, we need a reliable mechanism for finding anomalous objects on the road. This task, called anomaly segmentation, can be a stepping stone to safe and reliable autonomous driving. Current approaches tackle anomaly segmentation by assigning an anomaly score to each pixel and by grouping anomalous reg…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at GCPR 2023; Project page at https://vision.rwth-aachen.de/ugains

  11. arXiv:2306.00977  [pdf, other]

    cs.CV cs.HC

    AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

    Authors: Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult, Francis Engelmann, Bastian Leibe, Konrad Schindler, Theodora Kontogianni

    Abstract: During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments obj…

    Submitted 10 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 camera-ready. Project page: https://ywyue.github.io/AGILE3D

  12. arXiv:2304.06668  [pdf, other]

    cs.CV

    DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

    Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe

    Abstract: Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interactio…

    Submitted 22 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted to ICCV 2023

  13. arXiv:2303.16570  [pdf, other]

    cs.CV

    Point2Vec for Self-Supervised Representation Learning on Point Clouds

    Authors: Karim Abou Zeid, Jonas Schult, Alexander Hermans, Bastian Leibe

    Abstract: Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-…

    Submitted 11 October, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted at GCPR 2023. Project page at https://vision.rwth-aachen.de/point2vec

  14. arXiv:2301.02657  [pdf, other]

    cs.CV cs.AI cs.LG

    TarViS: A Unified Approach for Target-based Video Segmentation

    Authors: Ali Athar, Alexander Hermans, Jonathon Luiten, Deva Ramanan, Bastian Leibe

    Abstract: The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied t…

    Submitted 10 May, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

    Comments: Accepted to CVPR'23 (Highlight). Code is available at: https://github.com/Ali2500/TarViS

    ACM Class: I.4.6; I.4.8; I.4.10

  15. arXiv:2212.14474  [pdf, other]

    cs.CV

    Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

    Authors: István Sárándi, Alexander Hermans, Bastian Leibe

    Abstract: Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor are the different skeleton formats provided by different datasets, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise on…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: Accepted at the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'23)

    ACM Class: I.2.10; I.4.8

  16. arXiv:2212.00786  [pdf, other]

    cs.CV

    3D Segmentation of Humans in Point Clouds with Synthetic Data

    Authors: Ayça Takmaz, Jonas Schult, Irem Kaftan, Mertcan Akçay, Bastian Leibe, Robert Sumner, Francis Engelmann, Siyu Tang

    Abstract: Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. To this end, we propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation. Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated train…

    Submitted 18 August, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: project page: https://human-3d.github.io/

  17. arXiv:2210.03105  [pdf, other]

    cs.CV

    Mask3D: Mask Transformer for 3D Semantic Instance Segmentation

    Authors: Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe

    Abstract: Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic T…

    Submitted 12 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICRA 2023 camera-ready version

  18. arXiv:2209.14858  [pdf, other]

    cs.CV

    4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation

    Authors: Lars Kreuzberg, Idil Esen Zulfikar, Sabarinath Mahadevan, Francis Engelmann, Bastian Leibe

    Abstract: In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are further aggregated using learned geometric features. The tracklet aggregation method effectively genera…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted to the ECCV 2022 AVVision Workshop

    Journal ref: European Conference on Computer Vision Workshops 2022

  19. arXiv:2209.12118  [pdf, other]

    cs.CV

    BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

    Authors: Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan

    Abstract: Multiple existing benchmarks involve tracking and segmenting objects in video e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g. J&F, mAP, sMOTSA). As a result, published works usually target a particular benchmark, and are not easily comparable to eac…

    Submitted 22 November, 2022; v1 submitted 24 September, 2022; originally announced September 2022.

  20. arXiv:2208.03791  [pdf, other]

    cs.CV

    Global Hierarchical Attention for 3D Point Cloud Analysis

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: We propose a new attention mechanism, called Global Hierarchical Attention (GHA), for 3D point cloud analysis. GHA approximates the regular global dot-product attention via a series of coarsening and interpolation operations over multiple hierarchy levels. The advantage of GHA is two-fold. First, it has linear complexity with respect to the number of points, enabling the processing of large point…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: Accepted to the German Conference on Pattern Recognition (GCPR) 2022

  21. arXiv:2208.02121  [pdf, other]

    cs.RO cs.CV cs.HC

    Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics

    Authors: Diego Paez-Granados, Yujie He, David Gonon, Dan Jia, Bastian Leibe, Kenji Suzuki, Aude Billard

    Abstract: Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control evaluated on an autonomous personal mobility vehicle. We propose evaluation me…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: © IEEE. All rights reserved. IEEE-IROS-2022, Oct. 23-27, Kyoto, Japan

    Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2022)

  22. arXiv:2206.00182  [pdf, other]

    cs.CV

    Differentiable Soft-Masked Attention

    Authors: Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

    Abstract: Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of particular significance is the 'cross-attention' operation, which allows a vector representation (e.g. of an object in an image) to be learned by attending to an arbitrarily sized set of input features. Recently, "Masked Attention" was proposed in which a given object…

    Submitted 5 August, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2112.09131

    ACM Class: I.4.6; I.4.8; I.4.10

  23. HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

    Authors: Ali Athar, Jonathon Luiten, Alexander Hermans, Deva Ramanan, Bastian Leibe

    Abstract: Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across video. This requires a large amount of densely annotated video data, which is costly to annotate, and largely redundant since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles…

    Submitted 15 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    ACM Class: I.4.6; I.4.8; I.4.10

  24. arXiv:2111.07774  [pdf, other]

    cs.CV

    D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

    Authors: Christian Schmidt, Ali Athar, Sabarinath Mahadevan, Bastian Leibe

    Abstract: Despite receiving significant attention from the research community, the task of segmenting and tracking objects in monocular videos still has much room for improvement. Existing works have simultaneously justified the efficacy of dilated and deformable convolutions for various image-level segmentation tasks. This gives reason to believe that 3D extensions of such convolutions should also yield pe…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: Accepted to WACV 2022

  25. arXiv:2110.02210  [pdf, other]

    cs.CV

    Mix3D: Out-of-Context Data Augmentation for 3D Scenes

    Authors: Alexey Nekrasov, Jonas Schult, Or Litany, Bastian Leibe, Francis Engelmann

    Abstract: We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental implications like mistaking a pedestrian crossing the street for…

    Submitted 29 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at 3DV 2021. Camera-ready submission. Link to code: https://github.com/kumuji/mix3d - Project page: https://nekrasov.dev/mix3d/

  26. arXiv:2107.06780  [pdf, ps, other]

    cs.CV cs.RO

    Person-MinkUNet: 3D Person Detection with LiDAR Point Cloud

    Authors: Dan Jia, Bastian Leibe

    Abstract: In this preliminary work we attempt to apply submanifold sparse convolution to the task of 3D person detection. In particular, we present Person-MinkUNet, a single-stage 3D person detection network based on Minkowski Engine with U-Net architecture. The network achieves a 76.4% average precision (AP) on the JRDB 3D detection benchmark.

    Submitted 3 July, 2021; originally announced July 2021.

    Comments: accepted as an extended abstract in JRDB-ACT Workshop at CVPR21

  27. arXiv:2106.11239  [pdf, other]

    cs.RO cs.CV

    2D vs. 3D LiDAR-based Person Detection on Mobile Robots

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: Person detection is a crucial task for mobile robots navigating in human-populated environments. LiDAR sensors are promising for this task, thanks to their accurate depth measurements and large field of view. Two types of LiDAR sensors exist: the 2D LiDAR sensors, which scan a single plane, and the 3D LiDAR sensors, which scan multiple planes, thus forming a volume. How do they compare for the tas…

    Submitted 25 July, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Shortened version accepted at the International Conference on Intelligent Robots and Systems (IROS) 2022

  28. arXiv:2104.11221  [pdf, other]

    cs.CV

    Opening up Open-World Tracking

    Authors: Yang Liu, Idil Esen Zulfikar, Jonathon Luiten, Achal Dave, Deva Ramanan, Bastian Leibe, Aljoša Ošep, Laura Leal-Taixé

    Abstract: Tracking and detecting any object, including ones never-seen-before during model training, is a crucial but elusive capability of autonomous systems. An autonomous agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world - and yet this is how almost all current systems work. One of the main obstacles towards advancing tracking any object is that this…

    Submitted 28 March, 2022; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: CVPR 2022 (Oral). https://openworldtracking.github.io/

  29. arXiv:2102.11859  [pdf, other]

    cs.CV

    STEP: Segmenting and Tracking Every Pixel

    Authors: Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers, Aljoša Ošep, Laura Leal-Taixé, Liang-Chieh Chen

    Abstract: The task of assigning semantic classes and track identities to every pixel in a video is called video panoptic segmentation. Our work is the first that targets this task in a real-world setting requiring dense interpretation in both spatial and temporal domains. As the ground-truth for this task is difficult and expensive to obtain, existing datasets are either constructed synthetically or only sp…

    Submitted 7 December, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021 Track on Datasets and Benchmarks. Code: https://github.com/google-research/deeplab2

  30. arXiv:2012.11575  [pdf, other]

    cs.CV

    From Points to Multi-Object 3D Reconstruction

    Authors: Francis Engelmann, Konstantinos Rematas, Bastian Leibe, Vittorio Ferrari

    Abstract: We propose a method to detect and reconstruct multiple 3D objects from a single RGB image. The key idea is to optimize for detection, alignment and shape jointly over all objects in the RGB image, while focusing on realistic and physically plausible reconstructions. To this end, we propose a keypoint detector that localizes objects as center points and directly predicts all object properties, incl…

    Submitted 21 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: CVPR2021 - Project Page: https://francisengelmann.github.io/points2objects/

  31. arXiv:2012.08890  [pdf, other]

    cs.CV cs.RO

    Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera

    Authors: Dan Jia, Mats Steinweg, Alexander Hermans, Bastian Leibe

    Abstract: Deep learning is the essential building block of state-of-the-art person detectors in 2D range data. However, only a few annotated datasets are available for training and testing these deep networks, potentially limiting their performance when deployed in new environments or with different LiDAR models. We propose a method, which uses bounding boxes from an image-based detector (e.g. Faster R-CNN)…

    Submitted 3 June, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: 2021 IEEE International Conference on Robotics and Automation (ICRA)

  32. arXiv:2011.01142  [pdf, other]

    cs.CV

    Reducing the Annotation Effort for Video Object Segmentation Datasets

    Authors: Paul Voigtlaender, Lishu Luo, Chun Yuan, Yong Jiang, Bastian Leibe

    Abstract: For further progress in video object segmentation (VOS), larger, more diverse, and more challenging datasets will be necessary. However, densely labeling every frame with pixel masks does not scale to large datasets. We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations and investigate how far such pseudo-labels can ca…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at WACV 2021

  33. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking

    Authors: Jonathon Luiten, Aljosa Osep, Patrick Dendorfer, Philip Torr, Andreas Geiger, Laura Leal-Taixe, Bastian Leibe

    Abstract: Multi-Object Tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, HOTA (Higher Order Tracking Accuracy), which explicitly balances the effect of performing accurate detection, association and localization into a single unified metric for comparing track…

    Submitted 29 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Pre-print. Accepted for Publication in the International Journal of Computer Vision, 19 August 2020. Code is available at https://github.com/JonathonLuiten/HOTA-metrics

    Journal ref: International Journal of Computer Vision (2020)

  34. arXiv:2008.11516  [pdf, other]

    cs.CV

    Making a Case for 3D Convolutions for Object Segmentation in Videos

    Authors: Sabarinath Mahadevan, Ali Athar, Aljoša Ošep, Sebastian Hennen, Laura Leal-Taixé, Bastian Leibe

    Abstract: The task of object segmentation in videos is usually accomplished by processing appearance and motion information separately using standard 2D convolutional networks, followed by a learned fusion of the two sources of information. On the other hand, 3D convolutional networks have been successfully applied for video classification tasks, but have not been leveraged as effectively to problems involv…

    Submitted 1 September, 2023; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: BMVC '20

  35. MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation

    Authors: István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe

    Abstract: Heatmap representations have formed the basis of human pose estimation systems for many years, and their extension to 3D has been a fruitful line of recent research. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and Z to metric depth around the subject. To obtain metric-scale predictions, 2.5D methods need a separate post-processing step to resolve scale ambi…

    Submitted 14 November, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: See project page at https://vision.rwth-aachen.de/metrabs . Accepted for publication in the IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM), Special Issue "Selected Best Works From Automated Face and Gesture Recognition 2020". Extended version of FG paper arXiv:2003.02953

    ACM Class: I.2.10; I.4.8

    Journal ref: IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 3, no. 1, pp. 16-30, Jan. 2021

  36. Reposing Humans by Warping 3D Features

    Authors: Markus Knoche, István Sárándi, Bastian Leibe

    Abstract: We address the problem of reposing an image of a human into any desired novel pose. This conditional image-generation task requires reasoning about the 3D structure of the human, including self-occluded body parts. Most prior works are either based on 2D representations or require fitting and manipulating an explicit 3D body mesh. Based on the recent success in deep learning-based volumetric repre…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted at CVPR 2020 Workshop on Human-Centric Image/Video Synthesis

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 1044-1045

  37. SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction

    Authors: Francis Engelmann, Jörg Stückler, Bastian Leibe

    Abstract: Inferring the pose and shape of vehicles in 3D from a movable platform still remains a challenging task due to the projective sensing principle of cameras, difficult surface properties e.g. reflections or transparency, and illumination changes between images. In this paper, we propose to use 3D shape and motion priors to regularize the estimation of the trajectory and the shape of vehicles in sequ…

    Submitted 2 May, 2020; originally announced May 2020.

    Journal ref: IEEE Winter Conference on Applications of Computer Vision (WACV), 2017

  38. arXiv:2004.14079  [pdf, other]

    cs.RO cs.CV

    DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data

    Authors: Dan Jia, Alexander Hermans, Bastian Leibe

    Abstract: Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the nec…

    Submitted 31 July, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  39. Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos

    Authors: Umer Rafi, Andreas Doering, Bastian Leibe, Juergen Gall

    Abstract: Video annotation is expensive and time consuming. Consequently, datasets for multi-person pose estimation and tracking are less diverse and have more sparse annotations compared to large scale image datasets for human pose estimation. This makes it challenging to learn deep learning based models for associating keypoints across frames that are robust to nuisance factors such as motion blur and occ…

    Submitted 15 March, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to ECCV 2020

  40. arXiv:2004.01002  [pdf, other]

    cs.CV

    DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes

    Authors: Jonas Schult, Francis Engelmann, Theodora Kontogianni, Bastian Leibe

    Abstract: We propose DualConvMesh-Nets (DCM-Net) a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions. The first type, geodesic convolutions, defines the kernel weights over mesh surfaces or graphs. That is, the convolutional kernel weights are mapped to the local surface of a given mesh. The second type, Euclidean convolutions, is independent o…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 camera-ready version

    ACM Class: I.4.6; I.4.10; I.2.6; I.2.10

  41. arXiv:2003.13867  [pdf, other]

    cs.CV

    3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation

    Authors: Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner

    Abstract: We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-propo…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: CVPR2020, Video: https://youtu.be/ifL8yTbRFDk Project Page: https://www.vision.rwth-aachen.de/3d_instance_segmentation/

  42. STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

    Authors: Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

    Abstract: Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in individual frames, and then associate these detections over time. Hence, these methods are often non-end-to-end trainable and highly tailored to specific tasks. In…

    Submitted 1 September, 2023; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 28 pages, 6 figures

    MSC Class: 68T45; 68T10; 62H30 ACM Class: I.2.10; I.4.6; I.4.8; I.5.3

  43. Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation

    Authors: István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe

    Abstract: Heatmap representations have formed the basis of 2D human pose estimation systems for many years, but their generalizations for 3D pose have only recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and the Z axis to metric depth around the subject. To obtain metric-scale predictions, these methods must include a separate, explicit post-pro…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at the 2020 IEEE Conference on Automatic Face and Gesture Recognition (FG)

  44. arXiv:2001.05425  [pdf, other]

    cs.CV

    UnOVOST: Unsupervised Offline Video Object Segmentation and Tracking

    Authors: Jonathon Luiten, Idil Esen Zulfikar, Bastian Leibe

    Abstract: We address Unsupervised Video Object Segmentation (UVOS), the task of automatically generating accurate pixel masks for salient objects in a video sequence and of tracking these objects consistently through time, without any input about which objects should be tracked. Towards solving this task, we present UnOVOST (Unsupervised Offline Video Object Segmentation and Tracking) as a simple and generi…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted for publication at WACV 2020

  45. arXiv:1911.12836  [pdf, other]

    cs.CV

    Siam R-CNN: Visual Tracking by Re-Detection

    Authors: Paul Voigtlaender, Jonathon Luiten, Philip H. S. Torr, Bastian Leibe

    Abstract: We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and…

    Submitted 2 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: CVPR 2020 camera-ready version

  46. arXiv:1911.00764  [pdf, other]

    cs.CV cs.RO

    Single-Shot Panoptic Segmentation

    Authors: Mark Weber, Jonathon Luiten, Bastian Leibe

    Abstract: We present a novel end-to-end single-shot method that segments countable object instances (things) as well as background regions (stuff) into a non-overlapping panoptic segmentation at almost video frame rate. Current state-of-the-art methods are far from reaching video frame rate and mostly rely on merging instance segmentation with semantic background segmentation, making them impractical to use…

    Submitted 30 August, 2020; v1 submitted 2 November, 2019; originally announced November 2019.

    Comments: Accepted to IROS 2020

  47. arXiv:1910.04668  [pdf, other]

    cs.CV cs.RO

    AlignNet-3D: Fast Point Cloud Registration of Partially Observed Objects

    Authors: Johannes Groß, Aljosa Osep, Bastian Leibe

    Abstract: Methods tackling multi-object tracking need to estimate the number of targets in the sensing area as well as to estimate their continuous state. While the majority of existing methods focus on data association, precise state (3D pose) estimation is often only coarsely estimated by approximating targets with centroids or (3D) bounding boxes. However, in automotive scenarios, motion perception of su…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Presented at 3DV'19

  48. Track to Reconstruct and Reconstruct to Track

    Authors: Jonathon Luiten, Tobias Fischer, Bastian Leibe

    Abstract: Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic ob…

    Submitted 19 April, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: RA-L 2020 and ICRA 2020

    Journal ref: IEEE Robotics and Automation Letters 5.2 (2020): 1803-1810

  49. arXiv:1907.12046  [pdf, other]

    cs.CV

    Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds

    Authors: Francis Engelmann, Theodora Kontogianni, Bastian Leibe

    Abstract: In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that…

    Submitted 23 May, 2020; v1 submitted 28 July, 2019; originally announced July 2019.

    Comments: ICRA 2020 Video https://www.youtube.com/watch?v=JDfFmuOvMkM Project https://francisengelmann.github.io/DPC/

  50. Visual Person Understanding through Multi-Task and Multi-Dataset Learning

    Authors: Kilian Pfeiffer, Alexander Hermans, István Sárándi, Mark Weber, Bastian Leibe

    Abstract: We address the problem of learning a single model for person re-identification, attribute classification, body part segmentation, and pose estimation. With predictions for these tasks we gain a more holistic understanding of persons, which is valuable for many applications. This is a classical multi-task learning problem. However, no dataset exists that these tasks could be jointly learned from. H…

    Submitted 7 June, 2019; originally announced June 2019.