-
OoDIS: Anomaly Instance Segmentation Benchmark
Authors:
Alexey Nekrasov,
Rui Zhou,
Miriam Ackermann,
Alexander Hermans,
Bastian Leibe,
Matthias Rottmann
Abstract:
Autonomous vehicles require a precise understanding of their environment to navigate safely. Reliable identification of unknown objects, especially those that are absent during training, such as wild animals, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been driven by the availability of out-of-distribution (OOD) benchmarks. However, a comprehensive understanding of scene dynamics requires the segmentation of individual objects, and thus the segmentation of instances is essential. Development in this area has been lagging, largely due to the lack of dedicated benchmarks. To address this gap, we have extended the most commonly used anomaly segmentation benchmarks to include the instance segmentation task. Our evaluation of anomaly instance segmentation methods shows that this challenge remains an unsolved problem. The benchmark website and the competition page can be found at: https://vision.rwth-aachen.de/oodis .
Submitted 17 June, 2024;
originally announced June 2024.
-
The PLATO Mission
Authors:
Heike Rauer,
Conny Aerts,
Juan Cabrera,
Magali Deleuil,
Anders Erikson,
Laurent Gizon,
Mariejo Goupil,
Ana Heras,
Jose Lorenzo-Alvarez,
Filippo Marliani,
Cesar Martin-Garcia,
J. Miguel Mas-Hesse,
Laurence O'Rourke,
Hugh Osborn,
Isabella Pagano,
Giampaolo Piotto,
Don Pollacco,
Roberto Ragazzoni,
Gavin Ramsay,
Stéphane Udry,
Thierry Appourchaux,
Willy Benz,
Alexis Brandeker,
Manuel Güdel,
Eduardo Janot-Pacheco
, et al. (801 additional authors not shown)
Abstract:
PLATO (PLAnetary Transits and Oscillations of stars) is ESA's M3 mission designed to detect and characterise extrasolar planets and perform asteroseismic monitoring of a large number of stars. PLATO will detect small planets (down to <2 R_(Earth)) around bright stars (<11 mag), including terrestrial planets in the habitable zone of solar-like stars. With the complement of radial velocity observations from the ground, planets will be characterised for their radius, mass, and age with high accuracy (5 %, 10 %, 10 % for an Earth-Sun combination respectively). PLATO will provide us with a large-scale catalogue of well-characterised small planets up to intermediate orbital periods, relevant for a meaningful comparison to planet formation theories and to better understand planet evolution. It will make possible comparative exoplanetology to place our Solar System planets in a broader context. In parallel, PLATO will study (host) stars using asteroseismology, allowing us to determine the stellar properties with high accuracy, substantially enhancing our knowledge of stellar structure and evolution.
The payload instrument consists of 26 cameras with 12 cm aperture each. For at least four years, the mission will perform high-precision photometric measurements. Here we review the science objectives, present PLATO's target samples and fields, provide an overview of expected core science performance as well as a description of the instrument and the mission profile at the beginning of the serial production of the flight cameras. PLATO is scheduled for launch at the end of 2026. This overview therefore provides a summary of the mission to the community in preparation for the upcoming operational phases.
Submitted 8 June, 2024;
originally announced June 2024.
-
An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs
Authors:
Patrick Wienholt,
Alexander Hermans,
Firas Khader,
Behrus Puladi,
Bastian Leibe,
Christiane Kuhl,
Sven Nebelung,
Daniel Truhn
Abstract:
This study investigates the application of ordinal regression methods for categorizing disease severity in chest radiographs. We propose a framework that divides the ordinal regression problem into three parts: a model, a target function, and a classification function. Different encoding methods, including one-hot, Gaussian, progress-bar, and our soft-progress-bar, are applied using ResNet50 and ViT-B-16 deep learning models. We show that the choice of encoding has a strong impact on performance and that the best encoding depends on the chosen weighting of Cohen's kappa and also on the model architecture used. We make our code publicly available on GitHub.
Submitted 8 February, 2024;
originally announced February 2024.
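A minimal sketch of three of the target encodings named in the abstract above (one-hot, Gaussian, progress-bar), plus one possible classification function for the progress-bar case. The exact definitions used by the authors are not given in the abstract, so the vector lengths, the `sigma` parameter, the 0.5 threshold, and all function names here are assumptions for illustration. The authors' soft-progress-bar encoding is not reproduced because the abstract does not define it.

```python
import math


def one_hot(label: int, num_classes: int) -> list[float]:
    """Standard one-hot target: 1 at the label index, 0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def gaussian(label: int, num_classes: int, sigma: float = 1.0) -> list[float]:
    """Assumed form: a normalised Gaussian bump centred on the label index."""
    weights = [math.exp(-((i - label) ** 2) / (2.0 * sigma ** 2))
               for i in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def progress_bar(label: int, num_classes: int) -> list[float]:
    """Assumed thermometer-style target with num_classes - 1 entries:
    entry i is 1 when the label is greater than i."""
    return [1.0 if label > i else 0.0 for i in range(num_classes - 1)]


def decode_progress_bar(output: list[float], threshold: float = 0.5) -> int:
    """Hypothetical classification function: count entries above the threshold."""
    return sum(1 for value in output if value > threshold)
```

With four severity classes, `progress_bar(2, 4)` fills the first two of three slots, so neighbouring classes share most of their target vector, which is the ordinal structure a one-hot target discards.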
-
Cyto R-CNN and CytoNuke Dataset: Towards reliable whole-cell segmentation in bright-field histological images
Authors:
Johannes Raufeisen,
Kunpeng Xie,
Fabian Hörst,
Till Braunschweig,
Jianning Li,
Jens Kleesiek,
Rainer Röhrig,
Jan Egger,
Bastian Leibe,
Frank Hölzle,
Alexander Hermans,
Behrus Puladi
Abstract:
Background: Cell segmentation in bright-field histological slides is a crucial topic in medical image analysis. Having access to accurate segmentation allows researchers to examine the relationship between cellular morphology and clinical observations. Unfortunately, most segmentation methods known today are limited to nuclei and cannot segment the cytoplasm.
Material & Methods: We present a new network architecture Cyto R-CNN that is able to accurately segment whole cells (with both the nucleus and the cytoplasm) in bright-field images. We also present a new dataset CytoNuke, consisting of multiple thousand manual annotations of head and neck squamous cell carcinoma cells. Utilizing this dataset, we compared the performance of Cyto R-CNN to other popular cell segmentation algorithms, including QuPath's built-in algorithm, StarDist and Cellpose. To evaluate segmentation performance, we calculated AP50, AP75 and measured 17 morphological and staining-related features for all detected cells. We compared these measurements to the gold standard of manual segmentation using the Kolmogorov-Smirnov test.
Results: Cyto R-CNN achieved an AP50 of 58.65% and an AP75 of 11.56% in whole-cell segmentation, outperforming all other methods (QuPath $19.46/0.91\%$; StarDist $45.33/2.32\%$; Cellpose $31.85/5.61\%$). Cell features derived from Cyto R-CNN showed the best agreement to the gold standard ($\bar{D} = 0.15$) outperforming QuPath ($\bar{D} = 0.22$), StarDist ($\bar{D} = 0.25$) and Cellpose ($\bar{D} = 0.23$).
Conclusion: Our newly proposed Cyto R-CNN architecture outperforms current algorithms in whole-cell segmentation while providing more reliable cell measurements than any other model. This could improve digital pathology workflows, potentially leading to improved diagnosis. Moreover, our published dataset can be used to develop further models in the future.
Submitted 4 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
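The abstract above compares per-cell feature distributions against manual segmentation with the Kolmogorov-Smirnov test and reports a value $\bar{D}$ per method. A minimal sketch of the two-sample KS statistic $D$ follows. Reading $\bar{D}$ as the mean of $D$ over the measured features is an assumption, and the function names and dictionary layout are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def mean_ks(predicted: dict[str, list[float]],
            reference: dict[str, list[float]]) -> float:
    """Assumed aggregation: average D over the features present in both inputs."""
    shared = sorted(set(predicted) & set(reference))
    return sum(ks_statistic(predicted[name], reference[name])
               for name in shared) / len(shared)
```

A value of 0 means the two empirical distributions coincide and 1 means they do not overlap at all, so lower values indicate closer agreement with the manual annotations.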
-
UGainS: Uncertainty Guided Anomaly Instance Segmentation
Authors:
Alexey Nekrasov,
Alexander Hermans,
Lars Kuhnert,
Bastian Leibe
Abstract:
A single unexpected object on the road can cause an accident or may lead to injuries. To prevent this, we need a reliable mechanism for finding anomalous objects on the road. This task, called anomaly segmentation, can be a stepping stone to safe and reliable autonomous driving. Current approaches tackle anomaly segmentation by assigning an anomaly score to each pixel and by grouping anomalous regions using simple heuristics. However, pixel grouping is a limiting factor when it comes to evaluating the segmentation performance of individual anomalous objects. To address the issue of grouping multiple anomaly instances into one, we propose an approach that produces accurate anomaly instance masks. Our approach centers on an out-of-distribution segmentation model for identifying uncertain regions and a strong generalist segmentation model for anomaly instance segmentation. We investigate ways to use uncertain regions to guide such a segmentation model to perform segmentation of anomalous instances. By incorporating strong object priors from a generalist model we additionally improve the per-pixel anomaly segmentation performance. Our approach outperforms current pixel-level anomaly segmentation methods, achieving an AP of 80.08% and 88.98% on the Fishyscapes Lost and Found and the RoadAnomaly validation sets respectively. Project page: https://vision.rwth-aachen.de/ugains
Submitted 3 August, 2023;
originally announced August 2023.
-
Initial radiometric calibration of the High-Resolution EUV Imager ($\textrm{HRI}_\textrm{EUV}$) of the Extreme Ultraviolet Imager (EUI) instrument onboard Solar Orbiter
Authors:
S. Gissot,
F. Auchère,
D. Berghmans,
B. Giordanengo,
A. BenMoussa,
J. Rebellato,
L. Harra,
D. Long,
P. Rochus,
U. Schühle,
R. Aznar Cuadrado,
F. Delmotte,
C. Dumesnil,
A. Gottwald,
J. -P. Halain,
K. Heerlein,
M. -L. Hellin,
A. Hermans,
L. Jacques,
E. Kraaikamp,
R. Mercier,
P. Rochus,
P. J. Smith,
L. Teriaca,
C. Verbeeck
Abstract:
The $\textrm{HRI}_\textrm{EUV}$ telescope was calibrated on ground at the Physikalisch-Technische Bundesanstalt (PTB), Germany's national metrology institute, using the Metrology Light Source (MLS) synchrotron in April 2017 during the calibration campaign of the Extreme Ultraviolet Imager (EUI) instrument onboard the Solar Orbiter mission. We use the pre-flight end-to-end calibration and component-level (mirror multilayer coatings, filters, detector) characterization results to establish the beginning-of-life performance of the $\textrm{HRI}_\textrm{EUV}$ telescope which shall serve as a reference for radiometric analysis and monitoring of the telescope in-flight degradation. Calibration activities at component level and end-to-end calibration of the instrument were performed at PTB/MLS synchrotron light source (Berlin, Germany) and the SOLEIL synchrotron. Each component optical property is measured and compared to its semi-empirical model. This pre-flight characterization is used to estimate the parameters of the semi-empirical models. The end-to-end response is measured and validated by comparison with calibration measurements, as well as with its main design performance requirements. The telescope spectral response semi-empirical model is validated by the pre-flight end-to-end ground calibration of the instrument. It is found that $\textrm{HRI}_\textrm{EUV}$ is a highly efficient solar EUV telescope with a peak efficiency superior to 1 e$^-$.ph$^{-1}$, low detector noise ($\approx$ 3 e$^-$ rms), low dark current at operating temperature, and a pixel saturation above 120 ke$^-$ in low-gain or combined image mode. The ground calibration also confirms a well-modeled spectral selectivity and rejection, and low stray light. The EUI instrument achieves state-of-the-art performance in terms of signal-to-noise and image spatial resolution.
Submitted 26 July, 2023;
originally announced July 2023.
-
DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
Authors:
Amit Kumar Rana,
Sabarinath Mahadevan,
Alexander Hermans,
Bastian Leibe
Abstract:
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.
Submitted 22 August, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Authors:
Karim Abou Zeid,
Jonas Schult,
Alexander Hermans,
Bastian Leibe
Abstract:
Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking and thus prevents data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.
Submitted 11 October, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
TarViS: A Unified Approach for Target-based Video Segmentation
Authors:
Ali Athar,
Alexander Hermans,
Jonathon Luiten,
Deva Ramanan,
Bastian Leibe
Abstract:
The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video. Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract 'queries' which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on 5/7 benchmarks spanning these four tasks, and competitive performance on the remaining two. Code and model weights are available at: https://github.com/Ali2500/TarViS
Submitted 10 May, 2023; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
Authors:
István Sárándi,
Alexander Hermans,
Bastian Leibe
Abstract:
Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is that different datasets provide different skeleton formats, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
Submitted 29 December, 2022;
originally announced December 2022.
-
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
Authors:
Jonas Schult,
Francis Engelmann,
Alexander Hermans,
Or Litany,
Siyu Tang,
Bastian Leibe
Abstract:
Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. In our model called Mask3D each object instance is represented as an instance query. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Combined with point features, the instance queries directly yield all instance masks in parallel. Mask3D has several advantages over current state-of-the-art approaches, since it neither relies on (1) voting schemes which require hand-selected geometric properties (such as centers) nor (2) geometric grouping mechanisms requiring manually-tuned hyper-parameters (e.g. radii) and (3) enables a loss that directly optimizes instance masks. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
Submitted 12 April, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Scalable photonic integrated circuits for programmable control of atomic systems
Authors:
Adrian J Menssen,
Artur Hermans,
Ian Christen,
Thomas Propson,
Chao Li,
Andrew J Leenheer,
Matthew Zimmermann,
Mark Dong,
Hugo Larocque,
Hamza Raniwala,
Gerald Gilbert,
Matt Eichenfield,
Dirk R Englund
Abstract:
Advances in laser technology have driven discoveries in atomic, molecular, and optical (AMO) physics and emerging applications, from quantum computers with cold atoms or ions, to quantum networks with solid-state color centers. This progress is motivating the development of a new generation of "programmable optical control" systems, characterized by criteria (C1) visible (VIS) and near-infrared (IR) wavelength operation, (C2) large channel counts extensible beyond 1000s of individually addressable atoms, (C3) high intensity modulation extinction and (C4) repeatability compatible with low gate errors, and (C5) fast switching times. Here, we address these challenges by introducing an atom control architecture based on VIS-IR photonic integrated circuit (PIC) technology. Based on a complementary metal-oxide-semiconductor (CMOS) fabrication process, this Atom-control PIC (APIC) technology meets the system requirements (C1)-(C5). As a proof of concept, we demonstrate a 16-channel silicon nitride based APIC with (5.8$\pm$0.4) ns response times and -30 dB extinction ratio at a wavelength of 780 nm. This work demonstrates the suitability of PIC technology for quantum control, opening a path towards scalable quantum information processing based on optically-programmable atomic systems.
Submitted 7 October, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Global Hierarchical Attention for 3D Point Cloud Analysis
Authors:
Dan Jia,
Alexander Hermans,
Bastian Leibe
Abstract:
We propose a new attention mechanism, called Global Hierarchical Attention (GHA), for 3D point cloud analysis. GHA approximates the regular global dot-product attention via a series of coarsening and interpolation operations over multiple hierarchy levels. The advantage of GHA is two-fold. First, it has linear complexity with respect to the number of points, enabling the processing of large point clouds. Second, GHA inherently possesses the inductive bias to focus on spatially close points, while retaining the global connectivity among all points. Combined with a feedforward network, GHA can be inserted into many existing network architectures. We experiment with multiple baseline networks and show that adding GHA consistently improves performance across different tasks and datasets. For the task of semantic segmentation, GHA gives a +1.7% mIoU increase to the MinkowskiEngine baseline on ScanNet. For the 3D object detection task, GHA improves the CenterPoint baseline by +0.5% mAP on the nuScenes dataset, and the 3DETR baseline by +2.1% mAP25 and +1.5% mAP50 on ScanNet.
Submitted 7 August, 2022;
originally announced August 2022.
-
Differentiable Soft-Masked Attention
Authors:
Ali Athar,
Jonathon Luiten,
Alexander Hermans,
Deva Ramanan,
Bastian Leibe
Abstract:
Transformers have become prevalent in computer vision due to their performance and flexibility in modelling complex operations. Of particular significance is the 'cross-attention' operation, which allows a vector representation (e.g. of an object in an image) to be learned by attending to an arbitrarily sized set of input features. Recently, "Masked Attention" was proposed in which a given object representation only attends to those image pixel features for which the segmentation mask of that object is active. This specialization of attention proved beneficial for various image and video segmentation tasks. In this paper, we propose another specialization of attention which enables attending over 'soft-masks' (those with continuous mask probabilities instead of binary values), and is also differentiable through these mask probabilities, thus allowing the mask used for attention to be learned within the network without requiring direct loss supervision. This can be useful for several applications. Specifically, we employ our "Differentiable Soft-Masked Attention" for the task of Weakly-Supervised Video Object Segmentation (VOS), where we develop a transformer-based network for VOS which only requires a single annotated image frame for training, but can also benefit from cycle consistency training on a video with just one annotated frame. Although there is no loss for masks in unlabeled frames, the network is still able to segment objects in those frames due to our novel attention formulation. Code: https://github.com/Ali2500/HODOR/blob/main/hodor/modelling/encoder/soft_masked_attention.py
Submitted 5 August, 2022; v1 submitted 31 May, 2022;
originally announced June 2022.
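A minimal sketch of how attention over a soft mask could work for a single query, based only on the description in the abstract above. The additive `log(mask)` term on the attention logits, the `eps` constant, and the function name are assumptions, not details confirmed by the abstract; the file linked in the abstract is the authoritative implementation. This plain-Python version only illustrates the forward computation; gradients through the mask probabilities would come from running the same arithmetic in an autodiff framework.

```python
import math


def soft_masked_attention(query: list[float],
                          keys: list[list[float]],
                          values: list[list[float]],
                          mask_probs: list[float],
                          eps: float = 1e-6) -> tuple[list[float], list[float]]:
    """Single-query cross-attention whose logits are shifted by log(mask probability).

    A mask probability near 1 leaves a key's logit almost unchanged; a value
    near 0 suppresses that key smoothly instead of cutting it off.
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale + math.log(m + eps)
        for key, m in zip(keys, mask_probs)
    ]
    # Numerically stable softmax over the keys.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return output, weights
```

Because the mask enters through a continuous function rather than a hard threshold, a small change in a mask probability produces a small change in the attention weights, which is what would let the mask be learned without direct supervision.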
-
HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
Authors:
Ali Athar,
Jonathon Luiten,
Alexander Hermans,
Deva Ramanan,
Bastian Leibe
Abstract:
Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across video. This requires a large amount of densely annotated video data, which is costly to annotate, and largely redundant since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles VOS by effectively leveraging annotated static images for understanding object appearance and scene context. We encode object instances and scene information from an image frame into robust high-level descriptors which can then be used to re-segment those objects in different frames. As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks compared to existing methods trained without video annotations. Without any architectural modification, HODOR can also learn from video context around single annotated video frames by utilizing cyclic consistency, whereas other methods rely on dense, temporally consistent annotations. Source code is available at: https://github.com/Ali2500/HODOR
Submitted 15 July, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
2D vs. 3D LiDAR-based Person Detection on Mobile Robots
Authors:
Dan Jia,
Alexander Hermans,
Bastian Leibe
Abstract:
Person detection is a crucial task for mobile robots navigating in human-populated environments. LiDAR sensors are promising for this task, thanks to their accurate depth measurements and large field of view. Two types of LiDAR sensors exist: the 2D LiDAR sensors, which scan a single plane, and the 3D LiDAR sensors, which scan multiple planes, thus forming a volume. How do they compare for the task of person detection? To answer this, we conduct a series of experiments, using the public, large-scale JackRabbot dataset and the state-of-the-art 2D and 3D LiDAR-based person detectors (DR-SPAAM and CenterPoint respectively). Our experiments include multiple aspects, ranging from the basic performance and speed comparison, to more detailed analysis on localization accuracy and robustness against distance and scene clutter. The insights from these experiments highlight the strengths and weaknesses of 2D and 3D LiDAR sensors as sources for person detection, and are especially valuable for designing mobile robots that will operate in close proximity to surrounding humans (e.g. service or social robot).
Submitted 25 July, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera
Authors:
Dan Jia,
Mats Steinweg,
Alexander Hermans,
Bastian Leibe
Abstract:
Deep learning is the essential building block of state-of-the-art person detectors in 2D range data. However, only a few annotated datasets are available for training and testing these deep networks, potentially limiting their performance when deployed in new environments or with different LiDAR models. We propose a method, which uses bounding boxes from an image-based detector (e.g. Faster R-CNN) on a calibrated camera to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors. Through experiments on the JackRabbot dataset with two detector models, DROW3 and DR-SPAAM, we show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained only on a different dataset. Combined with robust training techniques, the self-supervised detectors reach a performance close to the ones trained using manual annotations of the target dataset. Our method is an effective way to improve person detectors during deployment without any additional labeling effort, and we release our source code to support relevant robotic applications.
Submitted 3 June, 2021; v1 submitted 16 December, 2020;
originally announced December 2020.
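The pseudo-labeling idea in the abstract above can be illustrated with a minimal sketch. It assumes a simplified geometry that is not stated in the abstract: the 2D LiDAR and the camera share one origin, the camera looks along the LiDAR x-axis, and projection is a pinhole model with hypothetical intrinsics `fx` and `cx`. The function name `pseudo_labels` and the box format are illustrative only, and the published method may assign labels differently.

```python
import math

def pseudo_labels(ranges, angles, boxes, fx, cx):
    """Label each 2D LiDAR point 1 if it projects into any detection box.

    ranges, angles: polar scan (metres, radians; angle 0 = straight ahead).
    boxes: list of (u_min, u_max) horizontal pixel extents from an image detector.
    fx, cx: assumed pinhole focal length and principal point (pixels).
    """
    labels = []
    for r, theta in zip(ranges, angles):
        x = r * math.cos(theta)  # forward distance
        y = r * math.sin(theta)  # lateral offset, positive to the left
        if x <= 0.0:
            labels.append(0)     # behind the camera: cannot be seen
            continue
        u = cx - fx * (y / x)    # image column of the projected point
        inside = any(u_min <= u <= u_max for (u_min, u_max) in boxes)
        labels.append(1 if inside else 0)
    return labels

# Three scan points at 2 m; one detection box around the image centre.
example_labels = pseudo_labels(
    ranges=[2.0, 2.0, 2.0],
    angles=[-0.2, 0.0, 0.2],
    boxes=[(300.0, 350.0)],
    fx=500.0,
    cx=320.0,
)
print(example_labels)
```

In this toy setup only the point straight ahead falls inside the box. A real pipeline would also need the actual extrinsic calibration and some handling of background points that project into a box.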
-
DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data
Authors:
Dan Jia,
Alexander Hermans,
Bastian Leibe
Abstract:
Detecting persons using a 2D LiDAR is a challenging task due to the low information content of 2D range data. To alleviate the problem caused by the sparsity of the LiDAR points, current state-of-the-art methods fuse multiple previous scans and perform detection using the combined scans. The downside of such a backward looking fusion is that all the scans need to be aligned explicitly, and the necessary alignment operation makes the whole pipeline more expensive -- often too expensive for real-world applications. In this paper, we propose a person detection network which uses an alternative strategy to combine scans obtained at different times. Our method, Distance Robust SPatial Attention and Auto-regressive Model (DR-SPAAM), follows a forward looking paradigm. It keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available. The updated feature template is in turn used for detecting persons currently in the scene. On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster, running at 87.2 FPS on a laptop with a dedicated GPU and at 22.6 FPS on an NVIDIA Jetson AGX embedded GPU. We release our code in PyTorch and a ROS node including pre-trained models.
Submitted 31 July, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Assessment of Empathy in an Affective VR Environment using EEG Signals
Authors:
Maryam Alimardani,
Annabella Hermans,
Angelica M. Tinga
Abstract:
With the advancements in social robotics and virtual avatars, it becomes increasingly important that these agents adapt their behavior to the mood, feelings and personality of their users. One such aspect of the user is empathy. Whereas many studies measure empathy through offline measures that are collected after empathic stimulation (e.g. post-hoc questionnaires), the current study aimed to measure empathy online, using brain activity collected during the experience. Participants watched an affective 360 video of a child experiencing domestic violence in a virtual reality headset while their EEG signals were recorded. Results showed a significant attenuation of alpha, theta and delta asymmetry in the frontal and central areas of the brain. Moreover, a significant relationship between participants' empathy scores and their frontal alpha asymmetry at baseline was found. These results demonstrate specific brain activity alterations when participants are exposed to an affective virtual reality environment, with the level of empathy as a personality trait being visible in brain activity during a baseline measurement. These findings suggest the potential of EEG measurements for development of passive brain-computer interfaces that assess the user's affective responses in real-time and consequently adapt the behavior of socially intelligent agents for a personalized interaction.
Submitted 24 March, 2020;
originally announced March 2020.
-
Visual Person Understanding through Multi-Task and Multi-Dataset Learning
Authors:
Kilian Pfeiffer,
Alexander Hermans,
István Sárándi,
Mark Weber,
Bastian Leibe
Abstract:
We address the problem of learning a single model for person re-identification, attribute classification, body part segmentation, and pose estimation. With predictions for these tasks we gain a more holistic understanding of persons, which is valuable for many applications. This is a classical multi-task learning problem. However, no dataset exists that these tasks could be jointly learned from. Hence several datasets need to be combined during training, which in other contexts has often led to reduced performance in the past. We extensively evaluate how the different tasks and datasets influence each other and how different degrees of parameter sharing between the tasks affect performance. Our final model matches or outperforms its single-task counterparts without creating significant computational overhead, rendering it highly interesting for resource-constrained scenarios such as mobile robotics.
Submitted 7 June, 2019;
originally announced June 2019.
-
Deep Person Detection in 2D Range Data
Authors:
Lucas Beyer,
Alexander Hermans,
Timm Linder,
Kai O. Arras,
Bastian Leibe
Abstract:
Detecting humans is a key skill for mobile robots and intelligent vehicles in a large variety of applications. While the problem is well studied for certain sensory modalities such as image data, few works exist that address this detection task using 2D range data. However, a widespread sensory setup for many mobile robots in service and domestic applications contains a horizontally mounted 2D laser scanner. Detecting people from 2D range data is challenging due to the speed and dynamics of human leg motion and the high levels of occlusion and self-occlusion particularly in crowds of people. While previous approaches mostly relied on handcrafted features, we recently developed the deep learning based wheelchair and walker detector DROW. In this paper, we show the generalization to people, including small modifications that significantly boost DROW's performance. Additionally, by providing a small, fully online temporal window in our network, we further boost our score. We extend the DROW dataset with person annotations, making this the largest dataset of person annotations in 2D range data, recorded during several days in a real-world environment with high diversity. Extensive experiments with three current baseline methods indicate it is a challenging dataset, on which our improved DROW detector beats the current state-of-the-art.
Submitted 6 April, 2018;
originally announced April 2018.
-
Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds
Authors:
Francis Engelmann,
Theodora Kontogianni,
Alexander Hermans,
Bastian Leibe
Abstract:
Deep learning approaches have made tremendous progress in the field of semantic segmentation over the past few years. However, most current approaches operate in the 2D image space. Direct semantic segmentation of unstructured 3D point clouds is still an open research problem. The recently proposed PointNet architecture presents an interesting step ahead in that it can operate on unstructured point clouds, achieving encouraging segmentation results. However, it subdivides the input points into a grid of blocks and processes each such block individually. In this paper, we investigate the question of how such an architecture can be extended to incorporate larger-scale spatial context. We build upon PointNet and propose two extensions that enlarge the receptive field over the 3D scene. We evaluate the proposed strategies on challenging indoor and outdoor datasets and show improved results in both scenarios.
Submitted 18 December, 2019; v1 submitted 5 February, 2018;
originally announced February 2018.
-
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Authors:
Liang-Chieh Chen,
Alexander Hermans,
George Papandreou,
Florian Schroff,
Peng Wang,
Hartwig Adam
Abstract:
In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes including background, while the direction prediction, estimating each pixel's direction towards its corresponding center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e. atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows performance comparable to other state-of-the-art models.
Submitted 13 December, 2017;
originally announced December 2017.
-
Enhancement of bulk second-harmonic generation from silicon nitride films by material composition
Authors:
K. Koskinen,
R. Czaplicki,
A. Slablab,
T. Ning,
A. Hermans,
B. Kuyken,
V. Mittal,
G. S. Murugan,
T. Niemi,
R. Baets,
M. Kauranen
Abstract:
We present a comprehensive tensorial characterization of second-harmonic generation from silicon nitride films with varying composition. The samples were fabricated using plasma-enhanced chemical vapor deposition, and the material composition was varied by the reactive gas mixture in the process. We found a six-fold enhancement between the lowest and highest second-order susceptibility, with the highest value of approximately 5 pm/V from the most silicon-rich sample. Moreover, the optical losses were found to be sufficiently small (below 6 dB/cm) for applications. The tensorial results show that all samples retain in-plane isotropy independent of silicon content, highlighting the controllability of the fabrication process.
Submitted 3 October, 2017; v1 submitted 9 August, 2017;
originally announced August 2017.
-
In Defense of the Triplet Loss for Person Re-Identification
Authors:
Alexander Hermans,
Lucas Beyer,
Bastian Leibe
Abstract:
In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms most other published methods by a large margin.
Submitted 21 November, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
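For readers unfamiliar with the loss named in the abstract above, here is a minimal sketch of the standard margin-based triplet loss on a single (anchor, positive, negative) triplet. It is the textbook formulation with Euclidean distance, not necessarily the variant the paper proposes. The margin value and the toy embeddings are arbitrary choices for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: max(0, margin + d(anchor, positive) - d(anchor, negative))."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))

anchor = [0.0, 0.0]
same_person = [0.0, 1.0]    # distance 1 from the anchor
other_person = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: the negative is already far beyond the margin.
loss_easy = triplet_loss(anchor, same_person, other_person)
# Swapped roles: the "positive" is farther away than the "negative".
loss_violated = triplet_loss(anchor, other_person, same_person)
print(loss_easy, loss_violated)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate this condition contribute gradient during training.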
-
Superpixels: An Evaluation of the State-of-the-Art
Authors:
David Stutz,
Alexander Hermans,
Bastian Leibe
Abstract:
Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Because of these properties, superpixel algorithms have received much attention since their naming in 2003. Today, publicly available superpixel algorithms have turned into standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focussing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. In addition, we discuss runtime, robustness against noise, blur and affine transformations, implementation details as well as aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state-of-the-art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which themselves are made publicly available as part of our benchmark at davidstutz.de/projects/superpixel-benchmark/.
Submitted 19 April, 2017; v1 submitted 5 December, 2016;
originally announced December 2016.
-
Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes
Authors:
Tobias Pohlen,
Alexander Hermans,
Markus Mathias,
Bastian Leibe
Abstract:
Semantic image segmentation is an essential component of modern autonomous driving systems, as an accurate understanding of the surrounding scene is crucial to navigation and action planning. Current state-of-the-art approaches in semantic image segmentation rely on pre-trained networks that were initially developed for classifying images as a whole. While these networks exhibit outstanding recognition performance (i.e., what is visible?), they lack localization accuracy (i.e., where precisely is something located?). Therefore, additional processing steps have to be performed in order to obtain pixel-accurate segmentation masks at the full image resolution. To alleviate this problem we propose a novel ResNet-like architecture that exhibits strong localization and recognition performance. We combine multi-scale context with pixel-level accuracy by using two processing streams within our network: One stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The other stream undergoes a sequence of pooling operations to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. Without additional processing steps and without pre-training, our approach achieves an intersection-over-union score of 71.8% on the Cityscapes dataset.
Submitted 6 December, 2016; v1 submitted 24 November, 2016;
originally announced November 2016.
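The intersection-over-union score quoted in the abstract above can be illustrated with a small sketch. It computes per-class IoU and the class mean on flat lists of pixel labels. The function names and toy labels are illustrative, and benchmark evaluations such as the one for Cityscapes typically accumulate intersections and unions over the whole dataset and handle ignored pixels, which this sketch does not attempt.

```python
def class_iou(pred, target, cls):
    """IoU for one class: |pred == cls AND target == cls| / |pred == cls OR target == cls|."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else float("nan")

def mean_iou(pred, target, classes):
    """Average the per-class IoU over the classes that occur in pred or target."""
    scores = [class_iou(pred, target, c) for c in classes]
    valid = [s for s in scores if s == s]  # drop NaN entries (class absent everywhere)
    return sum(valid) / len(valid)

# Six pixels, three classes; each class overlaps in half of its union.
predicted = [0, 0, 1, 1, 1, 2]
ground_truth = [0, 1, 1, 1, 2, 2]
miou = mean_iou(predicted, ground_truth, classes=[0, 1, 2])
print(miou)
```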
-
The STRANDS Project: Long-Term Autonomy in Everyday Environments
Authors:
Nick Hawes,
Chris Burbridge,
Ferdian Jovan,
Lars Kunze,
Bruno Lacerda,
Lenka Mudrová,
Jay Young,
Jeremy Wyatt,
Denise Hebesberger,
Tobias Körtner,
Rares Ambrus,
Nils Bore,
John Folkesson,
Patric Jensfelt,
Lucas Beyer,
Alexander Hermans,
Bastian Leibe,
Aitor Aldoma,
Thomas Fäulhammer,
Michael Zillich,
Markus Vincze,
Eris Chinellato,
Muhannad Al-Omari,
Paul Duckworth,
Yiannis Gatsoulis
, et al. (8 additional authors not shown)
Abstract:
Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days autonomously performing end-user defined tasks, covering 116 km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance.
Submitted 14 October, 2016; v1 submitted 15 April, 2016;
originally announced April 2016.
-
DROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range Data
Authors:
Lucas Beyer,
Alexander Hermans,
Bastian Leibe
Abstract:
We introduce the DROW detector, a deep learning based detector for 2D range data. Laser scanners are lighting invariant, provide accurate range data, and typically cover a large field of view, making them interesting sensors for robotics applications. So far, research on detection in laser range data has been dominated by hand-crafted features and boosted classifiers, potentially losing performance due to suboptimal design choices. We propose a Convolutional Neural Network (CNN) based detector for this task. We show how to effectively apply CNNs for detection in 2D range data, and propose a depth preprocessing step and voting scheme that significantly improve CNN performance. We demonstrate our approach on wheelchairs and walkers, obtaining state of the art detection results. Apart from the training data, none of our design choices limits the detector to these two classes, though. We provide a ROS node for our detector and release our dataset containing 464k laser scans, out of which 24k were annotated.
Submitted 5 December, 2016; v1 submitted 8 March, 2016;
originally announced March 2016.
-
Atomic layer deposited second order nonlinear optical metamaterial for back-end integration with CMOS-compatible nanophotonic circuitry
Authors:
Stéphane Clemmen,
Artur Hermans,
Eduardo Solano,
Jolien Dendooven,
Kalle Koskinen,
Martti Kauranen,
Edouard Brainis,
Christophe Detavernier,
Roel Baets
Abstract:
We report the fabrication of artificial unidimensional crystals exhibiting an effective bulk second-order nonlinearity. The crystals are created by cycling atomic layer deposition of three dielectric materials such that the resulting metamaterial is non-centrosymmetric in the direction of the deposition. Characterization of the structures by second-harmonic generation Maker-fringe measurements shows that the main component of their nonlinear susceptibility tensor is about 5 pm/V, which is comparable to well-established materials and more than an order of magnitude greater than reported for a similar crystal [1-Alloatti et al, arXiv:1504.00101[cond-mat.mtrl-sci]]. Our demonstration opens new possibilities for second-order nonlinear effects on CMOS-compatible nanophotonic platforms.
Submitted 21 August, 2015;
originally announced August 2015.