Search | arXiv e-print repository

DEFLOW: Self-supervised 3D Motion Estimation of Debris Flow

Authors: Liyuan Zhu, Yuru Jia, Shengyu Huang, Nicholas Meyer, Andreas Wieser, Konrad Schindler, Jordan Aaron

Abstract: Existing work on scene flow estimation focuses on autonomous driving and mobile robotics, while automated solutions are lacking for motion in nature, such as that exhibited by debris flows. We propose DEFLOW, a model for 3D motion estimation of debris flows, together with a newly captured dataset. We adopt a novel multi-level sensor fusion architecture and self-supervision to incorporate the inductive biases of the scene. We further adopt a multi-frame temporal processing module to enable flow speed estimation over time. Our model achieves state-of-the-art optical flow and depth estimation on our dataset, and fully automates the motion estimation for debris flows. The source code and dataset are available at the project page.

Submitted 5 April, 2023; originally announced April 2023.

Comments: Photogrammetric Computer Vision Workshop, CVPRW 2023, camera ready

arXiv:2207.12394 [pdf, other]

Dynamic 3D Scene Analysis by Point Cloud Accumulation

Authors: Shengyu Huang, Zan Gojcic, Jiahui Huang, Andreas Wieser, Konrad Schindler

Abstract: Multi-beam LiDAR sensors, as used on autonomous vehicles and mobile robots, acquire sequences of 3D range scans ("frames"). Each frame covers the scene sparsely, due to limited angular scanning resolution and occlusion. The sparsity restricts the performance of downstream processes like semantic segmentation or surface reconstruction. Luckily, when the sensor moves, frames are captured from a sequence of different viewpoints. This provides complementary information and, when accumulated in a common scene coordinate frame, yields a denser sampling and a more complete coverage of the underlying 3D scene. However, often the scanned scenes contain moving objects. Points on those objects are not correctly aligned by just undoing the scanner's ego-motion. In the present paper, we explore multi-frame point cloud accumulation as a mid-level representation of 3D scan sequences, and develop a method that exploits inductive biases of outdoor street scenes, including their geometric layout and object-level rigidity. Compared to state-of-the-art scene flow estimators, our proposed approach aims to align all 3D points in a common reference frame correctly accumulating the points on the individual objects. Our approach greatly reduces the alignment errors on several benchmark datasets. Moreover, the accumulated point clouds benefit high-level tasks like surface reconstruction.

Submitted 25 July, 2022; originally announced July 2022.

Comments: ECCV 2022, camera ready
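
The abstract above contrasts full accumulation with the simpler step of "just undoing the scanner's ego-motion". As a rough illustration of that baseline step only (not the paper's method), the sketch below maps each frame into a common world frame using a known sensor pose. The pose format (translation plus yaw) and the function names `pose_to_world` and `accumulate` are assumptions made for this example.

```python
import math

def pose_to_world(point, pose):
    """Map a sensor-frame point to the world frame.

    pose = (tx, ty, tz, yaw) is a hypothetical, simplified sensor pose:
    a translation plus a rotation about the vertical axis.
    """
    tx, ty, tz, yaw = pose
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def accumulate(frames, poses):
    """Merge all frames into one world-frame point list by undoing ego-motion.

    Static points from different frames end up on top of each other;
    points on moving objects do not, which is the problem the abstract names.
    """
    merged = []
    for points, pose in zip(frames, poses):
        merged.extend(pose_to_world(p, pose) for p in points)
    return merged
```

A static point observed from two sensor positions lands at the same world coordinates after accumulation, whereas a point on a moving object would appear at two different places and need an additional per-object motion estimate.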

arXiv:2102.08945 [pdf, other]

Weakly Supervised Learning of Rigid 3D Scene Flow

Authors: Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal

Abstract: We propose a data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies. At the core of our method lies a deep architecture able to reason at the object level by considering 3D scene flow in conjunction with other 3D tasks. This object-level abstraction enables us to relax the requirement for dense scene flow supervision with simpler binary background segmentation mask and ego-motion annotations. Our mild supervision requirements make our method well suited for recently released massive data collections for autonomous driving, which do not contain dense scene flow annotations. As output, our model provides low-level cues like pointwise flow and higher-level cues such as holistic scene understanding at the level of rigid objects. We further propose a test-time optimization refining the predicted rigid scene flow. We showcase the effectiveness and generalization capacity of our method on four different autonomous driving datasets. We release our source code and pre-trained models under github.com/zgojcic/Rigid3DSceneFlow.

Submitted 17 February, 2021; originally announced February 2021.

arXiv:2011.13005 [pdf, other]

PREDATOR: Registration of 3D Point Clouds with Low Overlap

Authors: Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, Konrad Schindler

Abstract: We introduce PREDATOR, a model for pairwise point-cloud registration with deep attention to the overlap region. Different from previous work, our model is specifically designed to handle (also) point-cloud pairs with low overlap. Its key novelty is an overlap-attention block for early information exchange between the latent encodings of the two point clouds. In this way the subsequent decoding of the latent representations into per-point features is conditioned on the respective other point cloud, and thus can predict which points are not only salient, but also lie in the overlap region between the two point clouds. The ability to focus on points that are relevant for matching greatly improves performance: PREDATOR raises the rate of successful registrations by more than 20% in the low-overlap scenario, and also sets a new state of the art for the 3DMatch benchmark with 89% registration recall.

Submitted 6 August, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: CVPR 2021 (Oral) - Improved performance after fixing GNN bug

arXiv:1905.08022 [pdf, other]

An iterative scheme for feature based positioning using a weighted dissimilarity measure

Authors: Caifa Zhou, Andreas Wieser

Abstract: We propose an iterative scheme for feature-based positioning using a new weighted dissimilarity measure with the goal of reducing the impact of large errors among the measured or modeled features. The weights are computed from the location-dependent standard deviations of the features and stored as part of the reference fingerprint map (RFM). Spatial filtering and kernel smoothing of the kinematically collected raw data allow efficiently estimating the standard deviations during RFM generation. In the positioning stage, the weights control the contribution of each feature to the dissimilarity measure, which in turn quantifies the difference between the set of online measured features and the fingerprints stored in the RFM. Features with little variability contribute more to the estimated position than features with high variability. Iterations are necessary because the variability depends on the location, and the location is initially unknown when estimating the position. Using real WiFi signal strength data from extended test measurements with ground truth in an office building, we show that the standard deviations of these features vary considerably within the region of interest and are neither simple functions of the signal strength nor of the distances from the corresponding access points. This is the motivation to include the empirical standard deviations in the RFM. We then analyze the deviations of the estimated positions with and without the location-dependent weighting. In the present example, the maximum radial positioning error from ground truth is reduced by 40% compared to kNN without the weighted dissimilarity measure.

Submitted 30 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: 18 pages, 9 figures, and 1 table
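
The iteration described in the abstract above (weights depend on location, location is initially unknown) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the RFM layout (`pos`, `mean`, `std` per fingerprint), the inverse-variance weights, the unweighted kNN starting point, and all function names are hypothetical choices made for this example.

```python
import math

def knn_position(rfm, measured, weights, k=3):
    """Average the positions of the k fingerprints with the smallest
    weighted dissimilarity to the measured feature vector."""
    def dissimilarity(entry):
        return math.sqrt(sum(w * (m - f) ** 2
                             for w, m, f in zip(weights, measured, entry["mean"])))
    nearest = sorted(rfm, key=dissimilarity)[:k]
    x = sum(e["pos"][0] for e in nearest) / len(nearest)
    y = sum(e["pos"][1] for e in nearest) / len(nearest)
    return (x, y)

def weights_at(rfm, position):
    """Assumed weighting: inverse variances stored with the fingerprint
    closest to the current position estimate."""
    closest = min(rfm, key=lambda e: math.dist(e["pos"], position))
    return [1.0 / (s * s) for s in closest["std"]]

def iterative_position(rfm, measured, k=3, max_iter=10, tol=1e-6):
    """Start from unweighted kNN, then re-weight and re-estimate until stable."""
    estimate = knn_position(rfm, measured, [1.0] * len(measured), k)
    for _ in range(max_iter):
        new_estimate = knn_position(rfm, measured, weights_at(rfm, estimate), k)
        if math.dist(new_estimate, estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

# Tiny made-up map: the second feature is far noisier than the first.
example_rfm = [
    {"pos": (0.0, 0.0), "mean": [-40.0, -70.0], "std": [1.0, 10.0]},
    {"pos": (10.0, 0.0), "mean": [-50.0, -60.0], "std": [1.0, 10.0]},
]
```

With the made-up map, a measurement such as `[-42.0, -61.0]` is matched to the second fingerprint by plain kNN (k=1), because the noisy second feature dominates, while the weighted iteration settles on the first fingerprint once that feature is down-weighted.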

arXiv:1811.06879 [pdf, other]

The Perfect Match: 3D Point Cloud Matching with Smoothed Densities

Authors: Zan Gojcic, Caifa Zhou, Jan D. Wegner, Andreas Wieser

Abstract: We propose 3DSmoothNet, a full workflow to match 3D point clouds with a siamese deep learning architecture and fully convolutional layers using a voxelized smoothed density value (SDV) representation. The latter is computed per interest point and aligned to the local reference frame (LRF) to achieve rotation invariance. Our compact, learned, rotation-invariant 3D point cloud descriptor achieves 94.9% average recall on the 3DMatch benchmark data set, outperforming the state of the art by more than 20 percentage points with only 32 output dimensions. This very low output dimension allows for near real-time correspondence search with 0.1 ms per feature point on a standard PC. Our approach is sensor- and scene-agnostic because of SDV, LRF and learning highly descriptive features with fully convolutional layers. We show that 3DSmoothNet trained only on RGB-D indoor scenes of buildings achieves 79.0% average recall on laser scans of outdoor vegetation, more than double the performance of our closest, learning-based competitors. Code, data and pre-trained models are available online at https://github.com/zgojcic/3DSmoothNet.

Submitted 2 December, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

Comments: CVPR 2019

arXiv:1703.06933

Fast Radio Map Construction and Position Estimation via Direct Mapping for WLAN Indoor Localization System

Authors: Caifa Zhou, Andreas Wieser, Xuezhi Tan

Abstract: The main limitation that constrains the fast and comprehensive application of Wireless Local Area Network (WLAN) based indoor localization systems with Received Signal Strength (RSS) positioning algorithms is the building of the fingerprinting radio map, which is time-consuming, especially when the indoor environment is large and/or changes frequently. Different approaches have been proposed to reduce workload, including fingerprinting deployment and update efforts, but the performance degrades greatly when the workload is reduced below a certain level. In this paper, we propose an indoor localization scenario that applies metric learning and manifold alignment to realize direct mapping localization (DML) using a low-resolution radio map with a single sample of RSS, which reduces the fingerprinting workload by up to 87% compared to previous work. The two proposed localization approaches, DML and $k$ nearest neighbors based on reconstructed radio map (reKNN), were shown to achieve less than 4.3 m and 3.7 m mean localization error, respectively, in a typical office environment with an area of approximately 170 m$^2$, while the unsupervised localization with perturbation algorithm was shown to achieve 4.7 m mean localization error with 8 times more workload than the proposed methods. As for the room-level localization application, both DML and reKNN can meet the requirement with at most 9 m of localization error, which is enough to tell apart different rooms with over 99% accuracy.

Submitted 3 April, 2017; v1 submitted 14 March, 2017; originally announced March 2017.

Comments: more refined analysis required

arXiv:1703.06912 [pdf, other]

Application of backpropagation neural networks to both stages of fingerprinting based WIPS

Authors: Caifa Zhou, Andreas Wieser

Abstract: We propose a scheme to employ backpropagation neural networks (BPNNs) for both stages of fingerprinting-based indoor positioning using WLAN/WiFi signal strengths (FWIPS): radio map construction during the offline stage, and localization during the online stage. Given a training radio map (TRM), i.e., a set of coordinate vectors and associated WLAN/WiFi signal strengths of the available access points, a BPNN can be trained to output the expected signal strengths for any input position within the region of interest (BPNN-RM). This can be used to provide a continuous representation of the radio map and to filter, densify or decimate a discrete radio map. Correspondingly, the TRM can also be used to train another BPNN to output the expected position within the region of interest for any input vector of recorded signal strengths and thus carry out localization (BPNN-LA). Key aspects of the design of such artificial neural networks for a specific application are the selection of design parameters like the number of hidden layers and nodes within the network, and the training procedure. Summarizing extensive numerical simulations, based on real measurements in a testbed, we analyze the impact of these design choices on the performance of the BPNN and compare the results in particular to those obtained using the $k$ nearest neighbors ($k$NN) and weighted $k$ nearest neighbors approaches to FWIPS.

Submitted 14 March, 2017; originally announced March 2017.

Comments: 11 pages, 11 figures, published in proceedings UPINLBS 2016

Showing 1–8 of 8 results for author: Wieser, A