-
Uncertainty Quantification Metrics for Deep Regression
Authors:
Simon Kristoffersson Lind,
Ziliang Xiong,
Per-Erik Forssén,
Volker Krüger
Abstract:
When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for evaluating such an uncertainty. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
Submitted 22 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
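To illustrate one of the metrics named in the abstract above, here is a minimal sketch of AUSE under a common formulation: samples are removed in order of decreasing predicted uncertainty, the mean absolute error of the remaining samples is tracked, and the area between this curve and an oracle curve (ranked by the true error) is reported. The function names, the number of steps, and the synthetic data are assumptions for illustration and are not taken from the paper.

```python
import random

def sparsification_curve(errors, ranking_scores, steps=20):
    """Mean error of the samples kept after dropping the highest-scoring ones."""
    n = len(errors)
    order = sorted(range(n), key=lambda i: ranking_scores[i], reverse=True)
    curve = []
    for s in range(steps):
        removed = (s * n) // steps  # number of top-ranked samples dropped
        kept = order[removed:]
        curve.append(sum(errors[i] for i in kept) / len(kept))
    return curve

def ause(errors, uncertainties, steps=20):
    """Area between the uncertainty-ranked curve and the oracle curve."""
    by_uncertainty = sparsification_curve(errors, uncertainties, steps)
    by_oracle = sparsification_curve(errors, errors, steps)
    gaps = [u - o for u, o in zip(by_uncertainty, by_oracle)]
    # Trapezoidal rule over removal fractions 0, 1/steps, ..., (steps-1)/steps.
    return sum((gaps[k] + gaps[k + 1]) / 2 for k in range(steps - 1)) / steps

# Hypothetical synthetic data: absolute errors drawn with a known per-sample sigma.
rng = random.Random(0)
sigmas = [rng.uniform(0.1, 2.0) for _ in range(2000)]
errors = [abs(rng.gauss(0.0, s)) for s in sigmas]

informative = ause(errors, sigmas)       # uncertainty matches the noise level
shuffled = sigmas[:]
rng.shuffle(shuffled)
uninformative = ause(errors, shuffled)   # uncertainty carries no information
```

In this setup a lower value means the uncertainty ranking is closer to the oracle ranking, so `informative` should come out smaller than `uninformative`.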
-
Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks
Authors:
Ziliang Xiong,
Arvi Jonnarth,
Abdelrahman Eldesokey,
Joakim Johnander,
Bastian Wandt,
Per-Erik Forssen
Abstract:
Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values and here it is common to detect uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm which can represent multimodal distributions, without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence, in the absence of full ground truth distributions. In order to alleviate these issues, we propose hinge-Wasserstein -- a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes, and allows training on datasets where full ground truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.
Submitted 21 June, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
GMSF: Global Matching Scene Flow
Authors:
Yushan Zhang,
Johan Edstedt,
Bastian Wandt,
Per-Erik Forssén,
Maria Magnusson,
Michael Felsberg
Abstract:
We tackle the task of scene flow estimation from point clouds. Given a source and a target point cloud, the objective is to estimate a translation from each point in the source point cloud to the target, resulting in a 3D motion vector field. Previous dominant scene flow estimation methods require complicated coarse-to-fine or recurrent architectures as a multi-stage refinement. In contrast, we propose a significantly simpler single-scale one-shot global matching to address the problem. Our key finding is that reliable feature similarity between point pairs is essential and sufficient to estimate accurate scene flow. We thus propose to decompose the feature extraction step via a hybrid local-global-cross transformer architecture which is crucial to accurate and robust feature representations. Extensive experiments show that the proposed Global Matching Scene Flow (GMSF) sets a new state-of-the-art on multiple scene flow estimation benchmarks. On FlyingThings3D, with the presence of occlusion points, GMSF reduces the outlier percentage from the previous best performance of 27.4% to 5.6%. On KITTI Scene Flow, without any fine-tuning, our proposed method shows state-of-the-art performance. On the Waymo-Open dataset, the proposed method outperforms previous methods by a large margin. The code is available at https://github.com/ZhangYushan3/GMSF.
Submitted 30 October, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Self-supervised learning of object pose estimation using keypoint prediction
Authors:
Zahra Gharaee,
Felix Järemo Lawin,
Per-Erik Forssén
Abstract:
This paper describes recent developments in object-specific pose and shape prediction from single images. The main contribution is a new approach to camera pose prediction by self-supervised learning of keypoints corresponding to locations on a category-specific deformable shape. We designed a network to generate a proxy ground-truth heatmap from a set of keypoints distributed all over the category-specific mean shape, where each is represented by a unique color on a labeled texture. The proxy ground-truth heatmap is used to train a deep keypoint prediction network, which can be used in online inference. The proposed approach to camera pose prediction shows significant improvements when compared with state-of-the-art methods.
Our approach to camera pose prediction is used to infer 3D objects from 2D image frames of video sequences online. To train the reconstruction model, it receives only a silhouette mask from a single frame of a video sequence in every training step and a category-specific mean object shape. We conducted experiments using three different datasets representing the bird category: the CUB [51] image dataset, YouTubeVos and the Davis video datasets. The network is trained on the CUB dataset and tested on all three datasets. The online experiments are demonstrated on YouTubeVos and Davis [56] video sequences using a network trained on the CUB training set.
Submitted 19 February, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Camera Calibration without Camera Access -- A Robust Validation Technique for Extended PnP Methods
Authors:
Emil Brissman,
Per-Erik Forssén,
Johan Edstedt
Abstract:
A challenge in image based metrology and forensics is intrinsic camera calibration when the used camera is unavailable. The unavailability raises two questions. The first question is how to find the projection model that describes the camera, and the second is to detect incorrect models. In this work, we use off-the-shelf extended PnP-methods to find the model from 2D-3D correspondences, and propose a method for model validation. The most common strategy for evaluating a projection model is comparing different models' residual variances - however, this naive strategy cannot distinguish whether the projection model is potentially underfitted or overfitted. To this end, we model the residual errors for each correspondence, individually scale all residuals using a predicted variance and test if the new residuals are drawn from a standard normal distribution. We demonstrate the effectiveness of our proposed validation in experiments on synthetic data, simulating 2D detection and Lidar measurements. Additionally, we provide experiments using data from an actual scene and compare non-camera access and camera access calibrations. Last, we use our method to validate annotations in MegaDepth.
Submitted 14 February, 2023;
originally announced February 2023.
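The validation idea in the abstract above (scale each residual by its predicted standard deviation, then test whether the result looks standard normal) can be sketched as follows. The abstract does not say which statistical test is used, so the one-sample Kolmogorov-Smirnov statistic with its asymptotic 5% threshold is an assumption here, as are all function names and the synthetic residuals.

```python
import math
import random

def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic(samples):
    """Largest gap between the empirical CDF and the standard normal CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = standard_normal_cdf(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def looks_standard_normal(residuals, predicted_sigmas):
    """Return (accepted, statistic) for the variance-scaled residuals."""
    normalized = [r / s for r, s in zip(residuals, predicted_sigmas)]
    d = ks_statistic(normalized)
    critical = 1.358 / math.sqrt(len(normalized))  # asymptotic 5% threshold
    return d <= critical, d

# Hypothetical residuals whose true standard deviation varies per correspondence.
rng = random.Random(0)
sigmas = [rng.uniform(0.5, 2.0) for _ in range(2000)]
residuals = [rng.gauss(0.0, s) for s in sigmas]

good_ok, d_good = looks_standard_normal(residuals, sigmas)
# Predicted variances that are too small inflate the scaled residuals.
bad_ok, d_bad = looks_standard_normal(residuals, [0.5 * s for s in sigmas])
```

With correct predicted variances the statistic stays small, while halving the predicted standard deviations pushes it well above the threshold, which is the kind of model mismatch a comparison of raw residual variances would not expose.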
-
Registration Loss Learning for Deep Probabilistic Point Set Registration
Authors:
Felix Järemo Lawin,
Per-Erik Forssén
Abstract:
Probabilistic methods for point set registration have interesting theoretical properties, such as linear complexity in the number of used points, and they easily generalize to joint registration of multiple point sets. In this work, we improve their recognition performance to match state of the art. This is done by incorporating learned features, by adding a von Mises-Fisher feature model in each mixture component, and by using learned attention weights. We learn these jointly using a registration loss learning strategy (RLL) that directly uses the registration error as a loss, by back-propagating through the registration iterations. This is possible as the probabilistic registration is fully differentiable, and the result is a learning framework that is truly end-to-end. We perform extensive experiments on the 3DMatch and Kitti datasets. The experiments demonstrate that our approach benefits significantly from the integration of the learned features and our learning strategy, outperforming the state-of-the-art on Kitti. Code is available at https://github.com/felja633/RLLReg.
Submitted 4 November, 2020;
originally announced November 2020.
-
Trajectory Representation and Landmark Projection for Continuous-Time Structure from Motion
Authors:
Hannes Ovrén,
Per-Erik Forssén
Abstract:
This paper revisits the problem of continuous-time structure from motion, and introduces a number of extensions that improve convergence and efficiency. The formulation with a $\mathcal{C}^2$-continuous spline for the trajectory naturally incorporates inertial measurements, as derivatives of the sought trajectory. We analyse the behaviour of split interpolation on $\mathbb{SO}(3)$ and on $\mathbb{R}^3$, and a joint interpolation on $\mathbb{SE}(3)$, and show that the latter implicitly couples the direction of translation and rotation. Such an assumption can make good sense for a camera mounted on a robot arm, but not for hand-held or body-mounted cameras. Our experiments show that split interpolation on $\mathbb{SO}(3)$ and on $\mathbb{R}^3$ is preferable over $\mathbb{SE}(3)$ interpolation in all tested cases. Finally, we investigate the problem of landmark reprojection on rolling shutter cameras, and show that the tested reprojection methods give similar quality, while their computational load varies by a factor of 2.
Submitted 7 May, 2018;
originally announced May 2018.
-
Spline Error Weighting for Robust Visual-Inertial Fusion
Authors:
Hannes Ovrén,
Per-Erik Forssén
Abstract:
In this paper we derive and test a probability-based weighting that can balance residuals of different types in spline fitting. In contrast to previous formulations, the proposed spline error weighting scheme also incorporates a prediction of the approximation error of the spline fit. We demonstrate the effectiveness of the prediction in a synthetic experiment, and apply it to visual-inertial fusion on rolling shutter cameras. This results in a method that can estimate 3D structure with metric scale on generic first-person videos. We also propose a quality measure for spline fitting, that can be used to automatically select the knot spacing. Experiments verify that the obtained trajectory quality corresponds well with the requested quality. Finally, by linearly scaling the weights, we show that the proposed spline error weighting minimizes the estimation errors on real sequences, in terms of scale and end-point errors.
Submitted 13 April, 2018;
originally announced April 2018.
-
Density Adaptive Point Set Registration
Authors:
Felix Järemo Lawin,
Martin Danelljan,
Fahad Shahbaz Khan,
Per-Erik Forssén,
Michael Felsberg
Abstract:
Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial Lidar applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need of re-sampling. Code is available at https://github.com/felja633/DARE.
Submitted 23 October, 2018; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Efficient Multi-Frequency Phase Unwrapping using Kernel Density Estimation
Authors:
Felix Järemo Lawin,
Per-Erik Forssén,
Hannes Ovrén
Abstract:
In this paper we introduce an efficient method to unwrap multi-frequency phase estimates for time-of-flight ranging. The algorithm generates multiple depth hypotheses and uses a spatial kernel density estimate (KDE) to rank them. The confidence produced by the KDE is also an effective means to detect outliers. We also introduce a new closed-form expression for phase noise prediction, that better fits real data. The method is applied to depth decoding for the Kinect v2 sensor, and compared to the Microsoft Kinect SDK and to the open source driver libfreenect2. The intended Kinect v2 use case is scenes with less than 8m range, and for such cases we observe consistent improvements, while maintaining real-time performance. When extending the depth range to the maximal value of 8.75m, we get about 52% more valid measurements than libfreenect2. The effect is that the sensor can now be used in large depth scenes, where it was previously not a good choice. Code and supplementary material are available at http://www.cvl.isy.liu.se/research/datasets/kinect2-dataset.
Submitted 18 August, 2016;
originally announced August 2016.