-
Rotation-Equivariant Deep Learning for Diffusion MRI
Authors:
Philip Müller,
Vladimir Golkov,
Valentina Tomassini,
Daniel Cremers
Abstract:
Convolutional networks are successful, but they have recently been outperformed by new neural networks that are equivariant under rotations and translations. These new networks work better because they do not struggle with learning each possible orientation of each image feature separately. So far, they have been proposed for 2D and 3D data. Here we generalize them to 6D diffusion MRI data, ensuri…
▽ More
Convolutional networks are successful, but they have recently been outperformed by new neural networks that are equivariant under rotations and translations. These new networks work better because they do not struggle with learning each possible orientation of each image feature separately. So far, they have been proposed for 2D and 3D data. Here we generalize them to 6D diffusion MRI data, ensuring joint equivariance under 3D roto-translations in image space and the matching 3D rotations in $q$-space, as dictated by the image formation. Such equivariant deep learning is appropriate for diffusion MRI, because microstructural and macrostructural features such as neural fibers can appear at many different orientations, and because even non-rotation-equivariant deep learning has so far been the best method for many diffusion MRI tasks. We validate our equivariant method on multiple-sclerosis lesion segmentation. Our proposed neural networks yield better results and require fewer scans for training compared to non-rotation-equivariant deep learning. They also inherit all the advantages of deep learning over classical diffusion MRI methods. Our implementation is available at https://github.com/philip-mueller/equivariant-deep-dmri and can be used off the shelf without understanding the mathematical background.
△ Less
Submitted 13 February, 2021;
originally announced February 2021.
-
Tight Integration of Feature-based Relocalization in Monocular Direct Visual Odometry
Authors:
Mariia Gladkova,
Rui Wang,
Niclas Zeller,
Daniel Cremers
Abstract:
In this paper we propose a framework for integrating map-based relocalization into online direct visual odometry. To achieve map-based relocalization for direct methods, we integrate image features into Direct Sparse Odometry (DSO) and rely on feature matching to associate online visual odometry (VO) with a previously built map. The integration of the relocalization poses is threefold. Firstly, th…
▽ More
In this paper we propose a framework for integrating map-based relocalization into online direct visual odometry. To achieve map-based relocalization for direct methods, we integrate image features into Direct Sparse Odometry (DSO) and rely on feature matching to associate online visual odometry (VO) with a previously built map. The integration of the relocalization poses is threefold. Firstly, they are incorporated as pose priors in the direct image alignment of the front-end tracking. Secondly, they are tightly integrated into the back-end bundle adjustment. Thirdly, an online fusion module is further proposed to combine relative VO poses and global relocalization poses in a pose graph to estimate keyframe-wise smooth and globally accurate poses. We evaluate our method on two multi-weather datasets showing the benefits of integrating different handcrafted and learned features and demonstrating promising improvements on camera tracking accuracy.
△ Less
Submitted 29 March, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Post-hoc Uncertainty Calibration for Domain Drift Scenarios
Authors:
Christian Tomani,
Sebastian Gruber,
Muhammed Ebrar Erdem,
Daniel Cremers,
Florian Buettner
Abstract:
We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show…
▽ More
We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show that existing post-hoc calibration methods yield highly over-confident predictions under domain shift. Second, we introduce a simple strategy where perturbations are applied to samples in the validation set before performing the post-hoc calibration step. In extensive experiments, we demonstrate that this perturbation step results in substantially better calibration under domain shift on a wide range of architectures and modelling tasks.
△ Less
Submitted 23 June, 2021; v1 submitted 20 December, 2020;
originally announced December 2020.
-
Neural Online Graph Exploration
Authors:
Ioannis Chiotellis,
Daniel Cremers
Abstract:
Can we learn how to explore unknown spaces efficiently? To answer this question, we study the problem of Online Graph Exploration, the online version of the Traveling Salesperson Problem. We reformulate graph exploration as a reinforcement learning problem and apply Direct Future Prediction (Dosovitskiy and Koltun, 2017) to solve it. As the graph is discovered online, the corresponding Markov Deci…
▽ More
Can we learn how to explore unknown spaces efficiently? To answer this question, we study the problem of Online Graph Exploration, the online version of the Traveling Salesperson Problem. We reformulate graph exploration as a reinforcement learning problem and apply Direct Future Prediction (Dosovitskiy and Koltun, 2017) to solve it. As the graph is discovered online, the corresponding Markov Decision Process entails a dynamic state space, namely the observable graph and a dynamic action space, namely the nodes forming the graph's frontier. To the best of our knowledge, this is the first attempt to solve online graph exploration in a data-driven way. We conduct experiments on six data sets of procedurally generated graphs and three real city road networks. We demonstrate that our agent can learn strategies superior to many well known graph traversal algorithms, confirming that exploration can be learned.
△ Less
Submitted 6 April, 2021; v1 submitted 6 December, 2020;
originally announced December 2020.
-
Isometric Multi-Shape Matching
Authors:
Maolin Gao,
Zorah Lähner,
Johan Thunberg,
Daniel Cremers,
Florian Bernard
Abstract:
Finding correspondences between shapes is a fundamental problem in computer vision and graphics, which is relevant for many applications, including 3D reconstruction, object tracking, and style transfer. The vast majority of correspondence methods aim to find a solution between pairs of shapes, even if multiple instances of the same class are available. While isometries are often studied in shape…
▽ More
Finding correspondences between shapes is a fundamental problem in computer vision and graphics, which is relevant for many applications, including 3D reconstruction, object tracking, and style transfer. The vast majority of correspondence methods aim to find a solution between pairs of shapes, even if multiple instances of the same class are available. While isometries are often studied in shape correspondence problems, they have not been considered explicitly in the multi-matching setting. This paper closes this gap by proposing a novel optimisation formulation for isometric multi-shape matching. We present a suitable optimisation algorithm for solving our formulation and provide a convergence and complexity analysis. Our algorithm obtains multi-matchings that are by construction provably cycle-consistent. We demonstrate the superior performance of our method on various datasets and set the new state-of-the-art in isometric multi-shape matching.
△ Less
Submitted 3 April, 2024; v1 submitted 4 December, 2020;
originally announced December 2020.
-
i3DMM: Deep Implicit 3D Morphable Model of Human Heads
Authors:
Tarun Yenamandra,
Ayush Tewari,
Florian Bernard,
Hans-Peter Seidel,
Mohamed Elgharib,
Daniel Cremers,
Christian Theobalt
Abstract:
We present the first deep implicit 3D morphable model (i3DMM) of full heads. Unlike earlier morphable face models it not only captures identity-specific geometry, texture, and expressions of the frontal face, but also models the entire head, including hair. We collect a new dataset consisting of 64 people with different expressions and hairstyles to train i3DMM. Our approach has the following favo…
▽ More
We present the first deep implicit 3D morphable model (i3DMM) of full heads. Unlike earlier morphable face models it not only captures identity-specific geometry, texture, and expressions of the frontal face, but also models the entire head, including hair. We collect a new dataset consisting of 64 people with different expressions and hairstyles to train i3DMM. Our approach has the following favorable properties: (i) It is the first full head morphable model that includes hair. (ii) In contrast to mesh-based models it can be trained on merely rigidly aligned scans, without requiring difficult non-rigid registration. (iii) We design a novel architecture to decouple the shape model into an implicit reference shape and a deformation of this reference shape. With that, dense correspondences between shapes can be learned implicitly. (iv) This architecture allows us to semantically disentangle the geometry and color components, as color is learned in the reference space. Geometry is further disentangled as identity, expressions, and hairstyle, while color is disentangled as identity and hairstyle components. We show the merits of i3DMM using ablation studies, comparisons to state-of-the-art models, and applications such as semantic head editing and texture transfer. We will make our model publicly available.
△ Less
Submitted 28 November, 2020;
originally announced November 2020.
-
Non-Rigid Puzzles
Authors:
Or Litany,
Emanuele Rodolà,
Alex Bronstein,
Michael Bronstein,
Daniel Cremers
Abstract:
Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture map**, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigi…
▽ More
Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture map**, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigid multi-part shape matching algorithm. We assume to be given a reference shape and its multiple parts undergoing a non-rigid deformation. Each of these query parts can be additionally contaminated by clutter, may overlap with other parts, and there might be missing parts or redundant ones. Our method simultaneously solves for the segmentation of the reference model, and for a dense correspondence to (subsets of) the parts. Experimental results on synthetic as well as real scans demonstrate the effectiveness of our method in dealing with this challenging matching scenario.
△ Less
Submitted 25 November, 2020;
originally announced November 2020.
-
SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition
Authors:
Yan Xia,
Yusheng Xu,
Shuang Li,
Rui Wang,
Juan Du,
Daniel Cremers,
Uwe Stilla
Abstract:
We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors. Local information of each point from eight orientations is captured in a PointOE module, whereas long-range feature dependencies among loc…
▽ More
We tackle the problem of place recognition from point cloud data and introduce a self-attention and orientation encoding network (SOE-Net) that fully explores the relationship between points and incorporates long-range context into point-wise local descriptors. Local information of each point from eight orientations is captured in a PointOE module, whereas long-range feature dependencies among local descriptors are captured with a self-attention unit. Moreover, we propose a novel loss function called Hard Positive Hard Negative quadruplet loss (HPHN quadruplet), that achieves better performance than the commonly used metric learning loss. Experiments on various benchmark datasets demonstrate superior performance of the proposed network over the current state-of-the-art approaches. Our code is released publicly at https://github.com/Yan-Xia/SOE-Net.
△ Less
Submitted 23 May, 2021; v1 submitted 24 November, 2020;
originally announced November 2020.
-
MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera
Authors:
Felix Wimbauer,
Nan Yang,
Lukas von Stumberg,
Niclas Zeller,
Daniel Cremers
Abstract:
In this paper, we propose MonoRec, a semi-supervised monocular dense reconstruction architecture that predicts depth maps from a single moving camera in dynamic environments. MonoRec is based on a multi-view stereo setting which encodes the information of multiple consecutive images in a cost volume. To deal with dynamic objects in the scene, we introduce a MaskModule that predicts moving object m…
▽ More
In this paper, we propose MonoRec, a semi-supervised monocular dense reconstruction architecture that predicts depth maps from a single moving camera in dynamic environments. MonoRec is based on a multi-view stereo setting which encodes the information of multiple consecutive images in a cost volume. To deal with dynamic objects in the scene, we introduce a MaskModule that predicts moving object masks by leveraging the photometric inconsistencies encoded in the cost volumes. Unlike other multi-view stereo methods, MonoRec is able to reconstruct both static and moving objects by leveraging the predicted masks. Furthermore, we present a novel multi-stage training scheme with a semi-supervised loss formulation that does not require LiDAR depth values. We carefully evaluate MonoRec on the KITTI dataset and show that it achieves state-of-the-art performance compared to both multi-view and single-view methods. With the model trained on KITTI, we further demonstrate that MonoRec is able to generalize well to both the Oxford RobotCar dataset and the more challenging TUM-Mono dataset recorded by a handheld camera. Code and related materials will be available at https://vision.in.tum.de/research/monorec.
△ Less
Submitted 6 May, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Deep Shells: Unsupervised Shape Correspondence with Optimal Transport
Authors:
Marvin Eisenberger,
Aysim Toker,
Laura Leal-Taixé,
Daniel Cremers
Abstract:
We propose a novel unsupervised learning approach to 3D shape correspondence that builds a multiscale matching pipeline into a deep neural network. This approach is based on smooth shells, the current state-of-the-art axiomatic correspondence method, which requires an a priori stochastic search over the space of initial poses. Our goal is to replace this costly preprocessing step by directly learn…
▽ More
We propose a novel unsupervised learning approach to 3D shape correspondence that builds a multiscale matching pipeline into a deep neural network. This approach is based on smooth shells, the current state-of-the-art axiomatic correspondence method, which requires an a priori stochastic search over the space of initial poses. Our goal is to replace this costly preprocessing step by directly learning good initializations from the input surfaces. To that end, we systematically derive a fully differentiable, hierarchical matching pipeline from entropy regularized optimal transport. This allows us to combine it with a local feature extractor based on smooth, truncated spectral convolution filters. Finally, we show that the proposed unsupervised method significantly improves over the state-of-the-art on multiple datasets, even in comparison to the most recent supervised methods. Moreover, we demonstrate compelling generalization results by applying our learned filters to examples that significantly deviate from the training set.
△ Less
Submitted 28 October, 2020;
originally announced October 2020.
-
Speech Synthesis and Control Using Differentiable DSP
Authors:
Giorgio Fabbro,
Vladimir Golkov,
Thomas Kemp,
Daniel Cremers
Abstract:
Modern text-to-speech systems are able to produce natural and high-quality speech, but speech contains factors of variation (e.g. pitch, rhythm, loudness, timbre)\ that text alone cannot contain. In this work we move towards a speech synthesis system that can produce diverse speech renditions of a text by allowing (but not requiring) explicit control over the various factors of variation. We propo…
▽ More
Modern text-to-speech systems are able to produce natural and high-quality speech, but speech contains factors of variation (e.g. pitch, rhythm, loudness, timbre)\ that text alone cannot contain. In this work we move towards a speech synthesis system that can produce diverse speech renditions of a text by allowing (but not requiring) explicit control over the various factors of variation. We propose a new neural vocoder that offers control of such factors of variation. This is achieved by employing differentiable digital signal processing (DDSP) (previously used only for music rather than speech), which exposes these factors of variation. The results show that the proposed approach can produce natural speech with realistic timbre, and individual factors of variation can be freely controlled.
△ Less
Submitted 28 October, 2020;
originally announced October 2020.
-
Unsupervised Dense Shape Correspondence using Heat Kernels
Authors:
Mehmet Aygün,
Zorah Lähner,
Daniel Cremers
Abstract:
In this work, we propose an unsupervised method for learning dense correspondences between shapes using a recent deep functional map framework. Instead of depending on ground-truth correspondences or the computationally expensive geodesic distances, we use heat kernels. These can be computed quickly during training as the supervisor signal. Moreover, we propose a curriculum learning strategy using…
▽ More
In this work, we propose an unsupervised method for learning dense correspondences between shapes using a recent deep functional map framework. Instead of depending on ground-truth correspondences or the computationally expensive geodesic distances, we use heat kernels. These can be computed quickly during training as the supervisor signal. Moreover, we propose a curriculum learning strategy using different heat diffusion times which provide different levels of difficulty during optimization without any sampling mechanism or hard example mining. We present the results of our method on different benchmarks which have various challenges like partiality, topological noise and different connectivity.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.
-
MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking
Authors:
Patrick Dendorfer,
Aljoša Ošep,
Anton Milan,
Konrad Schindler,
Daniel Cremers,
Ian Reid,
Stefan Roth,
Laura Leal-Taixé
Abstract:
Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched…
▽ More
Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data, and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in the last years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, that extends MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third release not only offers a significant increase in the number of labeled boxes but also provide labels for multiple object classes beside pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
△ Less
Submitted 8 December, 2020; v1 submitted 15 October, 2020;
originally announced October 2020.
-
LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization
Authors:
Lukas von Stumberg,
Patrick Wenzel,
Nan Yang,
Daniel Cremers
Abstract:
We present LM-Reloc -- a novel approach for visual relocalization based on direct image alignment. In contrast to prior works that tackle the problem with a feature-based formulation, the proposed method does not rely on feature matching and RANSAC. Hence, the method can utilize not only corners but any region of the image with gradients. In particular, we propose a loss formulation inspired by th…
▽ More
We present LM-Reloc -- a novel approach for visual relocalization based on direct image alignment. In contrast to prior works that tackle the problem with a feature-based formulation, the proposed method does not rely on feature matching and RANSAC. Hence, the method can utilize not only corners but any region of the image with gradients. In particular, we propose a loss formulation inspired by the classical Levenberg-Marquardt algorithm to train LM-Net. The learned features significantly improve the robustness of direct image alignment, especially for relocalization across different conditions. To further improve the robustness of LM-Net against large image baselines, we propose a pose estimation network, CorrPoseNet, which regresses the relative pose to bootstrap the direct image alignment. Evaluations on the CARLA and Oxford RobotCar relocalization tracking benchmark show that our approach delivers more accurate results than previous state-of-the-art methods while being comparable in terms of robustness.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels
Authors:
L. Koestler,
N. Yang,
R. Wang,
D. Cremers
Abstract:
The training of deep-learning-based 3D object detectors requires large datasets with 3D bounding box labels for supervision that have to be generated by hand-labeling. We propose a network architecture and training procedure for learning monocular 3D object detection without 3D bounding box labels. By representing the objects as triangular meshes and employing differentiable shape rendering, we de…
▽ More
The training of deep-learning-based 3D object detectors requires large datasets with 3D bounding box labels for supervision that have to be generated by hand-labeling. We propose a network architecture and training procedure for learning monocular 3D object detection without 3D bounding box labels. By representing the objects as triangular meshes and employing differentiable shape rendering, we define loss functions based on depth maps, segmentation masks, and ego- and object-motion, which are generated by pre-trained, off-the-shelf networks. We evaluate the proposed algorithm on the real-world KITTI dataset and achieve promising performance in comparison to state-of-the-art methods requiring 3D bounding box labels for training and superior performance to conventional baseline methods.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
4Seasons: A Cross-Season Dataset for Multi-Weather SLAM in Autonomous Driving
Authors:
Patrick Wenzel,
Rui Wang,
Nan Yang,
Qing Cheng,
Qadeer Khan,
Lukas von Stumberg,
Niclas Zeller,
Daniel Cremers
Abstract:
We present a novel dataset covering seasonal and challenging perceptual conditions for autonomous driving. Among others, it enables research on visual odometry, global place recognition, and map-based re-localization tracking. The data was collected in different scenarios and under a wide variety of weather conditions and illuminations, including day and night. This resulted in more than 350 km of…
▽ More
We present a novel dataset covering seasonal and challenging perceptual conditions for autonomous driving. Among others, it enables research on visual odometry, global place recognition, and map-based re-localization tracking. The data was collected in different scenarios and under a wide variety of weather conditions and illuminations, including day and night. This resulted in more than 350 km of recordings in nine different environments ranging from multi-level parking garage over urban (including tunnels) to countryside and highway. We provide globally consistent reference poses with up-to centimeter accuracy obtained from the fusion of direct stereo visual-inertial odometry with RTK-GNSS. The full dataset is available at https://www.4seasons-dataset.com.
△ Less
Submitted 14 October, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
-
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization
Authors:
Juan Du,
Rui Wang,
Daniel Cremers
Abstract:
For relocalization in large-scale point clouds, we propose the first approach that unifies global place recognition and local 6DoF pose refinement. To this end, we design a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points. It integrates FlexConv and Squeeze-and-Excitation (SE) to assure that the learned local descriptor captures multi-level…
▽ More
For relocalization in large-scale point clouds, we propose the first approach that unifies global place recognition and local 6DoF pose refinement. To this end, we design a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points. It integrates FlexConv and Squeeze-and-Excitation (SE) to assure that the learned local descriptor captures multi-level geometric information and channel-wise relations. For detecting 3D keypoints we predict the discriminativeness of the local descriptors in an unsupervised manner. We generate the global descriptor by directly aggregating the learned local descriptors with an effective attention mechanism. In this way, local and global 3D descriptors are inferred in one single forward pass. Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration in comparison to state-of-the-art approaches. To validate the generalizability and robustness of our 3D keypoints, we demonstrate that our method also performs favorably without fine-tuning on the registration of point clouds that were generated by a visual SLAM system. Code and related materials are available at https://vision.in.tum.de/research/vslam/dh3d.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions
Authors:
Vladimir Golkov,
Alexander Becker,
Daniel T. Plop,
Daniel Čuturilo,
Neda Davoudi,
Jeffrey Mendenhall,
Rocco Moretti,
Jens Meiler,
Daniel Cremers
Abstract:
Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of g…
▽ More
Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets. In this work we argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance, its ability to compromise over different decision thresholds, certain freedom to influence the relative weights in this compromise, fidelity to typical benchmarking measures, and equivalence to positive/unlabeled learning. We also propose new training schemes (coherent mini-batch arrangement, and usage of out-of-batch samples) for cost functions based on the ROC, as well as a cost function based on the logAUC metric that facilitates early enrichment (i.e. improves performance at high decision thresholds, as often desired when synthesizing predicted hit compounds). We demonstrate that these approaches outperform standard deep learning approaches on a series of PubChem high-throughput screening datasets that represent realistic and diverse drug discovery campaigns on major drug target families.
△ Less
Submitted 25 June, 2020;
originally announced July 2020.
-
A Chain Graph Interpretation of Real-World Neural Networks
Authors:
Yuesong Shen,
Daniel Cremers
Abstract:
The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains. However, most advances have been established empirically, and their theoretical analysis remains lacking. One major issue is that our current interpretation of neural networks (NNs) as function approximators is too generic to support in-depth analysis. In this pape…
▽ More
The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains. However, most advances have been established empirically, and their theoretical analysis remains lacking. One major issue is that our current interpretation of neural networks (NNs) as function approximators is too generic to support in-depth analysis. In this paper, we remedy this by proposing an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure. The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models, while at the same time remains general enough to cover real-world NNs with arbitrary depth, multi-branching and varied activations, as well as common structures including convolution / recurrent layers, residual block and dropout. We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques, as well as derive new deep learning approaches such as the concept of partially collapsed feed-forward inference. It is thus a promising framework that deepens our understanding of neural networks and provides a coherent theoretical formulation for future deep learning research.
△ Less
Submitted 6 October, 2020; v1 submitted 30 June, 2020;
originally announced June 2020.
-
Effective Version Space Reduction for Convolutional Neural Networks
Authors:
Jiayu Liu,
Ioannis Chiotellis,
Rudolph Triebel,
Daniel Cremers
Abstract:
In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two…
▽ More
In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two approaches---prior mass reduction and diameter reduction---and propose a new diameter-based querying method---the minimum Gibbs-vote disagreement. By estimating version space diameter and bias, we illustrate how version space of neural networks evolves and examine the realizability assumption. With experiments on MNIST, Fashion-MNIST, SVHN and STL-10 datasets, we demonstrate that diameter reduction methods reduce the version space more effectively and perform better than prior mass reduction and other baselines, and that the Gibbs vote disagreement is on par with the best query method.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.
-
PrimiTect: Fast Continuous Hough Voting for Primitive Detection
Authors:
Christiane Sommer,
Yumin Sun,
Erik Bylow,
Daniel Cremers
Abstract:
This paper tackles the problem of data abstraction in the context of 3D point sets. Our method classifies points into different geometric primitives, such as planes and cones, leading to a compact representation of the data. Being based on a semi-global Hough voting scheme, the method does not need initialization and is robust, accurate, and efficient. We use a local, low-dimensional parameterizat…
▽ More
This paper tackles the problem of data abstraction in the context of 3D point sets. Our method classifies points into different geometric primitives, such as planes and cones, leading to a compact representation of the data. Being based on a semi-global Hough voting scheme, the method does not need initialization and is robust, accurate, and efficient. We use a local, low-dimensional parameterization of primitives to determine type, shape and pose of the object that a point belongs to. This makes our algorithm suitable to run on devices with low computational power, as often required in robotics applications. The evaluation shows that our method outperforms state-of-the-art methods both in terms of accuracy and robustness.
△ Less
Submitted 15 May, 2020;
originally announced May 2020.
-
Hamiltonian Dynamics for Real-World Shape Interpolation
Authors:
Marvin Eisenberger,
Daniel Cremers
Abstract:
We revisit the classical problem of 3D shape interpolation and propose a novel, physically plausible approach based on Hamiltonian dynamics. While most prior work focuses on synthetic input shapes, our formulation is designed to be applicable to real-world scans with imperfect input correspondences and various types of noise. To that end, we use recent progress on dynamic thin shell simulation and…
▽ More
We revisit the classical problem of 3D shape interpolation and propose a novel, physically plausible approach based on Hamiltonian dynamics. While most prior work focuses on synthetic input shapes, our formulation is designed to be applicable to real-world scans with imperfect input correspondences and various types of noise. To that end, we use recent progress on dynamic thin shell simulation and divergence-free shape deformation and combine them to address the inverse problem of finding a plausible intermediate sequence for two input shapes. In comparison to prior work that mainly focuses on small distortion of consecutive frames, we explicitly model volume preservation and momentum conservation, as well as an anisotropic local distortion model. We argue that, in order to get a robust interpolation for imperfect inputs, we need to model the input noise explicitly which results in an alignment based formulation. Finally, we show a qualitative and quantitative improvement over prior work on a broad range of synthetic and scanned data. Besides being more robust to noisy inputs, our method yields exactly volume preserving intermediate shapes, avoids self-intersections and is scalable to high resolution scans.
△ Less
Submitted 10 April, 2020;
originally announced April 2020.
-
MOT20: A benchmark for multi object tracking in crowded scenes
Authors:
Patrick Dendorfer,
Hamid Rezatofighi,
Anton Milan,
Javen Shi,
Daniel Cremers,
Ian Reid,
Stefan Roth,
Konrad Schindler,
Laura Leal-Taixé
Abstract:
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of mu…
▽ More
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of multiple object tracking methods. The challenge focuses on multiple people tracking, since pedestrians are well studied in the tracking community, and precise tracking and detection has high practical relevance. Since the first release, MOT15, MOT16, and MOT17 have tremendously contributed to the community by introducing a clean dataset and precise framework to benchmark multi-object trackers. In this paper, we present our MOT20benchmark, consisting of 8 new sequences depicting very crowded challenging scenes. The benchmark was presented first at the 4thBMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2019, and gives to chance to evaluate state-of-the-art methods for multiple object tracking when handling extremely crowded scenarios.
△ Less
Submitted 19 March, 2020;
originally announced March 2020.
-
D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry
Authors:
Nan Yang,
Lukas von Stumberg,
Rui Wang,
Daniel Cremers
Abstract:
We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation. We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision. In particular, it aligns the training image pairs into similar lighting condition with predictive brightne…
▽ More
We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation. We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision. In particular, it aligns the training image pairs into similar lighting condition with predictive brightness transformation parameters. Besides, we model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy and provides a learned weighting function for the photometric residuals in direct (feature-less) visual odometry. Evaluation results show that the proposed network outperforms state-of-the-art self-supervised depth estimation networks. D3VO tightly incorporates the predicted depth, pose and uncertainty into a direct visual odometry method to boost both the front-end tracking as well as the back-end non-linear optimization. We evaluate D3VO in terms of monocular visual odometry on both the KITTI odometry benchmark and the EuRoC MAV dataset.The results show that D3VO outperforms state-of-the-art traditional monocular VO methods by a large margin. It also achieves comparable results to state-of-the-art stereo/LiDAR odometry on KITTI and to the state-of-the-art visual-inertial odometry on EuRoC MAV, while using only a single camera.
△ Less
Submitted 28 March, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning
Authors:
Zhenzhang Ye,
Thomas Möllenhoff,
Tao Wu,
Daniel Cremers
Abstract:
Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision. In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. Our preconditioner is driven by a sharp analysis of the local linear convergence rate depending on the "active set" at the current iterate. We show that nested-forest…
▽ More
Structured convex optimization on weighted graphs finds numerous applications in machine learning and computer vision. In this work, we propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class. Our preconditioner is driven by a sharp analysis of the local linear convergence rate depending on the "active set" at the current iterate. We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate. Further, we propose a practical greedy heuristic which realizes such nested decompositions and show in several numerical experiments that our reconditioning strategy, when applied to proximal gradient or primal-dual hybrid gradient algorithm, achieves competitive performances. Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
Learn to Predict Sets Using Feed-Forward Neural Networks
Authors:
Hamid Rezatofighi,
Tianyu Zhu,
Roman Kaskman,
Farbod T. Motlagh,
Qinfeng Shi,
Anton Milan,
Daniel Cremers,
Laura Leal-Taixé,
Ian Reid
Abstract:
This paper addresses the task of set prediction using deep feed-forward neural networks. A set is a collection of elements which is invariant under permutation and the size of a set is not fixed in advance. Many real-world problems, such as image tagging and object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural network…
▽ More
This paper addresses the task of set prediction using deep feed-forward neural networks. A set is a collection of elements which is invariant under permutation and the size of a set is not fixed in advance. Many real-world problems, such as image tagging and object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural networks which naturally deal with structured outputs such as vectors, matrices or tensors. We present a novel approach for learning to predict sets with unknown permutation and cardinality using deep neural networks. In our formulation we define a likelihood for a set distribution represented by a) two discrete distributions defining the set cardinally and permutation variables, and b) a joint distribution over set elements with a fixed cardinality. Depending on the problem under consideration, we define different training models for set prediction using deep neural networks. We demonstrate the validity of our set formulations on relevant vision problems such as: 1) multi-label image classification where we outperform the other competing methods on the PASCAL VOC and MS COCO datasets, 2) object detection, for which our formulation outperforms popular state-of-the-art detectors, and 3) a complex CAPTCHA test, where we observe that, surprisingly, our set-based network acquired the ability of mimicking arithmetics without any rules being coded.
△ Less
Submitted 25 October, 2021; v1 submitted 29 January, 2020;
originally announced January 2020.
-
From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds
Authors:
Christiane Sommer,
Yumin Sun,
Leonidas Guibas,
Daniel Cremers,
Tolga Birdal
Abstract:
We propose a new method for segmentation-free joint estimation of orthogonal planes, their intersection lines, relationship graph and corners lying at the intersection of three orthogonal planes. Such unified scene exploration under orthogonality allows for multitudes of applications such as semantic plane detection or local and global scan alignment, which in turn can aid robot localization or gr…
▽ More
We propose a new method for segmentation-free joint estimation of orthogonal planes, their intersection lines, relationship graph and corners lying at the intersection of three orthogonal planes. Such unified scene exploration under orthogonality allows for multitudes of applications such as semantic plane detection or local and global scan alignment, which in turn can aid robot localization or gras** tasks. Our two-stage pipeline involves a rough yet joint estimation of orthogonal planes followed by a subsequent joint refinement of plane parameters respecting their orthogonality relations. We form a graph of these primitives, paving the way to the extraction of further reliable features: lines and corners. Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking, both on synthetic and real data.
△ Less
Submitted 24 April, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
Inferring Super-Resolution Depth from a Moving Light-Source Enhanced RGB-D Sensor: A Variational Approach
Authors:
Lu Sang,
Bjoern Haefner,
Daniel Cremers
Abstract:
A novel approach towards depth map super-resolution using multi-view uncalibrated photometric stereo is presented. Practically, an LED light source is attached to a commodity RGB-D sensor and is used to capture objects from multiple viewpoints with unknown motion. This non-static camera-to-object setup is described with a nonconvex variational approach such that no calibration on lighting or camer…
▽ More
A novel approach towards depth map super-resolution using multi-view uncalibrated photometric stereo is presented. Practically, an LED light source is attached to a commodity RGB-D sensor and is used to capture objects from multiple viewpoints with unknown motion. This non-static camera-to-object setup is described with a nonconvex variational approach such that no calibration on lighting or camera motion is required due to the formulation of an end-to-end joint optimization problem. Solving the proposed variational model results in high resolution depth, reflectance and camera pose estimates, as we show on challenging synthetic and real-world datasets.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Informative GANs via Structured Regularization of Optimal Transport
Authors:
Pierre Bréchet,
Tao Wu,
Thomas Möllenhoff,
Daniel Cremers
Abstract:
We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT). Specifically, a smoothed OT loss gives rise to an implicit transportation plan between the latent space and the data space. Based on this theoretical observation, we exploit a structured regularization on the transportation plan to en…
▽ More
We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT). Specifically, a smoothed OT loss gives rise to an implicit transportation plan between the latent space and the data space. Based on this theoretical observation, we exploit a structured regularization on the transportation plan to encourage a prescribed latent subspace to be informative. This yields the formulation of a novel informative OT-based GAN. By convex duality, we obtain the equivalent view that this leads to perturbed ground costs favoring sparsity in the informative latent dimensions. Practically, we devise a stable training algorithm for the proposed informative GAN. Our experiments support the hypothesis that such regularizations effectively yield the discovery of disentangled and interpretable latent representations. Our work showcases potential power of a regularized OT framework in the context of generative modeling through its access to the transport plan. Further challenges are addressed in this line.
△ Less
Submitted 4 December, 2019;
originally announced December 2019.
-
Efficient Derivative Computation for Cumulative B-Splines on Lie Groups
Authors:
Christiane Sommer,
Vladyslav Usenko,
David Schubert,
Nikolaus Demmel,
Daniel Cremers
Abstract:
Continuous-time trajectory representation has recently gained popularity for tasks where the fusion of high-frame-rate sensors and multiple unsynchronized devices is required. Lie group cumulative B-splines are a popular way of representing continuous trajectories without singularities. They have been used in near real-time SLAM and odometry systems with IMU, LiDAR, regular, RGB-D and event camera…
▽ More
Continuous-time trajectory representation has recently gained popularity for tasks where the fusion of high-frame-rate sensors and multiple unsynchronized devices is required. Lie group cumulative B-splines are a popular way of representing continuous trajectories without singularities. They have been used in near real-time SLAM and odometry systems with IMU, LiDAR, regular, RGB-D and event cameras, as well as for offline calibration. These applications require efficient computation of time derivatives (velocity, acceleration), but all prior works rely on a computationally suboptimal formulation. In this work we present an alternative derivation of time derivatives based on recurrence relations that needs $\mathcal{O}(k)$ instead of $\mathcal{O}(k^2)$ matrix operations (for a spline of order $k$) and results in simple and elegant expressions. While producing the same result, the proposed approach significantly speeds up the trajectory optimization and allows for computing simple analytic derivatives with respect to spline knots. The results presented in this paper pave the way for incorporating continuous-time trajectory representations into more applications where real-time performance is required.
△ Less
Submitted 30 May, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
On the well-posedness of uncalibrated photometric stereo under general lighting
Authors:
Mohammed Brahimi,
Yvain Quéau,
Bjoern Haefner,
Daniel Cremers
Abstract:
Uncalibrated photometric stereo aims at estimating the 3D-shape of a surface, given a set of images captured from the same viewing angle, but under unknown, varying illumination. While the theoretical foundations of this inverse problem under directional lighting are well-established, there is a lack of mathematical evidence for the uniqueness of a solution under general lighting. On the other han…
▽ More
Uncalibrated photometric stereo aims at estimating the 3D-shape of a surface, given a set of images captured from the same viewing angle, but under unknown, varying illumination. While the theoretical foundations of this inverse problem under directional lighting are well-established, there is a lack of mathematical evidence for the uniqueness of a solution under general lighting. On the other hand, stable and accurate heuristical solutions of uncalibrated photometric stereo under such general lighting have recently been proposed. The quality of the results demonstrated therein tends to indicate that the problem may actually be well-posed, but this still has to be established. The present paper addresses this theoretical issue, considering first-order spherical harmonics approximation of general lighting. Two important theoretical results are established. First, the orthographic integrability constraint ensures uniqueness of a solution up to a global concave-convex ambiguity, which had already been conjectured, yet not proven. Second, the perspective integrability constraint makes the problem well-posed, which generalizes a previous result limited to directional lighting. Eventually, a closed-form expression for the unique least-squares solution of the problem under perspective projection is provided, allowing numerical simulations on synthetic data to empirically validate our findings.
△ Less
Submitted 16 September, 2020; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Rolling-Shutter Modelling for Direct Visual-Inertial Odometry
Authors:
David Schubert,
Nikolaus Demmel,
Lukas von Stumberg,
Vladyslav Usenko,
Daniel Cremers
Abstract:
We present a direct visual-inertial odometry (VIO) method which estimates the motion of the sensor setup and sparse 3D geometry of the environment based on measurements from a rolling-shutter camera and an inertial measurement unit (IMU).
The visual part of the system performs a photometric bundle adjustment on a sparse set of points. This direct approach does not extract feature points and is a…
▽ More
We present a direct visual-inertial odometry (VIO) method which estimates the motion of the sensor setup and sparse 3D geometry of the environment based on measurements from a rolling-shutter camera and an inertial measurement unit (IMU).
The visual part of the system performs a photometric bundle adjustment on a sparse set of points. This direct approach does not extract feature points and is able to track not only corners, but any pixels with sufficient gradient magnitude. Neglecting rolling-shutter effects in the visual part severely degrades accuracy and robustness of the system. In this paper, we incorporate a rolling-shutter model into the photometric bundle adjustment that estimates a set of recent keyframe poses and the inverse depth of a sparse set of points.
IMU information is accumulated between several frames using measurement preintegration, and is inserted into the optimization as an additional constraint between selected keyframes. For every keyframe we estimate not only the pose but also velocity and biases to correct the IMU measurements. Unlike systems with global-shutter cameras, we use both IMU measurements and rolling-shutter effects of the camera to estimate velocity and biases for every state.
Last, we evaluate our system on a novel dataset that contains global-shutter and rolling-shutter images, IMU data and ground-truth poses for ten different sequences, which we make publicly available. Evaluation shows that the proposed method outperforms a system where rolling shutter is not modelled and achieves similar accuracy to the global-shutter method on global-shutter data.
△ Less
Submitted 3 November, 2019;
originally announced November 2019.
-
Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods
Authors:
Luca Della Libera,
Vladimir Golkov,
Yue Zhu,
Arman Mielke,
Daniel Cremers
Abstract:
Convolutional networks are successful due to their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/p…
▽ More
Convolutional networks are successful due to their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/processing of rotations is necessary in cases where rotations are important (e.g. motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods, both for 2D and 3D rotations (and translations), and identify commonalities and links between them.
△ Less
Submitted 22 November, 2021; v1 submitted 31 October, 2019;
originally announced October 2019.
-
Multi-Frame GAN: Image Enhancement for Stereo Visual Odometry in Low Light
Authors:
Eunah Jung,
Nan Yang,
Daniel Cremers
Abstract:
We propose the concept of a multi-frame GAN (MFGAN) and demonstrate its potential as an image sequence enhancement for stereo visual odometry in low light conditions. We base our method on an invertible adversarial network to transfer the beneficial features of brightly illuminated scenes to the sequence in poor illumination without costly paired datasets. In order to preserve the coherent geometr…
▽ More
We propose the concept of a multi-frame GAN (MFGAN) and demonstrate its potential as an image sequence enhancement for stereo visual odometry in low light conditions. We base our method on an invertible adversarial network to transfer the beneficial features of brightly illuminated scenes to the sequence in poor illumination without costly paired datasets. In order to preserve the coherent geometric cues for the translated sequence, we present a novel network architecture as well as a novel loss term combining temporal and stereo consistencies based on optical flow estimation. We demonstrate that the enhanced sequences improve the performance of state-of-the-art feature-based and direct stereo visual odometry methods on both synthetic and real datasets in challenging illumination. We also show that MFGAN outperforms other state-of-the-art image enhancement and style transfer methods by a large margin in terms of visual odometry.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Bregman Proximal Framework for Deep Linear Neural Networks
Authors:
Mahesh Chandra Mukkamala,
Felix Westerkamp,
Emanuel Laude,
Daniel Cremers,
Peter Ochs
Abstract:
A typical assumption for the analysis of first order optimization methods is the Lipschitz continuity of the gradient of the objective function. However, for many practical applications this assumption is violated, including loss functions in deep learning. To overcome this issue, certain extensions based on generalized proximity measures known as Bregman distances were introduced. This initiated…
▽ More
A typical assumption for the analysis of first order optimization methods is the Lipschitz continuity of the gradient of the objective function. However, for many practical applications this assumption is violated, including loss functions in deep learning. To overcome this issue, certain extensions based on generalized proximity measures known as Bregman distances were introduced. This initiated the development of the Bregman proximal gradient (BPG) algorithm and an inertial variant (momentum based) CoCaIn BPG, which however rely on problem dependent Bregman distances. In this paper, we develop Bregman distances for using BPG methods to train Deep Linear Neural Networks. The main implications of our results are strong convergence guarantees for these algorithms. We also propose several strategies for their efficient implementation, for example, closed form updates and a closed form expression for the inertial parameter of CoCaIn BPG. Moreover, the BPG method requires neither diminishing step sizes nor line search, unlike its corresponding Euclidean version. We numerically illustrate the competitiveness of the proposed methods compared to existing state of the art schemes.
△ Less
Submitted 8 October, 2019;
originally announced October 2019.
-
Sparse Surface Constraints for Combining Physics-based Elasticity Simulation and Correspondence-Free Object Reconstruction
Authors:
Sebastian Weiss,
Robert Maier,
Rüdiger Westermann,
Daniel Cremers,
Nils Thuerey
Abstract:
We address the problem to infer physical material parameters and boundary conditions from the observed motion of a homogeneous deformable object via the solution of an inverse problem. Parameters are estimated from potentially unreliable real-world data sources such as sparse observations without correspondences. We introduce a novel Lagrangian-Eulerian optimization formulation, including a cost f…
▽ More
We address the problem to infer physical material parameters and boundary conditions from the observed motion of a homogeneous deformable object via the solution of an inverse problem. Parameters are estimated from potentially unreliable real-world data sources such as sparse observations without correspondences. We introduce a novel Lagrangian-Eulerian optimization formulation, including a cost function that penalizes differences to observations during an optimization run. This formulation matches correspondence-free, sparse observations from a single-view depth sequence with a finite element simulation of deformable bodies. In conjunction with an efficient hexahedral discretization and a stable, implicit formulation of collisions, our method can be used in demanding situation to recover a variety of material parameters, ranging from Young's modulus and Poisson ratio to gravity and stiffness dam**, and even external boundaries. In a number of tests using synthetic datasets and real-world measurements, we analyse the robustness of our approach and the convergence behavior of the numerical optimization scheme.
△ Less
Submitted 4 October, 2019;
originally announced October 2019.
-
Lifting methods for manifold-valued variational problems
Authors:
Thomas Vogt,
Evgeny Strekalovskiy,
Daniel Cremers,
Jan Lellmann
Abstract:
Lifting methods allow to transform hard variational problems such as segmentation and optical flow estimation into convex problems in a suitable higher-dimensional space. The lifted models can then be efficiently solved to a global optimum, which allows to find approximate global minimizers of the original problem. Recently, these techniques have also been applied to problems with values in a mani…
▽ More
Lifting methods allow to transform hard variational problems such as segmentation and optical flow estimation into convex problems in a suitable higher-dimensional space. The lifted models can then be efficiently solved to a global optimum, which allows to find approximate global minimizers of the original problem. Recently, these techniques have also been applied to problems with values in a manifold. We provide a review of such methods in a refined framework based on a finite element discretization of the range, which extends the concept of sublabel-accurate lifting to manifolds. We also generalize existing methods for total variation regularization to support general convex regularization.
△ Less
Submitted 10 August, 2019;
originally announced August 2019.
-
Towards Generalizing Sensorimotor Control Across Weather Conditions
Authors:
Qadeer Khan,
Patrick Wenzel,
Daniel Cremers,
Laura Leal-Taixé
Abstract:
The ability of deep learning models to generalize well across different scenarios depends primarily on the quality and quantity of annotated data. Labeling large amounts of data for all possible scenarios that a model may encounter would not be feasible; if even possible. We propose a framework to deal with limited labeled training data and demonstrate it on the application of vision-based vehicle…
▽ More
The ability of deep learning models to generalize well across different scenarios depends primarily on the quality and quantity of annotated data. Labeling large amounts of data for all possible scenarios that a model may encounter would not be feasible; if even possible. We propose a framework to deal with limited labeled training data and demonstrate it on the application of vision-based vehicle control. We show how limited steering angle data available for only one condition can be transferred to multiple different weather scenarios. This is done by leveraging unlabeled images in a teacher-student learning paradigm complemented with an image-to-image translation network. The translation network transfers the images to a new domain, whereas the teacher provides soft supervised targets to train the student on this domain. Furthermore, we demonstrate how utilization of auxiliary networks can reduce the size of a model at inference time, without affecting the accuracy. The experiments show that our approach generalizes well across multiple different weather conditions using only ground truth labels from one domain.
△ Less
Submitted 25 July, 2019;
originally announced July 2019.
-
Bregman Proximal Map**s and Bregman-Moreau Envelopes under Relative Prox-Regularity
Authors:
Emanuel Laude,
Peter Ochs,
Daniel Cremers
Abstract:
We systematically study the local single-valuedness of the Bregman proximal map** and local smoothness of the Bregman--Moreau envelope of a nonconvex function under relative prox-regularity - an extension of prox-regularity - which was originally introduced by Poliquin and Rockafellar. As Bregman distances are asymmetric in general, in accordance with Bauschke et al., it is natural to consider t…
▽ More
We systematically study the local single-valuedness of the Bregman proximal map** and local smoothness of the Bregman--Moreau envelope of a nonconvex function under relative prox-regularity - an extension of prox-regularity - which was originally introduced by Poliquin and Rockafellar. As Bregman distances are asymmetric in general, in accordance with Bauschke et al., it is natural to consider two variants of the Bregman proximal map**, which, depending on the order of the arguments, are called left and right Bregman proximal map**. We consider the left Bregman proximal map** first. Then, via translation result, we obtain analogue (and partially sharp) results for the right Bregman proximal map**. The class of relatively prox-regular functions significantly extends the recently considered class of relatively hypoconvex functions. In particular, relative prox-regularity allows for functions with a possibly nonconvex domain. Moreover, as a main source of examples and analogously to the classical setting, we introduce relatively amenable functions, i.e. convexly composite functions, for which the inner nonlinear map** is component-wise smooth adaptable, a recently introduced extension of Lipschitz differentiability. By way of example, we apply our theory to locally interpret joint alternating Bregman minimization with proximal regularization as a Bregman proximal gradient algorithm, applied to a smooth adaptable function.
△ Less
Submitted 31 January, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
CVPR19 Tracking and Detection Challenge: How crowded can it get?
Authors:
Patrick Dendorfer,
Hamid Rezatofighi,
Anton Milan,
Javen Shi,
Daniel Cremers,
Ian Reid,
Stefan Roth,
Konrad Schindler,
Laura Leal-Taixe
Abstract:
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research.
The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of…
▽ More
Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research.
The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of multiple object tracking methods. The challenge focuses on multiple people tracking, since pedestrians are well studied in the tracking community, and precise tracking and detection has high practical relevance. Since the first release, MOT15, MOT16 and MOT17 have tremendously contributed to the community by introducing a clean dataset and precise framework to benchmark multi-object trackers. In this paper, we present our CVPR19 benchmark, consisting of 8 new sequences depicting very crowded challenging scenes. The benchmark will be presented at the 4th BMTT MOT Challenge Workshop at the Computer Vision and Pattern Recognition Conference (CVPR) 2019, and will evaluate the state-of-the-art in multiple object tracking whend handling extremely crowded scenarios.
△ Less
Submitted 10 June, 2019;
originally announced June 2019.
-
Smooth Shells: Multi-Scale Shape Registration with Functional Maps
Authors:
Marvin Eisenberger,
Zorah Lähner,
Daniel Cremers
Abstract:
We propose a novel 3D shape correspondence method based on the iterative alignment of so-called smooth shells. Smooth shells define a series of coarse-to-fine shape approximations designed to work well with multiscale algorithms. The main idea is to first align rough approximations of the geometry and then add more and more details to refine the correspondence. We fuse classical shape registration…
▽ More
We propose a novel 3D shape correspondence method based on the iterative alignment of so-called smooth shells. Smooth shells define a series of coarse-to-fine shape approximations designed to work well with multiscale algorithms. The main idea is to first align rough approximations of the geometry and then add more and more details to refine the correspondence. We fuse classical shape registration with Functional Maps by embedding the input shapes into an intrinsic-extrinsic product space. Moreover, we disambiguate intrinsic symmetries by applying a surrogate based Markov chain Monte Carlo initialization. Our method naturally handles various types of noise that commonly occur in real scans, like non-isometry or incompatible meshing. Finally, we demonstrate state-of-the-art quantitative results on several datasets and show that our pipeline produces smoother, more realistic results than other automatic matching methods in real world applications.
△ Less
Submitted 2 December, 2019; v1 submitted 29 May, 2019;
originally announced May 2019.
-
Flat Metric Minimization with Applications in Generative Modeling
Authors:
Thomas Möllenhoff,
Daniel Cremers
Abstract:
We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-current…
▽ More
We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-currents) to $k$-currents, we can explicitly orient the data by attaching a $k$-dimensional tangent plane to each sample point. Based on the flat metric which is a fundamental distance between currents, we derive FlatGAN, a formulation in the spirit of generative adversarial networks but generalized to $k$-currents. In our theoretical contribution we prove that the flat metric between a parametrized current and a reference current is Lipschitz continuous in the parameters. In experiments, we show that the proposed shift to $k>0$ leads to interpretable and disentangled latent representations which behave equivariantly to the specified oriented tangent planes.
△ Less
Submitted 12 May, 2019;
originally announced May 2019.
-
Learning to Evolve
Authors:
Jan Schuchardt,
Vladimir Golkov,
Daniel Cremers
Abstract:
Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombin…
▽ More
Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the result of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn to dynamically adjust the strategy of evolutionary algorithms to varying circumstances. Our methods outperform classical evolutionary algorithms on combinatorial and continuous optimization problems.
△ Less
Submitted 8 May, 2019;
originally announced May 2019.
-
Lifting Vectorial Variational Problems: A Natural Formulation based on Geometric Measure Theory and Discrete Exterior Calculus
Authors:
Thomas Möllenhoff,
Daniel Cremers
Abstract:
Numerous tasks in imaging and vision can be formulated as variational problems over vector-valued maps. We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. To that end, we recall that functionals with polyconvex Lagrangians can be reparametrized as convex one-homogeneous functionals on the graph of the function. This leads t…
▽ More
Numerous tasks in imaging and vision can be formulated as variational problems over vector-valued maps. We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. To that end, we recall that functionals with polyconvex Lagrangians can be reparametrized as convex one-homogeneous functionals on the graph of the function. This leads to an equivalent shape optimization problem over oriented surfaces in the product space of domain and codomain. A convex formulation is then obtained by relaxing the search space from oriented surfaces to more general currents. We propose a discretization of the resulting infinite-dimensional optimization problem using Whitney forms, which also generalizes recent "sublabel-accurate" multilabeling approaches.
△ Less
Submitted 2 May, 2019;
originally announced May 2019.
-
GN-Net: The Gauss-Newton Loss for Multi-Weather Relocalization
Authors:
Lukas von Stumberg,
Patrick Wenzel,
Qadeer Khan,
Daniel Cremers
Abstract:
Direct SLAM methods have shown exceptional performance on odometry tasks. However, they are susceptible to dynamic lighting and weather changes while also suffering from a bad initialization on large baselines. To overcome this, we propose GN-Net: a network optimized with the novel Gauss-Newton loss for training weather invariant deep features, tailored for direct image alignment. Our network can…
▽ More
Direct SLAM methods have shown exceptional performance on odometry tasks. However, they are susceptible to dynamic lighting and weather changes while also suffering from a bad initialization on large baselines. To overcome this, we propose GN-Net: a network optimized with the novel Gauss-Newton loss for training weather invariant deep features, tailored for direct image alignment. Our network can be trained with pixel correspondences between images taken from different sequences. Experiments on both simulated and real-world datasets demonstrate that our approach is more robust against bad initialization, variations in day-time, and weather changes thereby outperforming state-of-the-art direct and indirect methods. Furthermore, we release an evaluation benchmark for relocalization tracking against different types of weather. Our benchmark is available at https://vision.in.tum.de/gn-net.
△ Less
Submitted 27 November, 2019; v1 submitted 26 April, 2019;
originally announced April 2019.
-
DirectShape: Direct Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
Authors:
Rui Wang,
Nan Yang,
Joerg Stueckler,
Daniel Cremers
Abstract:
Scene understanding from images is a challenging problem encountered in autonomous driving. On the object level, while 2D methods have gradually evolved from computing simple bounding boxes to delivering finer grained results like instance segmentations, the 3D family is still dominated by estimating 3D bounding boxes. In this paper, we propose a novel approach to jointly infer the 3D rigid-body p…
▽ More
Scene understanding from images is a challenging problem encountered in autonomous driving. On the object level, while 2D methods have gradually evolved from computing simple bounding boxes to delivering finer grained results like instance segmentations, the 3D family is still dominated by estimating 3D bounding boxes. In this paper, we propose a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from a stereo image pair using shape priors. Unlike previous works that geometrically align shapes to point clouds from dense stereo reconstruction, our approach works directly on images by combining a photometric and a silhouette alignment term in the energy function. An adaptive sparse point selection scheme is proposed to efficiently measure the consistency with both terms. In experiments, we show superior performance of our method on 3D pose and shape estimation over the previous geometric approach and demonstrate that our method can also be applied as a refinement step and significantly boost the performances of several state-of-the-art deep learning based 3D object detectors. All related materials and demonstration videos are available at the project page https://vision.in.tum.de/research/vslam/direct-shape.
△ Less
Submitted 9 March, 2020; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Visual-Inertial Map** with Non-Linear Factor Recovery
Authors:
Vladyslav Usenko,
Nikolaus Demmel,
David Schubert,
Jörg Stückler,
Daniel Cremers
Abstract:
Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment map**. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent map**, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry with a set of images large baselines are required.…
▽ More
Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment map**. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent map**, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry with a set of images large baselines are required. Because of that, most systems operate on keyframes that have large time intervals between each other. Inertial data on the other hand quickly degrades with the duration of the intervals and after several seconds of integration, it typically contains only little useful information.
In this paper, we propose to extract relevant information for visual-inertial map** from visual-inertial odometry using non-linear factor recovery. We reconstruct a set of non-linear factors that make an optimal approximation of the information on the trajectory accumulated by VIO. To obtain a globally consistent map we combine these factors with loop-closing constraints using bundle adjustment. The VIO factors make the roll and pitch angles of the global map observable, and improve the robustness and the accuracy of the map**. In experiments on a public benchmark, we demonstrate superior performance of our method over the state-of-the-art approaches.
△ Less
Submitted 30 May, 2020; v1 submitted 13 April, 2019;
originally announced April 2019.
-
Variational Uncalibrated Photometric Stereo under General Lighting
Authors:
Bjoern Haefner,
Zhenzhang Ye,
Maolin Gao,
Tao Wu,
Yvain Quéau,
Daniel Cremers
Abstract:
Photometric stereo (PS) techniques nowadays remain constrained to an ideal laboratory setup where modeling and calibration of lighting is amenable. To eliminate such restrictions, we propose an efficient principled variational approach to uncalibrated PS under general illumination. To this end, the Lambertian reflectance model is approximated through a spherical harmonic expansion, which preserves…
▽ More
Photometric stereo (PS) techniques nowadays remain constrained to an ideal laboratory setup where modeling and calibration of lighting is amenable. To eliminate such restrictions, we propose an efficient principled variational approach to uncalibrated PS under general illumination. To this end, the Lambertian reflectance model is approximated through a spherical harmonic expansion, which preserves the spatial invariance of the lighting. The joint recovery of shape, reflectance and illumination is then formulated as a single variational problem. There the shape estimation is carried out directly in terms of the underlying perspective depth map, thus implicitly ensuring integrability and bypassing the need for a subsequent normal integration. To tackle the resulting nonconvex problem numerically, we undertake a two-phase procedure to initialize a balloon-like perspective depth map, followed by a "lagged" block coordinate descent scheme. The experiments validate efficiency and robustness of this approach. Across a variety of evaluations, we are able to reduce the mean angular error consistently by a factor of 2-3 compared to the state-of-the-art.
△ Less
Submitted 27 August, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Controlling Neural Networks via Energy Dissipation
Authors:
Michael Moeller,
Thomas Möllenhoff,
Daniel Cremers
Abstract:
The last decade has shown a tremendous success in solving various computer vision problems with the help of deep learning techniques. Lately, many works have demonstrated that learning-based approaches with suitable network architectures even exhibit superior performance for the solution of (ill-posed) image reconstruction problems such as deblurring, super-resolution, or medical image reconstruct…
▽ More
The last decade has shown a tremendous success in solving various computer vision problems with the help of deep learning techniques. Lately, many works have demonstrated that learning-based approaches with suitable network architectures even exhibit superior performance for the solution of (ill-posed) image reconstruction problems such as deblurring, super-resolution, or medical image reconstruction. The drawback of purely learning-based methods, however, is that they cannot provide provable guarantees for the trained network to follow a given data formation process during inference. In this work we propose energy dissipating networks that iteratively compute a descent direction with respect to a given cost function or energy at the currently estimated reconstruction. Therefore, an adaptive step size rule such as a line-search, along with a suitable number of iterations can guarantee the reconstruction to follow a given data formation model encoded in the energy to arbitrary precision, and hence control the model's behavior even during test time. We prove that under standard assumptions, descent using the direction predicted by the network converges (linearly) to the global minimum of the energy. We illustrate the effectiveness of the proposed approach in experiments on single image super resolution and computed tomography (CT) reconstruction, and further illustrate extensions to convex feasibility problems.
△ Less
Submitted 20 August, 2019; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Optimization of Inf-Convolution Regularized Nonconvex Composite Problems
Authors:
Emanuel Laude,
Tao Wu,
Daniel Cremers
Abstract:
In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal map** and Moreau-envelope. In a convex setting such problems can be solved via alternating minimization of a splitting formulation, where the consensus constraint is penalized with a Legendre function. In contrast, for non…
▽ More
In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal map** and Moreau-envelope. In a convex setting such problems can be solved via alternating minimization of a splitting formulation, where the consensus constraint is penalized with a Legendre function. In contrast, for nonconvex models it is in general unclear that this approach yields stationary points to the infimal convolution problem. To this end we analytically investigate local regularity properties of the Moreau-envelope function under prox-regularity, which allows us to establish the equivalence between stationary points of the splitting model and the original inf-convolution model. We apply our theory to characterize stationary points of the penalty objective, which is minimized by the elastic averaging SGD (EASGD) method for distributed training. Numerically, we demonstrate the practical relevance of the proposed approach on the important task of distributed training of deep neural networks.
△ Less
Submitted 27 March, 2019;
originally announced March 2019.