Search | arXiv e-print repository

arXiv:1902.01785 [pdf, other]

Homogeneous Linear Inequality Constraints for Neural Network Activations

Authors: Thomas Frerix, Matthias Nießner, Daniel Cremers

Abstract: We propose a method to impose homogeneous linear inequality constraints of the form $Ax\leq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this task is by means of a projection step at test time after unconstrained training. However, this is an expensive operation. By direc… ▽ More We propose a method to impose homogeneous linear inequality constraints of the form $Ax\leq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this task is by means of a projection step at test time after unconstrained training. However, this is an expensive operation. By directly incorporating the constraints into the architecture, we can significantly speed-up inference at test time; for instance, our experiments show a speed-up of up to two orders of magnitude over a projection method. Our algorithm computes a suitable parameterization of the feasible set at initialization and uses standard variants of stochastic gradient descent to find solutions to the constrained network. Thus, the modeling constraints are always satisfied during training. Crucially, our approach avoids to solve an optimization problem at each training step or to manually trade-off data and constraint fidelity with additional hyperparameters. We consider constrained generative modeling as an important application domain and experimentally demonstrate the proposed method by constraining a variational autoencoder. △ Less

Submitted 28 May, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: CVPR 2020 DeepVision Workshop

arXiv:1902.00057 [pdf, other]

Probabilistic Discriminative Learning with Layered Graphical Models

Authors: Yuesong Shen, Tao Wu, Csaba Domokos, Daniel Cremers

Abstract: Probabilistic graphical models are traditionally known for their successes in generative modeling. In this work, we advocate layered graphical models (LGMs) for probabilistic discriminative learning. To this end, we design LGMs in close analogy to neural networks (NNs), that is, they have deep hierarchical structures and convolutional or local connections between layers. Equipped with tensorized t… ▽ More Probabilistic graphical models are traditionally known for their successes in generative modeling. In this work, we advocate layered graphical models (LGMs) for probabilistic discriminative learning. To this end, we design LGMs in close analogy to neural networks (NNs), that is, they have deep hierarchical structures and convolutional or local connections between layers. Equipped with tensorized truncated variational inference, our LGMs can be efficiently trained via backpropagation on mainstream deep learning frameworks such as PyTorch. To deal with continuous valued inputs, we use a simple yet effective soft-clam** strategy for efficient inference. Through extensive experiments on image classification over MNIST and FashionMNIST datasets, we demonstrate that LGMs are capable of achieving competitive results comparable to NNs of similar architectures, while preserving transparent probabilistic modeling. △ Less

Submitted 31 January, 2019; originally announced February 2019.

arXiv:1809.10097 [pdf, other]

Photometric Depth Super-Resolution

Authors: Bjoern Haefner, Songyou Peng, Alok Verma, Yvain Quéau, Daniel Cremers

Abstract: This study explores the use of photometric techniques (shape-from-shading and uncalibrated photometric stereo) for upsampling the low-resolution depth map from an RGB-D sensor to the higher resolution of the companion RGB image. A single-shot variational approach is first put forward, which is effective as long as the target's reflectance is piecewise-constant. It is then shown that this dependenc… ▽ More This study explores the use of photometric techniques (shape-from-shading and uncalibrated photometric stereo) for upsampling the low-resolution depth map from an RGB-D sensor to the higher resolution of the companion RGB image. A single-shot variational approach is first put forward, which is effective as long as the target's reflectance is piecewise-constant. It is then shown that this dependency upon a specific reflectance model can be relaxed by focusing on a specific class of objects (e.g., faces), and delegate reflectance estimation to a deep neural network. A multi-shot strategy based on randomly varying lighting conditions is eventually discussed. It requires no training or prior on the reflectance, yet this comes at the price of a dedicated acquisition setup. Both quantitative and qualitative evaluations illustrate the effectiveness of the proposed methods on synthetic and real-world scenarios. △ Less

Submitted 25 June, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2019. First three authors contribute equally

arXiv:1808.03417 [pdf, other]

DeepWrinkles: Accurate and Realistic Clothing Modeling

Authors: Zorah Laehner, Daniel Cremers, Tony Tung

Abstract: We present a novel method to generate accurate and realistic clothing deformation from real data capture. Previous methods for realistic cloth modeling mainly rely on intensive computation of physics-based simulation (with numerous heuristic parameters), while models reconstructed from visual observations typically suffer from lack of geometric details. Here, we propose an original framework consi… ▽ More We present a novel method to generate accurate and realistic clothing deformation from real data capture. Previous methods for realistic cloth modeling mainly rely on intensive computation of physics-based simulation (with numerous heuristic parameters), while models reconstructed from visual observations typically suffer from lack of geometric details. Here, we propose an original framework consisting of two modules that work jointly to represent global shape deformation as well as surface details with high fidelity. Global shape deformations are recovered from a subspace model learned from 3D data of clothed people in motion, while high frequency details are added to normal maps created using a conditional Generative Adversarial Network whose architecture is designed to enforce realism and temporal consistency. This leads to unprecedented high-quality rendering of clothing deformation sequences, where fine wrinkles from (real) high resolution observations can be recovered. In addition, as the model is learned independently from body shape and pose, the framework is suitable for applications that require retargeting (e.g., body animation). Our experiments show original high quality results with a flexible model. We claim an entirely data-driven approach to realistic cloth wrinkle generation is possible. △ Less

Submitted 10 August, 2018; originally announced August 2018.

Comments: 18 pages, 12 figures, 15th European Conference on Computer Vision (ECCV) 2018, Oral Presentation

arXiv:1808.02775 [pdf, other]

doi 10.1109/LRA.2018.2855443

Omnidirectional DSO: Direct Sparse Odometry with Fisheye Cameras

Authors: Hidenobu Matsuki, Lukas von Stumberg, Vladyslav Usenko, Jörg Stückler, Daniel Cremers

Abstract: We propose a novel real-time direct monocular visual odometry for omnidirectional cameras. Our method extends direct sparse odometry (DSO) by using the unified omnidirectional model as a projection function, which can be applied to fisheye cameras with a field-of-view (FoV) well above 180 degrees. This formulation allows for using the full area of the input image even with strong distortion, while… ▽ More We propose a novel real-time direct monocular visual odometry for omnidirectional cameras. Our method extends direct sparse odometry (DSO) by using the unified omnidirectional model as a projection function, which can be applied to fisheye cameras with a field-of-view (FoV) well above 180 degrees. This formulation allows for using the full area of the input image even with strong distortion, while most existing visual odometry methods can only use a rectified and cropped part of it. Model parameters within an active keyframe window are jointly optimized, including the intrinsic/extrinsic camera parameters, 3D position of points, and affine brightness parameters. Thanks to the wide FoV, image overlap between frames becomes bigger and points are more spatially distributed. Our results demonstrate that our method provides increased accuracy and robustness over state-of-the-art visual odometry algorithms. △ Less

Submitted 8 August, 2018; originally announced August 2018.

Comments: Accepted by IEEE Robotics and Automation Letters (RA-L), 2018 and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018

arXiv:1808.01834 [pdf, other]

Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform

Authors: Lingni Ma, Jörg Stückler, Tao Wu, Daniel Cremers

Abstract: Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs). Many state-of-the-art approaches either tackle the loss of high-resolution information due to pooling in the encoder stage, or use dilated convolutions or high-resolution lanes to maintain detailed feature maps and predictions. Motivated by the structural analogy betwe… ▽ More Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs). Many state-of-the-art approaches either tackle the loss of high-resolution information due to pooling in the encoder stage, or use dilated convolutions or high-resolution lanes to maintain detailed feature maps and predictions. Motivated by the structural analogy between multi-resolution wavelet analysis and the pooling/unpooling layers of CNNs, we introduce discrete wavelet transform (DWT) into the CNN encoder-decoder architecture and propose WCNN. The high-frequency wavelet coefficients are computed at encoder, which are later used at the decoder to unpooled jointly with coarse-resolution feature maps through the inverse DWT. The DWT/iDWT is further used to develop two wavelet pyramids to capture the global context, where the multi-resolution DWT is applied to successively reduce the spatial resolution and increase the receptive field. Experiment with the Cityscape dataset, the proposed WCNNs are computationally efficient and yield improvements the accuracy for high-resolution dense pixelwise prediction. △ Less

Submitted 6 August, 2018; originally announced August 2018.

Comments: This work was first submitted to NIPS 2017, May 2017

arXiv:1808.01111 [pdf, other]

LDSO: Direct Sparse Odometry with Loop Closure

Authors: Xiang Gao, Rui Wang, Nikolaus Demmel, Daniel Cremers

Abstract: In this paper we present an extension of Direct Sparse Odometry (DSO) to a monocular visual SLAM system with loop closure detection and pose-graph optimization (LDSO). As a direct technique, DSO can utilize any image pixel with sufficient intensity gradient, which makes it robust even in featureless areas. LDSO retains this robustness, while at the same time ensuring repeatability of some of these… ▽ More In this paper we present an extension of Direct Sparse Odometry (DSO) to a monocular visual SLAM system with loop closure detection and pose-graph optimization (LDSO). As a direct technique, DSO can utilize any image pixel with sufficient intensity gradient, which makes it robust even in featureless areas. LDSO retains this robustness, while at the same time ensuring repeatability of some of these points by favoring corner features in the tracking frontend. This repeatability allows to reliably detect loop closure candidates with a conventional feature-based bag-of-words (BoW) approach. Loop closure candidates are verified geometrically and Sim(3) relative pose constraints are estimated by jointly minimizing 2D and 3D geometric error terms. These constraints are fused with a co-visibility graph of relative poses extracted from DSO's sliding window optimization. Our evaluation on publicly available datasets demonstrates that the modified point selection strategy retains the tracking accuracy and robustness, and the integrated pose-graph optimization significantly reduces the accumulated rotation-, translation- and scale-drift, resulting in an overall performance comparable to state-of-the-art feature-based systems, even without global bundle adjustment. △ Less

Submitted 3 August, 2018; originally announced August 2018.

arXiv:1808.00558 [pdf, other]

doi 10.1007/978-3-030-01237-3_42

Direct Sparse Odometry with Rolling Shutter

Authors: David Schubert, Nikolaus Demmel, Vladyslav Usenko, Jörg Stückler, Daniel Cremers

Abstract: Neglecting the effects of rolling-shutter cameras for visual odometry (VO) severely degrades accuracy and robustness. In this paper, we propose a novel direct monocular VO method that incorporates a rolling-shutter model. Our approach extends direct sparse odometry which performs direct bundle adjustment of a set of recent keyframe poses and the depths of a sparse set of image points. We estimate… ▽ More Neglecting the effects of rolling-shutter cameras for visual odometry (VO) severely degrades accuracy and robustness. In this paper, we propose a novel direct monocular VO method that incorporates a rolling-shutter model. Our approach extends direct sparse odometry which performs direct bundle adjustment of a set of recent keyframe poses and the depths of a sparse set of image points. We estimate the velocity at each keyframe and impose a constant-velocity prior for the optimization. In this way, we obtain a near real-time, accurate direct VO method. Our approach achieves improved results on challenging rolling-shutter sequences over state-of-the-art global-shutter VO. △ Less

Submitted 1 August, 2018; originally announced August 2018.

arXiv:1807.08957 [pdf, other]

doi 10.1109/3DV.2018.00069

The Double Sphere Camera Model

Authors: Vladyslav Usenko, Nikolaus Demmel, Daniel Cremers

Abstract: Vision-based motion estimation and 3D reconstruction, which have numerous applications (e.g., autonomous driving, navigation systems for airborne devices and augmented reality) are receiving significant research attention. To increase the accuracy and robustness, several researchers have recently demonstrated the benefit of using large field-of-view cameras for such applications. In this paper, we… ▽ More Vision-based motion estimation and 3D reconstruction, which have numerous applications (e.g., autonomous driving, navigation systems for airborne devices and augmented reality) are receiving significant research attention. To increase the accuracy and robustness, several researchers have recently demonstrated the benefit of using large field-of-view cameras for such applications. In this paper, we provide an extensive review of existing models for large field-of-view cameras. For each model we provide projection and unprojection functions and the subspace of points that result in valid projection. Then, we propose the Double Sphere camera model that well fits with large field-of-view lenses, is computationally inexpensive and has a closed-form inverse. We evaluate the model using a calibration dataset with several different lenses and compare the models using the metrics that are relevant for Visual Odometry, i.e., reprojection error, as well as computation time for projection and unprojection functions and their Jacobians. We also provide qualitative results and discuss the performance of all models. △ Less

Submitted 29 October, 2018; v1 submitted 24 July, 2018; originally announced July 2018.

arXiv:1807.02570 [pdf, other]

Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry

Authors: Nan Yang, Rui Wang, Jörg Stückler, Daniel Cremers

Abstract: Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Dir… ▽ More Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our deep predictions excel state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep learning based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while only relying on a single camera. △ Less

Submitted 25 July, 2018; v1 submitted 6 July, 2018; originally announced July 2018.

Comments: To appear in ECCV 2018, Munich. 17 pages including references, 7 figures, 4 tables. Supplementary material: https://vision.in.tum.de/members/yangn

arXiv:1807.02087 [pdf, other]

doi 10.1109/TPAMI.2018.2884990

A Region-based Gauss-Newton Approach to Real-Time Monocular Multiple Object Tracking

Authors: Henning Tjaden, Ulrich Schwanecke, Elmar Schömer, Daniel Cremers

Abstract: We propose an algorithm for real-time 6DOF pose tracking of rigid 3D objects using a monocular RGB camera. The key idea is to derive a region-based cost function using temporally consistent local color histograms. While such region-based cost functions are commonly optimized using first-order gradient descent techniques, we systematically derive a Gauss-Newton optimization scheme which gives rise… ▽ More We propose an algorithm for real-time 6DOF pose tracking of rigid 3D objects using a monocular RGB camera. The key idea is to derive a region-based cost function using temporally consistent local color histograms. While such region-based cost functions are commonly optimized using first-order gradient descent techniques, we systematically derive a Gauss-Newton optimization scheme which gives rise to drastically faster convergence and highly accurate and robust tracking performance. We furthermore propose a novel complex dataset dedicated for the task of monocular object pose tracking and make it publicly available to the community. To our knowledge, it is the first to address the common and important scenario in which both the camera as well as the objects are moving simultaneously in cluttered scenes. In numerous experiments - including our own proposed dataset - we demonstrate that the proposed Gauss-Newton approach outperforms existing approaches, in particular in the presence of cluttered backgrounds, heterogeneous objects and partial occlusions. △ Less

Submitted 19 December, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

arXiv:1807.01001 [pdf, other]

Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs

Authors: Patrick Wenzel, Qadeer Khan, Daniel Cremers, Laura Leal-Taixé

Abstract: Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred using semantic maps to new weather conditions without the need to obtain new ground tr… ▽ More Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred using semantic maps to new weather conditions without the need to obtain new ground truth data. To this end, we propose to divide the task of vehicle control into two independent modules: a control module which is only trained on one weather condition for which labeled steering data is available, and a perception module which is used as an interface between new weather conditions and the fixed control module. To generate the semantic data needed to train the perception module, we propose to use a generative adversarial network (GAN)-based model to retrieve the semantic information for the new conditions in an unsupervised manner. We introduce a master-servant architecture, where the master model (semantic labels available) trains the servant model (semantic labels not available). We show that our proposed method trained with ground truth data for a single weather condition is capable of achieving similar results on the task of steering angle prediction as an end-to-end model trained with ground truth data of 15 different weather conditions. △ Less

Submitted 1 October, 2018; v1 submitted 3 July, 2018; originally announced July 2018.

Comments: 2nd Conference on Robot Learning (CoRL 2018), Zürich, Switzerland

arXiv:1806.10417 [pdf, other]

Divergence-Free Shape Interpolation and Correspondence

Authors: Marvin Eisenberger, Zorah Lähner, Daniel Cremers

Abstract: We present a novel method to model and calculate deformation fields between shapes embedded in $\mathbb{R}^D$. Our framework combines naturally interpolating the two input shapes and calculating correspondences at the same time. The key idea is to compute a divergence-free deformation field represented in a coarse-to-fine basis using the Karhunen-Loève expansion. The advantages are that there is n… ▽ More We present a novel method to model and calculate deformation fields between shapes embedded in $\mathbb{R}^D$. Our framework combines naturally interpolating the two input shapes and calculating correspondences at the same time. The key idea is to compute a divergence-free deformation field represented in a coarse-to-fine basis using the Karhunen-Loève expansion. The advantages are that there is no need to discretize the embedding space and the deformation is volume-preserving. Furthermore, the optimization is done on downsampled versions of the shapes but the morphing can be applied to any resolution without a heavy increase in complexity. We show results for shape correspondence, registration, inter- and extrapolation on the TOSCA and FAUST data sets. △ Less

Submitted 16 October, 2018; v1 submitted 27 June, 2018; originally announced June 2018.

arXiv:1806.02997 [pdf, other]

q-Space Novelty Detection with Variational Autoencoders

Authors: Aleksei Vasilev, Vladimir Golkov, Marc Meissner, Ilona Lipp, Eleonora Sgarlata, Valentina Tomassini, Derek K. Jones, Daniel Cremers

Abstract: In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we… ▽ More In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples well; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g. sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (q-space) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e. to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods. We also evaluate the proposed methods on the MNIST handwritten digits dataset and show that many of them are able to outperform the state of the art. △ Less

Submitted 25 October, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: 11 pages, 2 figures

MSC Class: 62F15; 62G07; 62M45; 68T30 ACM Class: G.3; H.3.3; I.2.4; I.2.6; I.4.6; I.5; I.5.4; J.3

arXiv:1805.00613 [pdf, other]

Deep Perm-Set Net: Learn to predict sets with unknown permutation and cardinality using deep neural networks

Authors: S. Hamid Rezatofighi, Roman Kaskman, Farbod T. Motlagh, Qinfeng Shi, Daniel Cremers, Laura Leal-Taixé, Ian Reid

Abstract: Many real-world problems, e.g. object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural networks which naturally deal with structured outputs such as vectors, matrices or tensors. We present a novel approach for learning to predict sets with unknown permutation and cardinality using deep neural networks. Specifically, in… ▽ More Many real-world problems, e.g. object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural networks which naturally deal with structured outputs such as vectors, matrices or tensors. We present a novel approach for learning to predict sets with unknown permutation and cardinality using deep neural networks. Specifically, in our formulation we incorporate the permutation as unobservable variable and estimate its distribution during the learning process using alternating optimization. We demonstrate the validity of this new formulation on two relevant vision problems: object detection, for which our formulation outperforms state-of-the-art detectors such as Faster R-CNN and YOLO, and a complex CAPTCHA test, where we observe that, surprisingly, our set based network acquired the ability of mimicking arithmetics without any rules being coded. △ Less

Submitted 2 October, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

arXiv:1804.06120 [pdf, other]

doi 10.1109/IROS.2018.8593419

The TUM VI Benchmark for Evaluating Visual-Inertial Odometry

Authors: David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, Jörg Stückler, Daniel Cremers

Abstract: Visual odometry and SLAM methods have a large variety of applications in domains such as augmented reality or robotics. Complementing vision sensors with inertial measurements tremendously improves tracking accuracy and robustness, and thus has spawned large interest in the development of visual-inertial (VI) odometry approaches. In this paper, we propose the TUM VI benchmark, a novel dataset with… ▽ More Visual odometry and SLAM methods have a large variety of applications in domains such as augmented reality or robotics. Complementing vision sensors with inertial measurements tremendously improves tracking accuracy and robustness, and thus has spawned large interest in the development of visual-inertial (VI) odometry approaches. In this paper, we propose the TUM VI benchmark, a novel dataset with a diverse set of sequences in different scenes for evaluating VI odometry. It provides camera images with 1024x1024 resolution at 20 Hz, high dynamic range and photometric calibration. An IMU measures accelerations and angular velocities on 3 axes at 200 Hz, while the cameras and IMU sensors are time-synchronized in hardware. For trajectory evaluation, we also provide accurate pose ground truth from a motion capture system at high frequency (120 Hz) at the start and end of the sequences which we accurately aligned with the camera and IMU measurements. The full dataset with raw and calibrated data is publicly available. We also evaluate state-of-the-art VI odometry approaches on our dataset. △ Less

Submitted 9 March, 2020; v1 submitted 17 April, 2018; originally announced April 2018.

Comments: Updates compared to previous version include additional evaluations and DOI

arXiv:1804.05625 [pdf, other]

doi 10.1109/ICRA.2018.8462905

Direct Sparse Visual-Inertial Odometry using Dynamic Marginalization

Authors: Lukas von Stumberg, Vladyslav Usenko, Daniel Cremers

Abstract: We present VI-DSO, a novel approach for visual-inertial odometry, which jointly estimates camera poses and sparse scene geometry by minimizing photometric and IMU measurement errors in a combined energy functional. The visual part of the system performs a bundle-adjustment like optimization on a sparse set of points, but unlike key-point based systems it directly minimizes a photometric error. Thi… ▽ More We present VI-DSO, a novel approach for visual-inertial odometry, which jointly estimates camera poses and sparse scene geometry by minimizing photometric and IMU measurement errors in a combined energy functional. The visual part of the system performs a bundle-adjustment like optimization on a sparse set of points, but unlike key-point based systems it directly minimizes a photometric error. This makes it possible for the system to track not only corners, but any pixels with large enough intensity gradients. IMU information is accumulated between several frames using measurement preintegration, and is inserted into the optimization as an additional constraint between keyframes. We explicitly include scale and gravity direction into our model and jointly optimize them together with other variables such as poses. As the scale is often not immediately observable using IMU data this allows us to initialize our visual-inertial system with an arbitrary scale instead of having to delay the initialization until everything is observable. We perform partial marginalization of old variables so that updates can be computed in a reasonable time. In order to keep the system consistent we propose a novel strategy which we call "dynamic marginalization". This technique allows us to use partial marginalization even in cases where the initial scale estimate is far from the optimum. We evaluate our method on the challenging EuRoC dataset, showing that VI-DSO outperforms the state of the art. △ Less

Submitted 16 April, 2018; originally announced April 2018.

MSC Class: 68T45

arXiv:1803.02487

A Nonlinear Bregman Primal-Dual Framework for Optimizing Nonconvex Infimal Convolutions

Authors: Emanuel Laude, Daniel Cremers

Abstract: This work is concerned with the optimization of nonconvex, nonsmooth composite optimization problems, whose objective is a composition of a nonlinear map** and a nonsmooth nonconvex function, that can be written as an infimal convolution (inf-conv). To tackle this problem class we propose to reformulate the problem exploiting its inf-conv structure and derive a block coordinate descent scheme on… ▽ More This work is concerned with the optimization of nonconvex, nonsmooth composite optimization problems, whose objective is a composition of a nonlinear map** and a nonsmooth nonconvex function, that can be written as an infimal convolution (inf-conv). To tackle this problem class we propose to reformulate the problem exploiting its inf-conv structure and derive a block coordinate descent scheme on a Bregman augmented Lagrangian, that can be implemented asynchronously. We prove convergence of our scheme to stationary points of the original model for a specific choice of the penalty parameter. Our framework includes various existing algorithms as special cases, such as DC-programming, $k$-means clustering and ADMM with nonlinear operator, when a specific objective function and inf-conv decomposition (inf-deconvolution) is chosen. In illustrative experiments, we provide evidence, that our algorithm behaves favorably in terms of low objective value and robustness towards initialization, when compared to DC-programming or the $k$-means algorithm. △ Less

Submitted 27 March, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: We withdraw the pre-print as it contains some technical flaws that we hope to resolve in the future

arXiv:1801.07648 [pdf, other]

Clustering with Deep Learning: Taxonomy and New Methods

Authors: Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, Daniel Cremers

Abstract: Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxon… ▽ More Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study achieves state-of-the-art clustering quality and surpasses it in some cases. △ Less

Submitted 13 September, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

MSC Class: 62H30; 62M45; 91C20 ACM Class: H.3.3; I.2.6; I.5; I.5.3; I.5.4

arXiv:1801.06397 [pdf, other]

doi 10.1007/s11263-018-1082-6

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

Authors: Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method… ▽ More The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks.We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process. △ Less

Submitted 22 March, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

Comments: added references (UCL dataset); added IJCV copyright information

arXiv:1801.05413 [pdf, other]

Combinatorial Preconditioners for Proximal Algorithms on Graphs

Authors: Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers

Abstract: We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners. Such combinatorial preconditioners arise from partitioning the graph into forests. We prove that certain decompositions lead to a theoretically optimal condition number. We also show how ideal decompositions can be realized using matroid partitionin… ▽ More We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners. Such combinatorial preconditioners arise from partitioning the graph into forests. We prove that certain decompositions lead to a theoretically optimal condition number. We also show how ideal decompositions can be realized using matroid partitioning and propose efficient greedy variants thereof for large-scale problems. Coupled with specialized solvers for the resulting scaled proximal subproblems, the preconditioned algorithm achieves competitive performance in machine learning and vision applications. △ Less

Submitted 21 February, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: Published as a conference paper at AISTATS 2018

arXiv:1711.10824 [pdf, other]

Compression for Smooth Shape Analysis

Authors: V. Estellers, F. R. Schmidt, D. Cremers

Abstract: Most 3D shape analysis methods use triangular meshes to discretize both the shape and functions on it as piecewise linear functions. With this representation, shape analysis requires fine meshes to represent smooth shapes and geometric operators like normals, curvatures, or Laplace-Beltrami eigenfunctions at large computational and memory costs. We avoid this bottleneck with a compression techni… ▽ More Most 3D shape analysis methods use triangular meshes to discretize both the shape and functions on it as piecewise linear functions. With this representation, shape analysis requires fine meshes to represent smooth shapes and geometric operators like normals, curvatures, or Laplace-Beltrami eigenfunctions at large computational and memory costs. We avoid this bottleneck with a compression technique that represents a smooth shape as subdivision surfaces and exploits the subdivision scheme to parametrize smooth functions on that shape with a few control parameters. This compression does not affect the accuracy of the Laplace-Beltrami operator and its eigenfunctions and allow us to compute shape descriptors and shape matchings at an accuracy comparable to triangular meshes but a fraction of the computational cost. Our framework can also compress surfaces represented by point clouds to do shape analysis of 3D scanning data. △ Less

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1710.10686 [pdf, ps, other]

Regularization for Deep Learning: A Taxonomy

Authors: Jan Kukačka, Vladimir Golkov, Daniel Cremers

Abstract: Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization proc… ▽ More Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps revealing links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods. △ Less

Submitted 29 October, 2017; originally announced October 2017.

MSC Class: 62M45 ACM Class: I.2.6; I.5

arXiv:1710.06623 [pdf, other]

A Nonconvex Proximal Splitting Algorithm under Moreau-Yosida Regularization

Authors: Emanuel Laude, Tao Wu, Daniel Cremers

Abstract: We tackle highly nonconvex, nonsmooth composite optimization problems whose objectives comprise a Moreau-Yosida regularized term. Classical nonconvex proximal splitting algorithms, such as nonconvex ADMM, suffer from lack of convergence for such a problem class. To overcome this difficulty, in this work we consider a lifted variant of the Moreau-Yosida regularized model and propose a novel multibl… ▽ More We tackle highly nonconvex, nonsmooth composite optimization problems whose objectives comprise a Moreau-Yosida regularized term. Classical nonconvex proximal splitting algorithms, such as nonconvex ADMM, suffer from lack of convergence for such a problem class. To overcome this difficulty, in this work we consider a lifted variant of the Moreau-Yosida regularized model and propose a novel multiblock primal-dual algorithm that intrinsically stabilizes the dual block. We provide a complete convergence analysis of our algorithm and identify respective optimality qualifications under which stationarity of the original model is retrieved at convergence. Numerically, we demonstrate the relevance of Moreau-Yosida regularized models and the efficiency of our algorithm on robust regression as well as joint feature selection and semi-supervised learning. △ Less

Submitted 26 February, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS) 2018

arXiv:1710.02081 [pdf, other]

Online Photometric Calibration for Auto Exposure Video for Realtime Visual Odometry and SLAM

Authors: Paul Bergmann, Rui Wang, Daniel Cremers

Abstract: Recent direct visual odometry and SLAM algorithms have demonstrated impressive levels of precision. However, they require a photometric camera calibration in order to achieve competitive results. Hence, the respective algorithm cannot be directly applied to an off-the-shelf-camera or to a video sequence acquired with an unknown camera. In this work we propose a method for online photometric calibr… ▽ More Recent direct visual odometry and SLAM algorithms have demonstrated impressive levels of precision. However, they require a photometric camera calibration in order to achieve competitive results. Hence, the respective algorithm cannot be directly applied to an off-the-shelf-camera or to a video sequence acquired with an unknown camera. In this work we propose a method for online photometric calibration which enables to process auto exposure videos with visual odometry precisions that are on par with those of photometrically calibrated videos. Our algorithm recovers the exposure times of consecutive frames, the camera response function, and the attenuation factors of the sensor irradiance due to vignetting. Gain robust KLT feature tracks are used to obtain scene point correspondences as input to a nonlinear optimization framework. We show that our approach can reliably calibrate arbitrary video sequences by evaluating it on datasets for which full photometric ground truth is available. We further show that our calibration can improve the performance of a state-of-the-art direct visual odometry method that works solely on pixel intensities, calibrating for photometric parameters in an online fashion in realtime. △ Less

Submitted 5 October, 2017; originally announced October 2017.

Comments: 7 pages

arXiv:1709.10354 [pdf, other]

A Variational Approach to Shape-from-shading Under Natural Illumination

Authors: Yvain Quéau, Jean Mélou, Fabien Castan, Daniel Cremers, Jean-Denis Durou

Abstract: A numerical solution to shape-from-shading under natural illumination is presented. It builds upon an augmented Lagrangian approach for solving a generic PDE-based shape-from-shading model which handles directional or spherical harmonic lighting, orthographic or perspective projection, and greylevel or multi-channel images. Real-world applications to shading-aware depth map denoising, refinement a… ▽ More A numerical solution to shape-from-shading under natural illumination is presented. It builds upon an augmented Lagrangian approach for solving a generic PDE-based shape-from-shading model which handles directional or spherical harmonic lighting, orthographic or perspective projection, and greylevel or multi-channel images. Real-world applications to shading-aware depth map denoising, refinement and completion are presented. △ Less

Submitted 3 December, 2017; v1 submitted 29 September, 2017; originally announced September 2017.

Comments: Presented at EMMCVPR 2017 conference

arXiv:1709.08378 [pdf, other]

Variational Reflectance Estimation from Multi-view Images

Authors: Jean Mélou, Yvain Quéau, Jean-Denis Durou, Fabien Castan, Daniel Cremers

Abstract: We tackle the problem of reflectance estimation from a set of multi-view images, assuming known geometry. The approach we put forward turns the input images into reflectance maps, through a robust variational method. The variational model comprises an image-driven fidelity term and a term which enforces consistency of the reflectance estimates with respect to each view. If illumination is fixed ac… ▽ More We tackle the problem of reflectance estimation from a set of multi-view images, assuming known geometry. The approach we put forward turns the input images into reflectance maps, through a robust variational method. The variational model comprises an image-driven fidelity term and a term which enforces consistency of the reflectance estimates with respect to each view. If illumination is fixed across the views, then reflectance estimation remains under-constrained: a regularization term, which ensures piecewise-smoothness of the reflectance, is thus used. Reflectance is parameterized in the image domain, rather than on the surface, which makes the numerical solution much easier, by resorting to an alternating majorization-minimization approach. Experiments on both synthetic and real datasets are carried out to validate the proposed strategy. △ Less

Submitted 23 January, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

arXiv:1709.06031 [pdf, other]

Video Object Segmentation Without Temporal Information

Authors: Kevis-Kokitsi Maninis, Sergi Caelles, Yuhua Chen, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, Luc Van Gool

Abstract: Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly or they may not even produce a… ▽ More Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly or they may not even produce any result at all. This paper explores the orthogonal approach of processing each frame independently, i.e disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art. △ Less

Submitted 16 May, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: Accepted to T-PAMI. Extended version of "One-Shot Video Object Segmentation", CVPR 2017 (arXiv:1611.05198). Project page: http://www.vision.ee.ethz.ch/~cvlsegmentation/osvos/

arXiv:1709.03763 [pdf, other]

Efficient Online Surface Correction for Real-time Large-Scale 3D Reconstruction

Authors: Robert Maier, Raphael Schaller, Daniel Cremers

Abstract: State-of-the-art methods for large-scale 3D reconstruction from RGB-D sensors usually reduce drift in camera tracking by globally optimizing the estimated camera poses in real-time without simultaneously updating the reconstructed surface on pose changes. We propose an efficient on-the-fly surface correction method for globally consistent dense 3D reconstruction of large-scale scenes. Our approach… ▽ More State-of-the-art methods for large-scale 3D reconstruction from RGB-D sensors usually reduce drift in camera tracking by globally optimizing the estimated camera poses in real-time without simultaneously updating the reconstructed surface on pose changes. We propose an efficient on-the-fly surface correction method for globally consistent dense 3D reconstruction of large-scale scenes. Our approach uses a dense Visual RGB-D SLAM system that estimates the camera motion in real-time on a CPU and refines it in a global pose graph optimization. Consecutive RGB-D frames are locally fused into keyframes, which are incorporated into a sparse voxel hashed Signed Distance Field (SDF) on the GPU. On pose graph updates, the SDF volume is corrected on-the-fly using a novel keyframe re-integration strategy with reduced GPU-host streaming. We demonstrate in an extensive quantitative evaluation that our method is up to 93% more runtime efficient compared to the state-of-the-art and requires significantly less memory, with only negligible loss of surface quality. Overall, our system requires only a single GPU and allows for real-time surface correction of large environments. △ Less

Submitted 12 September, 2017; originally announced September 2017.

Comments: British Machine Vision Conference (BMVC), London, September 2017

arXiv:1708.07878 [pdf, other]

Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras

Authors: Rui Wang, Martin Schwörer, Daniel Cremers

Abstract: We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a nove… ▽ More We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a novel approach to integrate constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo. Real-time optimization is realized by sampling pixels uniformly from image regions with sufficient intensity gradient. Fixed-baseline stereo resolves scale drift. It also reduces the sensitivities to large optical flow and to rolling shutter effect which are known shortcomings of direct image alignment methods. Quantitative evaluation demonstrates that the proposed Stereo DSO outperforms existing state-of-the-art visual odometry methods both in terms of tracking accuracy and robustness. Moreover, our method delivers a more precise metric 3D reconstruction than previous dense/semi-dense direct approaches while providing a higher reconstruction density than feature-based methods. △ Less

Submitted 25 August, 2017; originally announced August 2017.

Comments: ICCV 2017

arXiv:1708.01670 [pdf, other]

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Authors: Robert Maier, Kihwan Kim, Daniel Cremers, Jan Kautz, Matthias Nießner

Abstract: We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field (SDF), textures from automatically-selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on Shape-from… ▽ More We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field (SDF), textures from automatically-selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on Shape-from-Shading (SfS) techniques and utilizes the estimation of spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene. Through extensive examples and evaluations, we demonstrate that our method dramatically increases the level of detail in the reconstructed scene geometry and contributes highly to consistent surface texture recovery. △ Less

Submitted 4 August, 2017; originally announced August 2017.

arXiv:1708.00938 [pdf, other]

Associative Domain Adaptation

Authors: Philip Haeusser, Thomas Frerix, Alexander Mordvintsev, Daniel Cremers

Abstract: We propose associative domain adaptation, a novel technique for end-to-end domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically dom… ▽ More We propose associative domain adaptation, a novel technique for end-to-end domain adaptation with neural networks, the task of inferring class labels for an unlabeled target domain based on the statistical properties of a labeled source domain. Our training scheme follows the paradigm that in order to effectively derive class labels for the target domain, a network should produce statistically domain invariant embeddings, while minimizing the classification error on the labeled source domain. We accomplish this by reinforcing associations between source and target data directly in embedding space. Our method can easily be added to any existing classification network with no structural and almost no computational overhead. We demonstrate the effectiveness of our approach on various benchmarks and achieve state-of-the-art results across the board with a generic convolutional neural network architecture not specifically tuned to the respective tasks. Finally, we show that the proposed association loss produces embeddings that are more effective for domain adaptation compared to methods employing maximum mean discrepancy as a similarity measure in embedding space. △ Less

Submitted 2 August, 2017; originally announced August 2017.

Comments: In IEEE International Conference on Computer Vision (ICCV), 2017

arXiv:1708.00411 [pdf, other]

Depth Super-Resolution Meets Uncalibrated Photometric Stereo

Authors: Songyou Peng, Bjoern Haefner, Yvain Quéau, Daniel Cremers

Abstract: A novel depth super-resolution approach for RGB-D sensors is presented. It disambiguates depth super-resolution through high-resolution photometric clues and, symmetrically, it disambiguates uncalibrated photometric stereo through low-resolution depth cues. To this end, an RGB-D sequence is acquired from the same viewing angle, while illuminating the scene from various uncalibrated directions. Thi… ▽ More A novel depth super-resolution approach for RGB-D sensors is presented. It disambiguates depth super-resolution through high-resolution photometric clues and, symmetrically, it disambiguates uncalibrated photometric stereo through low-resolution depth cues. To this end, an RGB-D sequence is acquired from the same viewing angle, while illuminating the scene from various uncalibrated directions. This sequence is handled by a variational framework which fits high-resolution shape and reflectance, as well as lighting, to both the low-resolution depth measurements and the high-resolution RGB ones. The key novelty consists in a new PDE-based photometric stereo regularizer which implicitly ensures surface regularity. This allows to carry out depth super-resolution in a purely data-driven manner, without the need for any ad-hoc prior or material calibration. Real-world experiments are carried out using an out-of-the-box RGB-D sensor and a hand-held LED light source. △ Less

Submitted 24 August, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

Comments: International Conference on Computer Vision (ICCV) Workshop, 2017

arXiv:1707.08991 [pdf, other]

Efficient Deformable Shape Correspondence via Kernel Matching

Authors: Zorah Lähner, Matthias Vestner, Amit Boyarski, Or Litany, Ron Slossberg, Tal Remez, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Ron Kimmel, Daniel Cremers

Abstract: We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the map**, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of t… ▽ More We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the map**, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of the highly non-convex nature of the resulting quadratic assignment problem, our method converges to a semantically meaningful and continuous map** in most of our experiments, and scales well. We provide preliminary theoretical analysis and several interpretations of the method. △ Less

Submitted 15 September, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

Comments: Accepted for oral presentation at 3DV 2017, including supplementary material

arXiv:1707.01018 [pdf, other]

LED-based Photometric Stereo: Modeling, Calibration and Numerical Solution

Authors: Yvain Quéau, Bastien Durix, Tao Wu, Daniel Cremers, François Lauze, Jean-Denis Durou

Abstract: We conduct a thorough study of photometric stereo under nearby point light source illumination, from modeling to numerical solution, through calibration. In the classical formulation of photometric stereo, the luminous fluxes are assumed to be directional, which is very difficult to achieve in practice. Rather, we use light-emitting diodes (LEDs) to illuminate the scene to reconstruct. Such point… ▽ More We conduct a thorough study of photometric stereo under nearby point light source illumination, from modeling to numerical solution, through calibration. In the classical formulation of photometric stereo, the luminous fluxes are assumed to be directional, which is very difficult to achieve in practice. Rather, we use light-emitting diodes (LEDs) to illuminate the scene to reconstruct. Such point light sources are very convenient to use, yet they yield a more complex photometric stereo model which is arduous to solve. We first derive in a physically sound manner this model, and show how to calibrate its parameters. Then, we discuss two state-of-the-art numerical solutions. The first one alternatingly estimates the albedo and the normals, and then integrates the normals into a depth map. It is shown empirically to be independent from the initialization, but convergence of this sequential approach is not established. The second one directly recovers the depth, by formulating photometric stereo as a system of PDEs which are partially linearized using image ratios. Although the sequential approach is avoided, initialization matters a lot and convergence is not established either. Therefore, we introduce a provably convergent alternating reweighted least-squares scheme for solving the original system of PDEs, without resorting to image ratios for linearization. Finally, we extend this study to the case of RGB images. △ Less

Submitted 4 September, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

arXiv:1706.04638 [pdf, ps, other]

Proximal Backpropagation

Authors: Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

Abstract: We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size limitation of explicit gradient descent, which poses an impediment for optimization. ProxProp is developed from a general point of view on the backpropagation algorithm… ▽ More We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size limitation of explicit gradient descent, which poses an impediment for optimization. ProxProp is developed from a general point of view on the backpropagation algorithm, currently the most common technique to train neural networks via stochastic gradient descent and variants thereof. Specifically, we show that backpropagation of a prediction error is equivalent to sequential gradient descent steps on a quadratic penalty energy, which comprises the network activations as variables of the optimization. We further analyze theoretical properties of ProxProp and in particular prove that the algorithm yields a descent direction in parameter space and can therefore be combined with a wide variety of convergent algorithms. Finally, we devise an efficient numerical implementation that integrates well with popular deep learning frameworks. We conclude by demonstrating promising numerical results and show that ProxProp can be effectively combined with common first order optimizers such as Adam. △ Less

Submitted 20 February, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

Comments: Published as a conference paper at ICLR 2018

arXiv:1706.00909 [pdf, other]

Learning by Association - A versatile semi-supervised training method for neural networks

Authors: Philip Häusser, Alexander Mordvintsev, Daniel Cremers

Abstract: In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. "Associations" are made from embeddings of labeled samples to th… ▽ More In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. "Associations" are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class from which the association was started and penalizes wrong associations ending at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup. We demonstrate the capabilities of learning by association on several data sets and show that it can improve performance on classification tasks tremendously by making use of additionally available unlabeled data. In particular, for cases with few labeled data, our training scheme outperforms the current state of the art on SVHN. △ Less

Submitted 3 June, 2017; originally announced June 2017.

Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017

arXiv:1705.08314 [pdf, other]

Fusion of Head and Full-Body Detectors for Multi-Object Tracking

Authors: Roberto Henschel, Laura Leal-Taixé, Daniel Cremers, Bodo Rosenhahn

Abstract: In order to track all persons in a scene, the tracking-by-detection paradigm has proven to be a very effective approach. Yet, relying solely on a single detector is also a major limitation, as useful image information might be ignored. Consequently, this work demonstrates how to fuse two detectors into a tracking system. To obtain the trajectories, we propose to formulate tracking as a weighted gr… ▽ More In order to track all persons in a scene, the tracking-by-detection paradigm has proven to be a very effective approach. Yet, relying solely on a single detector is also a major limitation, as useful image information might be ignored. Consequently, this work demonstrates how to fuse two detectors into a tracking system. To obtain the trajectories, we propose to formulate tracking as a weighted graph labeling problem, resulting in a binary quadratic program. As such problems are NP-hard, the solution can only be approximated. Based on the Frank-Wolfe algorithm, we present a new solver that is crucial to handle such difficult problems. Evaluation on pedestrian tracking is provided for multiple scenarios, showing superior results over single detector tracking and standard QP-solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and 1st on the new MOT17 benchmark, outperforming over 90 trackers. △ Less

Submitted 24 April, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

Comments: 10 pages, 4 figures; Winner of the MOT17 challenge; CVPRW 2018

arXiv:1705.05020 [pdf, other]

Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs

Authors: Emanuel Laude, Jan-Hendrik Lange, Jonas Schüpfer, Csaba Domokos, Laura Leal-Taixé, Frank R. Schmidt, Bjoern Andres, Daniel Cremers

Abstract: This paper introduces a novel algorithm for transductive inference in higher-order MRFs, where the unary energies are parameterized by a variable classifier. The considered task is posed as a joint optimization problem in the continuous classifier parameters and the discrete label variables. In contrast to prior approaches such as convex relaxations, we propose an advantageous decoupling of the ob… ▽ More This paper introduces a novel algorithm for transductive inference in higher-order MRFs, where the unary energies are parameterized by a variable classifier. The considered task is posed as a joint optimization problem in the continuous classifier parameters and the discrete label variables. In contrast to prior approaches such as convex relaxations, we propose an advantageous decoupling of the objective function into discrete and continuous subproblems and a novel, efficient optimization method related to ADMM. This approach preserves integrality of the discrete label variables and guarantees global convergence to a critical point. We demonstrate the advantages of our approach in several experiments including video object segmentation on the DAVIS data set and interactive image segmentation. △ Less

Submitted 28 April, 2018; v1 submitted 14 May, 2017; originally announced May 2017.

arXiv:1705.04300 [pdf, other]

Challenges in Monocular Visual Odometry: Photometric Calibration, Motion Bias and Rolling Shutter Effect

Authors: Nan Yang, Rui Wang, Xiang Gao, Daniel Cremers

Abstract: Monocular visual odometry (VO) and simultaneous localization and map** (SLAM) have seen tremendous improvements in accuracy, robustness and efficiency, and have gained increasing popularity over recent years. Nevertheless, not so many discussions have been carried out to reveal the influences of three very influential yet easily overlooked aspects: photometric calibration, motion bias and rollin… ▽ More Monocular visual odometry (VO) and simultaneous localization and map** (SLAM) have seen tremendous improvements in accuracy, robustness and efficiency, and have gained increasing popularity over recent years. Nevertheless, not so many discussions have been carried out to reveal the influences of three very influential yet easily overlooked aspects: photometric calibration, motion bias and rolling shutter effect. In this work, we evaluate these three aspects quantitatively on the state of the art of direct, feature-based and semi-direct methods, providing the community with useful practical knowledge both for better applying existing methods and develo** new algorithms of VO and SLAM. Conclusions (some of which are counter-intuitive) are drawn with both technical and empirical analyses to all of our experiments. Possible improvements on existing methods are directed or proposed, such as a sub-pixel accuracy refinement of ORB-SLAM which boosts its performance. △ Less

Submitted 7 June, 2018; v1 submitted 11 May, 2017; originally announced May 2017.

Comments: Accepted by IEEE Robotics and Automation Letters (RA-L), 2018. The first two authors contributed equally to this paper

arXiv:1704.04039 [pdf, other]

3D Deep Learning for Biological Function Prediction from Physical Fields

Authors: Vladimir Golkov, Marcin J. Skwark, Atanas Mirchev, Georgi Dikov, Alexander R. Geanes, Jeffrey Mendenhall, Jens Meiler, Daniel Cremers

Abstract: Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic pot… ▽ More Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the "raw fingerprint" of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on EC numbers is predicted from the approximated electron density field. In another experiment, the activity of small molecules is predicted with quality comparable to state-of-the-art descriptor-based methods. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations. With future improvements of training datasets and neural network settings in combination with complementary information sources (sequence, genomic context, expression level), deep learning can be expected to show its generalization power and revolutionize the field of molecular function prediction. △ Less

Submitted 13 April, 2017; originally announced April 2017.

ACM Class: I.2.6; J.3

arXiv:1704.03488 [pdf, other]

doi 10.1109/ICCV.2017.198

Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems

Authors: Tim Meinhardt, Michael Moeller, Caner Hazirbas, Daniel Cremers

Abstract: While variational methods have been among the most powerful tools for solving linear inverse problems in imaging, deep (convolutional) neural networks have recently taken the lead in many challenging benchmarks. A remaining drawback of deep learning approaches is their requirement for an expensive retraining whenever the specific problem, the noise level, noise type, or desired measure of fidelity… ▽ More While variational methods have been among the most powerful tools for solving linear inverse problems in imaging, deep (convolutional) neural networks have recently taken the lead in many challenging benchmarks. A remaining drawback of deep learning approaches is their requirement for an expensive retraining whenever the specific problem, the noise level, noise type, or desired measure of fidelity changes. On the contrary, variational methods have a plug-and-play nature as they usually consist of separate data fidelity and regularization terms. In this paper we study the possibility of replacing the proximal operator of the regularization used in many convex energy minimization algorithms by a denoising neural network. The latter therefore serves as an implicit natural image prior, while the data term can still be chosen independently. Using a fixed denoising neural network in exemplary problems of image deconvolution with different blur kernels and image demosaicking, we obtain state-of-the-art reconstruction results. These indicate the high generalizability of our approach and a reduction of the need for problem-specific training. Additionally, we discuss novel results on the analysis of possible optimization algorithms to incorporate the network into, as well as the choices of algorithm parameters and their relation to the noise level the neural network is trained on. △ Less

Submitted 30 August, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

arXiv:1704.02781 [pdf, other]

Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking

Authors: Laura Leal-Taixé, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth

Abstract: Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. We present a benchmark for Multiple Object Tracking launched in the late 2014, with the goal of creating a framework for the stan… ▽ More Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. We present a benchmark for Multiple Object Tracking launched in the late 2014, with the goal of creating a framework for the standardized evaluation of multiple object tracking methods. This paper collects the two releases of the benchmark made so far, and provides an in-depth analysis of almost 50 state-of-the-art trackers that were tested on over 11000 frames. We show the current trends and weaknesses of multiple people tracking methods, and provide pointers of what researchers should be focusing on to push the field forward. △ Less

Submitted 10 April, 2017; originally announced April 2017.

arXiv:1704.01085 [pdf, other]

Deep Depth From Focus

Authors: Caner Hazirbas, Sebastian Georg Soyer, Maximilian Christian Staab, Laura Leal-Taixé, Daniel Cremers

Abstract: Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting which exhibits maximal sharpness. Yet, it is not obvious how to reliably estimate the sharpness level, particularly in low-textured areas. In this paper, we propose `Deep Depth From Focus (DDFF)' as the first end-to-end learning ap… ▽ More Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting which exhibits maximal sharpness. Yet, it is not obvious how to reliably estimate the sharpness level, particularly in low-textured areas. In this paper, we propose `Deep Depth From Focus (DDFF)' as the first end-to-end learning approach to this problem. One of the main challenges we face is the hunger for data of deep neural networks. In order to obtain a significant amount of focal stacks with corresponding groundtruth depth, we propose to leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us to digitally create focal stacks of varying sizes. Compared to existing benchmarks our dataset is 25 times larger, enabling the use of machine learning for this inverse problem. We compare our results with state-of-the-art DFF methods and we also analyze the effect of several key deep architectural components. These experiments show that our proposed method `DDFFNet' achieves state-of-the-art performance in all scenes, reducing depth error by more than 75% compared to the classical DFF methods. △ Less

Submitted 28 October, 2018; v1 submitted 4 April, 2017; originally announced April 2017.

Comments: accepted to Asian Conference on Computer Vision (ACCV) 2018

arXiv:1704.00337 [pdf, other]

Dense Multi-view 3D-reconstruction Without Dense Correspondences

Authors: Yvain Quéau, Jean Mélou, Jean-Denis Durou, Daniel Cremers

Abstract: We introduce a variational method for multi-view shape-from-shading under natural illumination. The key idea is to couple PDE-based solutions for single-image based shape-from-shading problems across multiple images and multiple color channels by means of a variational formulation. Rather than alternatingly solving the individual SFS problems and optimizing the consistency across images and channe… ▽ More We introduce a variational method for multi-view shape-from-shading under natural illumination. The key idea is to couple PDE-based solutions for single-image based shape-from-shading problems across multiple images and multiple color channels by means of a variational formulation. Rather than alternatingly solving the individual SFS problems and optimizing the consistency across images and channels which is known to lead to suboptimal results, we propose an efficient solution of the coupled problem by means of an ADMM algorithm. In numerous experiments on both simulated and real imagery, we demonstrate that the proposed fusion of multiple-view reconstruction and shape-from-shading provides highly accurate dense reconstructions without the need to compute dense correspondences. With the proposed variational integration across multiple views shape-from-shading techniques become applicable to challenging real-world reconstruction problems, giving rise to highly detailed geometry even in areas of smooth brightness variation and lacking texture. △ Less

Submitted 2 April, 2017; originally announced April 2017.

arXiv:1703.08866 [pdf, other]

Multi-View Deep Learning for Consistent Semantic Map** with RGB-D Cameras

Authors: Lingni Ma, Jörg Stückler, Christian Kerl, Daniel Cremers

Abstract: Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network to predict object-class semantics that is consistent from several view points in a semi-supervised way. At test time, the semantics pr… ▽ More Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network to predict object-class semantics that is consistent from several view points in a semi-supervised way. At test time, the semantics predictions of our network can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperforms single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion. △ Less

Submitted 4 December, 2017; v1 submitted 26 March, 2017; originally announced March 2017.

Comments: the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017)

arXiv:1703.08001 [pdf, other]

Nonlinear Spectral Image Fusion

Authors: Martin Benning, Michael Möller, Raz Z. Nossek, Martin Burger, Daniel Cremers, Guy Gilboa, Carola-Bibiane Schönlieb

Abstract: In this paper we demonstrate that the framework of nonlinear spectral decompositions based on total variation (TV) regularization is very well suited for image fusion as well as more general image manipulation tasks. The well-localized and edge-preserving spectral TV decomposition allows to select frequencies of a certain image to transfer particular features, such as wrinkles in a face, from one… ▽ More In this paper we demonstrate that the framework of nonlinear spectral decompositions based on total variation (TV) regularization is very well suited for image fusion as well as more general image manipulation tasks. The well-localized and edge-preserving spectral TV decomposition allows to select frequencies of a certain image to transfer particular features, such as wrinkles in a face, from one image to another. We illustrate the effectiveness of the proposed approach in several numerical experiments, including a comparison to the competing techniques of Poisson image editing, linear osmosis, wavelet fusion and Laplacian pyramid fusion. We conclude that the proposed spectral TV image decomposition framework is a valuable tool for semi- and fully-automatic image editing and fusion. △ Less

Submitted 23 March, 2017; originally announced March 2017.

Comments: 13 pages, 9 figures, submitted to SSVM conference proceedings 2017

MSC Class: 35P30; 62H35; 65M70; 94A08 ACM Class: G.1.3; G.1.6; G.1.8; I.4.0; I.4.5

arXiv:1703.01416 [pdf, other]

doi 10.1109/IROS.2017.8202160

Real-Time Trajectory Replanning for MAVs using Uniform B-splines and a 3D Circular Buffer

Authors: Vladyslav Usenko, Lukas von Stumberg, Andrej Pangercic, Daniel Cremers

Abstract: In this paper, we present a real-time approach to local trajectory replanning for microaerial vehicles (MAVs). Current trajectory generation methods for multicopters achieve high success rates in cluttered environments, but assume that the environment is static and require prior knowledge of the map. In the presented study, we use the results of such planners and extend them with a local replannin… ▽ More In this paper, we present a real-time approach to local trajectory replanning for microaerial vehicles (MAVs). Current trajectory generation methods for multicopters achieve high success rates in cluttered environments, but assume that the environment is static and require prior knowledge of the map. In the presented study, we use the results of such planners and extend them with a local replanning algorithm that can handle unmodeled (possibly dynamic) obstacles while kee** the MAV close to the global trajectory. To ensure that the proposed approach is real-time capable, we maintain information about the environment around the MAV in an occupancy grid stored in a three-dimensional circular buffer, which moves together with a drone, and represent the trajectories by using uniform B-splines. This representation ensures that the trajectory is sufficiently smooth and simultaneously allows for efficient optimization. △ Less

Submitted 23 July, 2017; v1 submitted 4 March, 2017; originally announced March 2017.

arXiv:1701.00669 [pdf, other]

Product Manifold Filter: Non-Rigid Shape Correspondence via Kernel Density Estimation in the Product Space

Authors: Matthias Vestner, Roee Litman, Emanuele Rodolà, Alex Bronstein, Daniel Cremers

Abstract: Many algorithms for the computation of correspondences between deformable shapes rely on some variant of nearest neighbor matching in a descriptor space. Such are, for example, various point-wise correspondence recovery algorithms used as a post-processing stage in the functional correspondence framework. Such frequently used techniques implicitly make restrictive assumptions (e.g., near-isometry)… ▽ More Many algorithms for the computation of correspondences between deformable shapes rely on some variant of nearest neighbor matching in a descriptor space. Such are, for example, various point-wise correspondence recovery algorithms used as a post-processing stage in the functional correspondence framework. Such frequently used techniques implicitly make restrictive assumptions (e.g., near-isometry) on the considered shapes and in practice suffer from lack of accuracy and result in poor surjectivity. We propose an alternative recovery technique capable of guaranteeing a bijective correspondence and producing significantly higher accuracy and smoothness. Unlike other methods our approach does not depend on the assumption that the analyzed shapes are isometric. We derive the proposed method from the statistical framework of kernel density estimation and demonstrate its performance on several challenging deformable 3D shape matching datasets. △ Less

Submitted 7 April, 2017; v1 submitted 3 January, 2017; originally announced January 2017.

Comments: To appear at CVPR 2017

arXiv:1612.03653 [pdf, other]

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Authors: Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that,… ▽ More We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour. △ Less

Submitted 21 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: NIPS workshop on Deep Learning for Action and Interaction, 2016

Showing 151–200 of 226 results for author: Cremers, D