Search | arXiv e-print repository

Variational Learning is Effective for Large Deep Networks

Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint… ▽ More We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. △ Less

Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

arXiv:2307.07753 [pdf, other]

Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

Authors: Dominik Schnaus, Jongseok Lee, Daniel Cremers, Rudolph Triebel

Abstract: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained mod… ▽ More In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization. △ Less

Submitted 15 July, 2023; originally announced July 2023.

Comments: Accepted to ICML 2023

arXiv:2212.02988 [pdf, other]

PRISM: Probabilistic Real-Time Inference in Spatial World Models

Authors: Atanas Mirchev, Baris Kayalibay, Ahmed Agha, Patrick van der Smagt, Daniel Cremers, Justin Bayer

Abstract: We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model… ▽ More We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and map** (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD). △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: Will appear in PMLR, CoRL 2022

arXiv:2210.15575 [pdf, other]

A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers

Abstract: Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration e… ▽ More Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: Presented at NeurIPS 2022 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2022)

arXiv:2110.00053 [pdf, other]

Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation

Authors: Florian Bernard, Daniel Cremers, Johan Thunberg

Abstract: We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchro… ▽ More We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchronisation. Although sparsity is a desirable property in such settings, it is mostly neglected in spectral formulations since existing solvers, e.g. based on eigenvalue decomposition, are unable to account for sparsity while at the same time maintaining global optimality guarantees. We fill this gap and propose a simple yet effective sparsity-promoting modification of the Orthogonal Iteration algorithm for finding the dominant eigenspace of a matrix. By doing so, we can guarantee that our method finds a Stiefel matrix that is globally optimal with respect to the quadratic objective function, while in addition being sparse. As a motivating application we consider the task of permutation synchronisation, which can be understood as a constrained clustering problem that has particular relevance for matching multiple images or 3D shapes in computer vision, computer graphics, and beyond. We demonstrate that the proposed approach outperforms previous methods in this domain. △ Less

Submitted 30 September, 2021; originally announced October 2021.

Comments: To appear at NeurIPS 2021

arXiv:2107.13059 [pdf, other]

Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised Node Classification

Authors: Yu Wang, Yuesong Shen, Daniel Cremers

Abstract: Node features and structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, which typically determine output labels through feature aggregation. This can be problematic, as it implies conditional independence of output nodes given hidden representations,… ▽ More Node features and structural information of a graph are both crucial for semi-supervised node classification problems. A variety of graph neural network (GNN) based approaches have been proposed to tackle these problems, which typically determine output labels through feature aggregation. This can be problematic, as it implies conditional independence of output nodes given hidden representations, despite their direct connections in the graph. To learn the direct influence among output nodes in a graph, we propose the Explicit Pairwise Factorized Graph Neural Network (EPFGNN), which models the whole graph as a partially observed Markov Random Field. It contains explicit pairwise factors to model output-output relations and uses a GNN backbone to model input-output relations. To balance model complexity and expressivity, the pairwise factors have a shared component and a separate scaling coefficient for each edge. We apply the EM algorithm to train our model, and utilize a star-shaped piecewise likelihood for the tractable surrogate objective. We conduct experiments on various datasets, which shows that our model can effectively improve the performance for semi-supervised node classification on graphs. △ Less

Submitted 27 July, 2021; originally announced July 2021.

arXiv:2012.10988 [pdf, other]

Post-hoc Uncertainty Calibration for Domain Drift Scenarios

Authors: Christian Tomani, Sebastian Gruber, Muhammed Ebrar Erdem, Daniel Cremers, Florian Buettner

Abstract: We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show… ▽ More We address the problem of uncertainty calibration. While standard deep neural networks typically yield uncalibrated predictions, calibrated confidence scores that are representative of the true likelihood of a prediction can be achieved using post-hoc calibration methods. However, to date the focus of these approaches has been on in-domain calibration. Our contribution is two-fold. First, we show that existing post-hoc calibration methods yield highly over-confident predictions under domain shift. Second, we introduce a simple strategy where perturbations are applied to samples in the validation set before performing the post-hoc calibration step. In extensive experiments, we demonstrate that this perturbation step results in substantially better calibration under domain shift on a wide range of architectures and modelling tasks. △ Less

Submitted 23 June, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Code available at https://github.com/tochris/calibration-domain-drift

arXiv:2007.07029 [pdf, ps, other]

Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions

Authors: Vladimir Golkov, Alexander Becker, Daniel T. Plop, Daniel Čuturilo, Neda Davoudi, Jeffrey Mendenhall, Rocco Moretti, Jens Meiler, Daniel Cremers

Abstract: Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of g… ▽ More Computer-aided drug discovery is an essential component of modern drug development. Therein, deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features. Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets. In this work we argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance, its ability to compromise over different decision thresholds, certain freedom to influence the relative weights in this compromise, fidelity to typical benchmarking measures, and equivalence to positive/unlabeled learning. We also propose new training schemes (coherent mini-batch arrangement, and usage of out-of-batch samples) for cost functions based on the ROC, as well as a cost function based on the logAUC metric that facilitates early enrichment (i.e. improves performance at high decision thresholds, as often desired when synthesizing predicted hit compounds). We demonstrate that these approaches outperform standard deep learning approaches on a series of PubChem high-throughput screening datasets that represent realistic and diverse drug discovery campaigns on major drug target families. △ Less

Submitted 25 June, 2020; originally announced July 2020.

Comments: 10 pages

MSC Class: 68T07 (Primary) 62H30; 92E99; 68T10; 62F07 (Secondary) ACM Class: G.3; I.2.1; I.2.6; I.5.1; J.3

arXiv:2006.16856 [pdf, other]

A Chain Graph Interpretation of Real-World Neural Networks

Authors: Yuesong Shen, Daniel Cremers

Abstract: The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains. However, most advances have been established empirically, and their theoretical analysis remains lacking. One major issue is that our current interpretation of neural networks (NNs) as function approximators is too generic to support in-depth analysis. In this pape… ▽ More The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains. However, most advances have been established empirically, and their theoretical analysis remains lacking. One major issue is that our current interpretation of neural networks (NNs) as function approximators is too generic to support in-depth analysis. In this paper, we remedy this by proposing an alternative interpretation that identifies NNs as chain graphs (CGs) and feed-forward as an approximate inference procedure. The CG interpretation specifies the nature of each NN component within the rich theoretical framework of probabilistic graphical models, while at the same time remains general enough to cover real-world NNs with arbitrary depth, multi-branching and varied activations, as well as common structures including convolution / recurrent layers, residual block and dropout. We demonstrate with concrete examples that the CG interpretation can provide novel theoretical support and insights for various NN techniques, as well as derive new deep learning approaches such as the concept of partially collapsed feed-forward inference. It is thus a promising framework that deepens our understanding of neural networks and provides a coherent theoretical formulation for future deep learning research. △ Less

Submitted 6 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

arXiv:2006.12456 [pdf, other]

Effective Version Space Reduction for Convolutional Neural Networks

Authors: Jiayu Liu, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers

Abstract: In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two… ▽ More In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two approaches---prior mass reduction and diameter reduction---and propose a new diameter-based querying method---the minimum Gibbs-vote disagreement. By estimating version space diameter and bias, we illustrate how version space of neural networks evolves and examine the realizability assumption. With experiments on MNIST, Fashion-MNIST, SVHN and STL-10 datasets, we demonstrate that diameter reduction methods reduce the version space more effectively and perform better than prior mass reduction and other baselines, and that the Gibbs vote disagreement is on par with the best query method. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: 22 pages, 8 figures, to be published in the Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2020

ACM Class: I.2.6; G.3; I.5.1

arXiv:1912.02160 [pdf, other]

Informative GANs via Structured Regularization of Optimal Transport

Authors: Pierre Bréchet, Tao Wu, Thomas Möllenhoff, Daniel Cremers

Abstract: We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT). Specifically, a smoothed OT loss gives rise to an implicit transportation plan between the latent space and the data space. Based on this theoretical observation, we exploit a structured regularization on the transportation plan to en… ▽ More We tackle the challenge of disentangled representation learning in generative adversarial networks (GANs) from the perspective of regularized optimal transport (OT). Specifically, a smoothed OT loss gives rise to an implicit transportation plan between the latent space and the data space. Based on this theoretical observation, we exploit a structured regularization on the transportation plan to encourage a prescribed latent subspace to be informative. This yields the formulation of a novel informative OT-based GAN. By convex duality, we obtain the equivalent view that this leads to perturbed ground costs favoring sparsity in the informative latent dimensions. Practically, we devise a stable training algorithm for the proposed informative GAN. Our experiments support the hypothesis that such regularizations effectively yield the discovery of disentangled and interpretable latent representations. Our work showcases potential power of a regularized OT framework in the context of generative modeling through its access to the transport plan. Further challenges are addressed in this line. △ Less

Submitted 4 December, 2019; originally announced December 2019.

Comments: Presented at the Optimal Transport and Machine Learning Workshop, NeurIPS 2019

arXiv:1910.14594 [pdf, other]

Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods

Authors: Luca Della Libera, Vladimir Golkov, Yue Zhu, Arman Mielke, Daniel Cremers

Abstract: Convolutional networks are successful due to their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/p… ▽ More Convolutional networks are successful due to their equivariance/invariance under translations. However, rotatable data such as images, volumes, shapes, or point clouds require processing with equivariance/invariance under rotations in cases where the rotational orientation of the coordinate system does not affect the meaning of the data (e.g. object classification). On the other hand, estimation/processing of rotations is necessary in cases where rotations are important (e.g. motion estimation). There has been recent progress in methods and theory in all these regards. Here we provide an overview of existing methods, both for 2D and 3D rotations (and translations), and identify commonalities and links between them. △ Less

Submitted 22 November, 2021; v1 submitted 31 October, 2019; originally announced October 2019.

Comments: Improved Definition 1, improved and merged Sections 3.3-3.4, minor additional changes

MSC Class: 62M45; 68T45; 62H35; 65D18; 68U10 ACM Class: I.2.6; I.5.1; G.3

arXiv:1907.11025 [pdf, other]

Towards Generalizing Sensorimotor Control Across Weather Conditions

Authors: Qadeer Khan, Patrick Wenzel, Daniel Cremers, Laura Leal-Taixé

Abstract: The ability of deep learning models to generalize well across different scenarios depends primarily on the quality and quantity of annotated data. Labeling large amounts of data for all possible scenarios that a model may encounter would not be feasible; if even possible. We propose a framework to deal with limited labeled training data and demonstrate it on the application of vision-based vehicle… ▽ More The ability of deep learning models to generalize well across different scenarios depends primarily on the quality and quantity of annotated data. Labeling large amounts of data for all possible scenarios that a model may encounter would not be feasible; if even possible. We propose a framework to deal with limited labeled training data and demonstrate it on the application of vision-based vehicle control. We show how limited steering angle data available for only one condition can be transferred to multiple different weather scenarios. This is done by leveraging unlabeled images in a teacher-student learning paradigm complemented with an image-to-image translation network. The translation network transfers the images to a new domain, whereas the teacher provides soft supervised targets to train the student on this domain. Furthermore, we demonstrate how utilization of auxiliary networks can reduce the size of a model at inference time, without affecting the accuracy. The experiments show that our approach generalizes well across multiple different weather conditions using only ground truth labels from one domain. △ Less

Submitted 25 July, 2019; originally announced July 2019.

Comments: Accepted for publication in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1905.04730 [pdf, other]

Flat Metric Minimization with Applications in Generative Modeling

Authors: Thomas Möllenhoff, Daniel Cremers

Abstract: We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-current… ▽ More We take the novel perspective to view data not as a probability distribution but rather as a current. Primarily studied in the field of geometric measure theory, $k$-currents are continuous linear functionals acting on compactly supported smooth differential forms and can be understood as a generalized notion of oriented $k$-dimensional manifold. By moving from distributions (which are $0$-currents) to $k$-currents, we can explicitly orient the data by attaching a $k$-dimensional tangent plane to each sample point. Based on the flat metric which is a fundamental distance between currents, we derive FlatGAN, a formulation in the spirit of generative adversarial networks but generalized to $k$-currents. In our theoretical contribution we prove that the flat metric between a parametrized current and a reference current is Lipschitz continuous in the parameters. In experiments, we show that the proposed shift to $k>0$ leads to interpretable and disentangled latent representations which behave equivariantly to the specified oriented tangent planes. △ Less

Submitted 12 May, 2019; originally announced May 2019.

arXiv:1905.03389 [pdf, other]

Learning to Evolve

Authors: Jan Schuchardt, Vladimir Golkov, Daniel Cremers

Abstract: Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombin… ▽ More Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the result of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn to dynamically adjust the strategy of evolutionary algorithms to varying circumstances. Our methods outperform classical evolutionary algorithms on combinatorial and continuous optimization problems. △ Less

Submitted 8 May, 2019; originally announced May 2019.

MSC Class: 62M45; 68T05; 68W25; 68T20; 90C40; 91A22; 92D15; 92D25 ACM Class: G.1.6; I.2.6; I.2.8; G.3; I.5.1

arXiv:1904.03081 [pdf, other]

Controlling Neural Networks via Energy Dissipation

Authors: Michael Moeller, Thomas Möllenhoff, Daniel Cremers

Abstract: The last decade has shown a tremendous success in solving various computer vision problems with the help of deep learning techniques. Lately, many works have demonstrated that learning-based approaches with suitable network architectures even exhibit superior performance for the solution of (ill-posed) image reconstruction problems such as deblurring, super-resolution, or medical image reconstruct… ▽ More The last decade has shown a tremendous success in solving various computer vision problems with the help of deep learning techniques. Lately, many works have demonstrated that learning-based approaches with suitable network architectures even exhibit superior performance for the solution of (ill-posed) image reconstruction problems such as deblurring, super-resolution, or medical image reconstruction. The drawback of purely learning-based methods, however, is that they cannot provide provable guarantees for the trained network to follow a given data formation process during inference. In this work we propose energy dissipating networks that iteratively compute a descent direction with respect to a given cost function or energy at the currently estimated reconstruction. Therefore, an adaptive step size rule such as a line-search, along with a suitable number of iterations can guarantee the reconstruction to follow a given data formation model encoded in the energy to arbitrary precision, and hence control the model's behavior even during test time. We prove that under standard assumptions, descent using the direction predicted by the network converges (linearly) to the global minimum of the energy. We illustrate the effectiveness of the proposed approach in experiments on single image super resolution and computed tomography (CT) reconstruction, and further illustrate extensions to convex feasibility problems. △ Less

Submitted 20 August, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: Published as a conference paper at ICCV 2019, Seoul

arXiv:1902.01785 [pdf, other]

Homogeneous Linear Inequality Constraints for Neural Network Activations

Authors: Thomas Frerix, Matthias Nießner, Daniel Cremers

Abstract: We propose a method to impose homogeneous linear inequality constraints of the form $Ax\leq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this task is by means of a projection step at test time after unconstrained training. However, this is an expensive operation. By direc… ▽ More We propose a method to impose homogeneous linear inequality constraints of the form $Ax\leq 0$ on neural network activations. The proposed method allows a data-driven training approach to be combined with modeling prior knowledge about the task. One way to achieve this task is by means of a projection step at test time after unconstrained training. However, this is an expensive operation. By directly incorporating the constraints into the architecture, we can significantly speed-up inference at test time; for instance, our experiments show a speed-up of up to two orders of magnitude over a projection method. Our algorithm computes a suitable parameterization of the feasible set at initialization and uses standard variants of stochastic gradient descent to find solutions to the constrained network. Thus, the modeling constraints are always satisfied during training. Crucially, our approach avoids to solve an optimization problem at each training step or to manually trade-off data and constraint fidelity with additional hyperparameters. We consider constrained generative modeling as an important application domain and experimentally demonstrate the proposed method by constraining a variational autoencoder. △ Less

Submitted 28 May, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: CVPR 2020 DeepVision Workshop

arXiv:1902.00057 [pdf, other]

Probabilistic Discriminative Learning with Layered Graphical Models

Authors: Yuesong Shen, Tao Wu, Csaba Domokos, Daniel Cremers

Abstract: Probabilistic graphical models are traditionally known for their successes in generative modeling. In this work, we advocate layered graphical models (LGMs) for probabilistic discriminative learning. To this end, we design LGMs in close analogy to neural networks (NNs), that is, they have deep hierarchical structures and convolutional or local connections between layers. Equipped with tensorized t… ▽ More Probabilistic graphical models are traditionally known for their successes in generative modeling. In this work, we advocate layered graphical models (LGMs) for probabilistic discriminative learning. To this end, we design LGMs in close analogy to neural networks (NNs), that is, they have deep hierarchical structures and convolutional or local connections between layers. Equipped with tensorized truncated variational inference, our LGMs can be efficiently trained via backpropagation on mainstream deep learning frameworks such as PyTorch. To deal with continuous valued inputs, we use a simple yet effective soft-clam** strategy for efficient inference. Through extensive experiments on image classification over MNIST and FashionMNIST datasets, we demonstrate that LGMs are capable of achieving competitive results comparable to NNs of similar architectures, while preserving transparent probabilistic modeling. △ Less

Submitted 31 January, 2019; originally announced February 2019.

arXiv:1807.01001 [pdf, other]

Modular Vehicle Control for Transferring Semantic Information Between Weather Conditions Using GANs

Authors: Patrick Wenzel, Qadeer Khan, Daniel Cremers, Laura Leal-Taixé

Abstract: Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred using semantic maps to new weather conditions without the need to obtain new ground tr… ▽ More Even though end-to-end supervised learning has shown promising results for sensorimotor control of self-driving cars, its performance is greatly affected by the weather conditions under which it was trained, showing poor generalization to unseen conditions. In this paper, we show how knowledge can be transferred using semantic maps to new weather conditions without the need to obtain new ground truth data. To this end, we propose to divide the task of vehicle control into two independent modules: a control module which is only trained on one weather condition for which labeled steering data is available, and a perception module which is used as an interface between new weather conditions and the fixed control module. To generate the semantic data needed to train the perception module, we propose to use a generative adversarial network (GAN)-based model to retrieve the semantic information for the new conditions in an unsupervised manner. We introduce a master-servant architecture, where the master model (semantic labels available) trains the servant model (semantic labels not available). We show that our proposed method trained with ground truth data for a single weather condition is capable of achieving similar results on the task of steering angle prediction as an end-to-end model trained with ground truth data of 15 different weather conditions. △ Less

Submitted 1 October, 2018; v1 submitted 3 July, 2018; originally announced July 2018.

Comments: 2nd Conference on Robot Learning (CoRL 2018), Zürich, Switzerland

arXiv:1806.02997 [pdf, other]

q-Space Novelty Detection with Variational Autoencoders

Authors: Aleksei Vasilev, Vladimir Golkov, Marc Meissner, Ilona Lipp, Eleonora Sgarlata, Valentina Tomassini, Derek K. Jones, Daniel Cremers

Abstract: In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we… ▽ More In machine learning, novelty detection is the task of identifying novel unseen data. During training, only samples from the normal class are available. Test samples are classified as normal or abnormal by assignment of a novelty score. Here we propose novelty detection methods based on training variational autoencoders (VAEs) on normal data. Since abnormal samples are not used during training, we define novelty metrics based on the (partially complementary) assumptions that the VAE is less capable of reconstructing abnormal samples well; that abnormal samples more strongly violate the VAE regularizer; and that abnormal samples differ from normal samples not only in input-feature space, but also in the VAE latent space and VAE output. These approaches, combined with various possibilities of using (e.g. sampling) the probabilistic VAE to obtain scalar novelty scores, yield a large family of methods. We apply these methods to magnetic resonance imaging, namely to the detection of diffusion-space (q-space) abnormalities in diffusion MRI scans of multiple sclerosis patients, i.e. to detect multiple sclerosis lesions without using any lesion labels for training. Many of our methods outperform previously proposed q-space novelty detection methods. We also evaluate the proposed methods on the MNIST handwritten digits dataset and show that many of them are able to outperform the state of the art. △ Less

Submitted 25 October, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: 11 pages, 2 figures

MSC Class: 62F15; 62G07; 62M45; 68T30 ACM Class: G.3; H.3.3; I.2.4; I.2.6; I.4.6; I.5; I.5.4; J.3

arXiv:1801.07648 [pdf, other]

Clustering with Deep Learning: Taxonomy and New Methods

Authors: Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, Daniel Cremers

Abstract: Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxon… ▽ More Clustering methods based on deep neural networks have proven promising for clustering real-world data because of their high representational power. In this paper, we propose a systematic taxonomy of clustering methods that utilize deep neural networks. We base our taxonomy on a comprehensive review of recent work and validate the taxonomy in a case study. In this case study, we show that the taxonomy enables researchers and practitioners to systematically create new clustering methods by selectively recombining and replacing distinct aspects of previous methods with the goal of overcoming their individual limitations. The experimental evaluation confirms this and shows that the method created for the case study achieves state-of-the-art clustering quality and surpasses it in some cases. △ Less

Submitted 13 September, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

MSC Class: 62H30; 62M45; 91C20 ACM Class: H.3.3; I.2.6; I.5; I.5.3; I.5.4

arXiv:1801.06397 [pdf, other]

doi 10.1007/s11263-018-1082-6

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?

Authors: Nikolaus Mayer, Eddy Ilg, Philipp Fischer, Caner Hazirbas, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method… ▽ More The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks.We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process. △ Less

Submitted 22 March, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

Comments: added references (UCL dataset); added IJCV copyright information

arXiv:1801.05413 [pdf, other]

Combinatorial Preconditioners for Proximal Algorithms on Graphs

Authors: Thomas Möllenhoff, Zhenzhang Ye, Tao Wu, Daniel Cremers

Abstract: We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners. Such combinatorial preconditioners arise from partitioning the graph into forests. We prove that certain decompositions lead to a theoretically optimal condition number. We also show how ideal decompositions can be realized using matroid partitionin… ▽ More We present a novel preconditioning technique for proximal optimization methods that relies on graph algorithms to construct effective preconditioners. Such combinatorial preconditioners arise from partitioning the graph into forests. We prove that certain decompositions lead to a theoretically optimal condition number. We also show how ideal decompositions can be realized using matroid partitioning and propose efficient greedy variants thereof for large-scale problems. Coupled with specialized solvers for the resulting scaled proximal subproblems, the preconditioned algorithm achieves competitive performance in machine learning and vision applications. △ Less

Submitted 21 February, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: Published as a conference paper at AISTATS 2018

arXiv:1710.10686 [pdf, ps, other]

Regularization for Deep Learning: A Taxonomy

Authors: Jan Kukačka, Vladimir Golkov, Daniel Cremers

Abstract: Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization proc… ▽ More Regularization is one of the crucial ingredients of deep learning, yet the term regularization has various definitions, and regularization methods are often studied separately from each other. In our work we present a systematic, unifying taxonomy to categorize existing methods. We distinguish methods that affect data, network architectures, error terms, regularization terms, and optimization procedures. We do not provide all details about the listed methods; instead, we present an overview of how the methods can be sorted into meaningful categories and sub-categories. This helps revealing links and fundamental similarities between them. Finally, we include practical recommendations both for users and for developers of new regularization methods. △ Less

Submitted 29 October, 2017; originally announced October 2017.

MSC Class: 62M45 ACM Class: I.2.6; I.5

arXiv:1704.04039 [pdf, other]

3D Deep Learning for Biological Function Prediction from Physical Fields

Authors: Vladimir Golkov, Marcin J. Skwark, Atanas Mirchev, Georgi Dikov, Alexander R. Geanes, Jeffrey Mendenhall, Jens Meiler, Daniel Cremers

Abstract: Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic pot… ▽ More Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the "raw fingerprint" of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on EC numbers is predicted from the approximated electron density field. In another experiment, the activity of small molecules is predicted with quality comparable to state-of-the-art descriptor-based methods. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations. With future improvements of training datasets and neural network settings in combination with complementary information sources (sequence, genomic context, expression level), deep learning can be expected to show its generalization power and revolutionize the field of molecular function prediction. △ Less

Submitted 13 April, 2017; originally announced April 2017.

ACM Class: I.2.6; J.3

arXiv:1512.02134 [pdf, other]

doi 10.1109/CVPR.2016.438

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Authors: Nikolaus Mayer, Eddy Ilg, Philip Häusser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, Thomas Brox

Abstract: Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we pro… ▽ More Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network. △ Less

Submitted 7 December, 2015; originally announced December 2015.

Comments: Includes supplementary material

ACM Class: I.2.6; I.2.10; I.4.8

Showing 1–26 of 26 results for author: Cremers, D