Showing 1–22 of 22 results for author: Immer, A

  1. arXiv:2402.15978 [pdf, other]

    cs.LG stat.ML

    Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood

    Authors: Rayen Dhahri, Alexander Immer, Bertrand Charpentier, Stephan Günnemann, Vincent Fortuin

    Abstract: Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlook…

    Submitted 24 February, 2024; originally announced February 2024.

  2. arXiv:2402.00809 [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  3. arXiv:2312.00232 [pdf, other]

    cs.LG cs.AI stat.ML

    Uncertainty in Graph Contrastive Learning with Bayesian Neural Networks

    Authors: Alexander Möllers, Alexander Immer, Elvin Isufi, Vincent Fortuin

    Abstract: Graph contrastive learning has shown great promise when labeled data is scarce, but large unlabeled datasets are available. However, it often does not take uncertainty estimation into account. We show that a variational Bayesian neural network approach can be used to improve not only the uncertainty estimates but also the downstream performance on semi-supervised node-classification tasks. Moreove…

    Submitted 30 November, 2023; originally announced December 2023.

  4. arXiv:2311.00636 [pdf, other]

    cs.LG stat.ML

    Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures

    Authors: Runa Eschenhagen, Alexander Immer, Richard E. Turner, Frank Schneider, Philipp Hennig

    Abstract: The core components of many modern neural network architectures, such as transformers, convolutional, or graph neural networks, can be expressed as linear layers with $\textit{weight-sharing}$. Kronecker-Factored Approximate Curvature (K-FAC), a second-order optimisation method, has shown promise to speed up neural network training and thereby reduce computational costs. However, there is currentl…

    Submitted 11 January, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  5. arXiv:2310.06131 [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Layer-wise Equivariances Automatically using Gradients

    Authors: Tycho F. A. van der Ouderaa, Alexander Immer, Mark van der Wilk

    Abstract: Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and assoc…

    Submitted 9 October, 2023; originally announced October 2023.

  6. arXiv:2310.02012 [pdf, other]

    cs.LG cs.AI

    Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

    Authors: Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand

    Abstract: Normalization layers are one of the key building blocks for deep neural networks. Several theoretical studies have shown that batch normalization improves the signal propagation, by avoiding the representations from becoming collinear across the layers. However, results on mean-field theory of batch normalization also conclude that this benefit comes at the expense of exploding gradients in depth.…

    Submitted 3 October, 2023; originally announced October 2023.

  7. arXiv:2309.07364 [pdf, other]

    cs.LG cs.AI eess.SP

    Hodge-Aware Contrastive Learning

    Authors: Alexander Möllers, Alexander Immer, Vincent Fortuin, Elvin Isufi

    Abstract: Simplicial complexes prove effective in modeling data with multiway dependencies, such as data defined along the edges of networks or within other higher-order structures. Their spectrum can be decomposed into three interpretable subspaces via the Hodge decomposition, resulting foundational in numerous applications. We leverage this decomposition to develop a contrastive self-supervised learning a…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 4 pages, 2 figures

  8. arXiv:2306.03968 [pdf, other]

    stat.ML cs.LG

    Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

    Authors: Alexander Immer, Tycho F. A. van der Ouderaa, Mark van der Wilk, Gunnar Rätsch, Bernhard Schölkopf

    Abstract: Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations can allow to optimize such hyperparameters just like standard neural network parameters using gradients and on the training data. However, estimating a single hyperparameter gradient requires a pass throug…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  9. arXiv:2305.16905 [pdf, other]

    stat.ML cs.LG

    Improving Neural Additive Models with Bayesian Principles

    Authors: Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin

    Abstract: Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible inter…

    Submitted 29 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 41st International Conference on Machine Learning (ICML 2024)

  10. arXiv:2304.08309 [pdf, other]

    cs.LG stat.ML

    Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization

    Authors: Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Vincent Fortuin

    Abstract: The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while i…

    Submitted 10 July, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: AABI 2023

  11. arXiv:2210.09054 [pdf, other]

    stat.ML cs.AI cs.LG

    On the Identifiability and Estimation of Causal Location-Scale Noise Models

    Authors: Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Schölkopf, Peter Bühlmann, Alexander Marx

    Abstract: We study the class of location-scale or heteroscedastic noise models (LSNMs), in which the effect $Y$ can be written as a function of the cause $X$ and a noise source $N$ independent of $X$, which may be scaled by a positive function $g$ over the cause, i.e., $Y = f(X) + g(X)N$. Despite the generality of the model class, we show the causal direction is identifiable up to some pathological cases. T…

    Submitted 1 June, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: ICML 2023

  12. arXiv:2202.10638 [pdf, other]

    stat.ML cs.LG

    Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

    Authors: Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk

    Abstract: Data augmentation is commonly applied to improve performance of deep learning by enforcing the knowledge that certain transformations on the input preserve the output. Currently, the data augmentation parameters are chosen by human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentatio…

    Submitted 13 October, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  13. arXiv:2110.08388 [pdf, other]

    cs.CL

    Probing as Quantifying Inductive Bias

    Authors: Alexander Immer, Lucas Torroba Hennigen, Vincent Fortuin, Ryan Cotterell

    Abstract: Pre-trained contextual representations have led to dramatic performance improvements on a range of downstream tasks. Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations. In general, researchers quantify the amount of linguistic information through probing, an endeavor which consists of training a supervised…

    Submitted 24 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ACL 2022

  14. arXiv:2110.04020 [pdf, other]

    cs.LG stat.ML

    Pathologies in priors and inference for Bayesian transformers

    Authors: Tristan Cinquin, Alexander Immer, Max Horn, Vincent Fortuin

    Abstract: In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold-standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, no successful attempts to improve transformer models in t…

    Submitted 15 October, 2021; v1 submitted 8 October, 2021; originally announced October 2021.

  15. arXiv:2106.14806 [pdf, other]

    cs.LG stat.ML

    Laplace Redux -- Effortless Bayesian Deep Learning

    Authors: Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig

    Abstract: Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection. The Laplace approximation (LA) is a classic, and arguably the simplest family of approximations for the intractable posteriors of deep neural networks. Yet, despite its simplicity, the L…

    Submitted 14 March, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version; source code: https://github.com/AlexImmer/Laplace

  16. arXiv:2104.04975 [pdf, other]

    stat.ML cs.LG

    Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning

    Authors: Alexander Immer, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, Mohammad Emtiyaz Khan

    Abstract: Marginal-likelihood based model-selection, even though promising, is rarely used in deep learning due to estimation difficulties. Instead, most approaches rely on validation data, which may not be readily available. In this work, we present a scalable marginal-likelihood estimation method to select both hyperparameters and network architectures, based on the training data alone. Some hyperparamete…

    Submitted 15 June, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

    Comments: ICML 2021

  17. arXiv:2008.08400 [pdf, other]

    stat.ML cs.LG

    Improving predictions of Bayesian neural nets via local linearization

    Authors: Alexander Immer, Maciej Korzepa, Matthias Bauer

    Abstract: The generalized Gauss-Newton (GGN) approximation is often used to make practical Bayesian deep learning approaches scalable by replacing a second order derivative with a product of first order derivatives. In this paper we argue that the GGN approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN), which turns the BNN into a generalized linear mod…

    Submitted 25 February, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

    Comments: AISTATS 2021

  18. arXiv:2007.11994 [pdf, other]

    stat.ML cs.LG

    Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks

    Authors: Alexander Immer

    Abstract: In this thesis, we disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning. The generalized Gauss-Newton method is an optimization method that is used in several popular Bayesian deep learning algorithms. Algorithms that combine the Gauss-Newton method with the Laplace and Gaussian variational approximation have recently led to state-of-the-art results in Baye…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: Master's thesis at EPFL

  19. arXiv:2004.14070 [pdf, other]

    stat.ML cs.LG

    Continual Deep Learning by Functional Regularisation of Memorable Past

    Authors: Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard E. Turner, Mohammad Emtiyaz Khan

    Abstract: Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past. Recent works address this with weight regularisation. Functional regularisation, although computationally expensive, is expected to perform better, but rarely does so in practice. In this paper, we fix this issue by using a new functional-regular…

    Submitted 8 January, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

  20. arXiv:1906.01930 [pdf, other]

    stat.ML cs.AI cs.LG

    Approximate Inference Turns Deep Networks into Gaussian Processes

    Authors: Mohammad Emtiyaz Khan, Alexander Immer, Ehsan Abedi, Maciej Korzepa

    Abstract: Deep neural networks (DNN) and Gaussian processes (GP) are two powerful models with several theoretical connections relating them, but the relationship between their training methods is not well understood. In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors. This enables us to relate solutions and iterations of a deep-learning al…

    Submitted 19 July, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: published at NeurIPS 2019: https://papers.nips.cc/paper/8573-approximate-inference-turns-deep-networks-into-gaussian-processes.pdf

  21. arXiv:1812.04428 [pdf, other]

    cs.LG stat.ML

    Efficient learning of smooth probability functions from Bernoulli tests with guarantees

    Authors: Paul Rolland, Ali Kavis, Alex Immer, Adish Singla, Volkan Cevher

    Abstract: We study the fundamental problem of learning an unknown, smooth probability function via pointwise Bernoulli tests. We provide a scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the convergence rate of our posterior update rule to the true probability function in L2-norm. Moreover, we allow the Bernoulli tests to depend on contextual feature…

    Submitted 23 August, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

  22. arXiv:1711.10327 [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    Generative Interest Estimation for Document Recommendations

    Authors: Danijar Hafner, Alexander Immer, Willi Raschkowski, Fabian Windheuser

    Abstract: Learning distributed representations of documents has pushed the state-of-the-art in several natural language processing tasks and was successfully applied to the field of recommender systems recently. In this paper, we propose a novel content-based recommender system based on learned representations and a generative model of user interest. Our method works as follows: First, we learn representati…

    Submitted 28 November, 2017; originally announced November 2017.