Showing 1–50 of 53 results for author: van der Wilk, M

  1. arXiv:2406.02352  [pdf, other]

    cs.LG

    System-Aware Neural ODE Processes for Few-Shot Bayesian Optimization

    Authors: Jixiang Qing, Becky D Langdon, Robert M Lee, Behrang Shafei, Mark van der Wilk, Calvin Tsay, Ruth Misener

    Abstract: We consider the problem of optimizing initial conditions and timing in dynamical systems governed by unknown ordinary differential equations (ODEs), where evaluating different initial conditions is costly and there are constraints on observation times. To identify the optimal conditions within several trials, we introduce a few-shot Bayesian Optimization (BO) framework based on the system's prior…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2402.17704  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Transfer Learning Bayesian Optimization to Design Competitor DNA Molecules for Use in Diagnostic Assays

    Authors: Ruby Sedgwick, John P. Goertz, Molly M. Stevens, Ruth Misener, Mark van der Wilk

    Abstract: With the rise in engineered biomolecular devices, there is an increased need for tailor-made biological sequences. Often, many similar biological sequences need to be made for a specific application meaning numerous, sometimes prohibitively expensive, lab experiments are necessary for their optimization. This paper presents a transfer learning design of experiments workflow to make this developmen…

    Submitted 27 February, 2024; originally announced February 2024.

  3. arXiv:2402.09849  [pdf, other]

    cs.LG stat.ML

    Recommendations for Baselines and Benchmarking Approximate Gaussian Processes

    Authors: Sebastian W. Ober, Artem Artemev, Marcel Wagenländer, Rudolfs Grobins, Mark van der Wilk

    Abstract: Gaussian processes (GPs) are a mature and widely-used component of the ML toolbox. One of their desirable qualities is automatic hyperparameter selection, which allows for training without user intervention. However, in many realistic settings, approximations are typically needed, which typically do require tuning. We argue that this requirement for tuning complicates evaluation, which has led to…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Preprint. 25 pages, 16 figures

  4. arXiv:2402.08406  [pdf, other]

    cs.LG

    Transition Constrained Bayesian Optimization via Markov Decision Processes

    Authors: Jose Pablo Folch, Calvin Tsay, Robert M Lee, Behrang Shafei, Weronika Ormaniec, Andreas Krause, Mark van der Wilk, Ruth Misener, Mojmír Mutný

    Abstract: Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, r…

    Submitted 29 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 10 pages main, 32 pages total, 16 figures, 2 tables, preprint

  5. arXiv:2312.14856  [pdf, other]

    cs.SE cs.AI

    Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code

    Authors: Shahin Honarvar, Mark van der Wilk, Alastair Donaldson

    Abstract: We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence. Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms. Each question template has…

    Submitted 14 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Modified a typo in the conclusion section regarding the impact of temperature reduction on the diversity of errors

  6. arXiv:2312.00622  [pdf, other]

    cs.LG math.OC stat.ME

    Practical Path-based Bayesian Optimization

    Authors: Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

    Abstract: There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of this black-box functions can be separated into two parts: (a) the cost of the experiment itse…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 6 main pages, 12 with references and appendix. 4 figures, 2 tables. To appear in NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

    Journal ref: NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World

  7. arXiv:2311.14649  [pdf, other]

    cs.LG stat.ML

    Learning in Deep Factor Graphs with Gaussian Belief Propagation

    Authors: Seth Nabarro, Mark van der Wilk, Andrew J Davison

    Abstract: We propose an approach to do learning in Gaussian factor graphs. We treat all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and view both training and prediction as inference problems with different observed nodes. Our experiments show that these problems can be efficiently solved with belief propagation (BP), whose updates are inherently loca…

    Submitted 28 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  8. arXiv:2310.06131  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Layer-wise Equivariances Automatically using Gradients

    Authors: Tycho F. A. van der Ouderaa, Alexander Immer, Mark van der Wilk

    Abstract: Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and assoc…

    Submitted 9 October, 2023; originally announced October 2023.

  9. arXiv:2309.17161  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Current Methods for Drug Property Prediction in the Real World

    Authors: Jacob Green, Cecilia Cabrera Diaz, Maximilian A. H. Jakobs, Andrea Dimitracopoulos, Mark van der Wilk, Ryan D. Greenhalgh

    Abstract: Predicting drug properties is key in drug discovery to enable de-risking of assets before expensive clinical trials, and to find highly active compounds faster. Interest from the Machine Learning community has led to the release of a variety of benchmark datasets and proposed methods. However, it remains unclear for practitioners which method or approach is most suitable, as different papers bench…

    Submitted 25 July, 2023; originally announced September 2023.

  10. arXiv:2306.03968  [pdf, other]

    stat.ML cs.LG

    Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

    Authors: Alexander Immer, Tycho F. A. van der Ouderaa, Mark van der Wilk, Gunnar Rätsch, Bernhard Schölkopf

    Abstract: Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations can allow to optimize such hyperparameters just like standard neural network parameters using gradients and on the training data. However, estimating a single hyperparameter gradient requires a pass throug…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  11. arXiv:2306.02931  [pdf, other]

    stat.ML cs.LG

    Bivariate Causal Discovery using Bayesian Model Selection

    Authors: Anish Dhir, Samuel Power, Mark van der Wilk

    Abstract: Much of the causal discovery literature prioritises guaranteeing the identifiability of causal direction in statistical models. For structures within a Markov equivalence class, this requires strong assumptions which may not hold in real-world datasets, ultimately limiting the usability of these methods. Building on previous attempts, we show how to incorporate causal assumptions within the Bayesi…

    Submitted 27 May, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  12. arXiv:2304.05091  [pdf, other]

    stat.ML cs.LG

    Actually Sparse Variational Gaussian Processes

    Authors: Harry Jake Cunningham, Daniel Augusto de Souza, So Takao, Mark van der Wilk, Marc Peter Deisenroth

    Abstract: Gaussian processes (GPs) are typically criticised for their unfavourable scaling in both computational and memory requirements. For large datasets, sparse GPs reduce these demands by conditioning on a small set of inducing variables designed to summarise the data. In practice however, for large datasets requiring many inducing variables, such as low-lengthscale spatial data, even sparse GPs can be…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 14 pages, 5 figures, published in AISTATS 2023

  13. arXiv:2211.06149  [pdf, other]

    cs.LG cs.CE stat.ML

    Combining Multi-Fidelity Modelling and Asynchronous Batch Bayesian Optimization

    Authors: Jose Pablo Folch, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

    Abstract: Bayesian Optimization is a useful tool for experiment design. Unfortunately, the classical, sequential setting of Bayesian Optimization does not translate well into laboratory experiments, for instance battery design, where measurements may come from different sources and their evaluations may require significant waiting times. Multi-fidelity Bayesian Optimization addresses the setting with measur…

    Submitted 23 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 19 pages in main paper / 28 with references and appendix, 7 figures, 2 tables, accepted into Computers and Chemical Engineering

  14. arXiv:2210.07893  [pdf, other]

    stat.ML cs.LG

    Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees

    Authors: Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge

    Abstract: Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stabilit…

    Submitted 16 January, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

    Journal ref: Journal of Machine Learning Research, 2024

  15. arXiv:2206.14148  [pdf, other]

    cs.LG cs.PL stat.ML

    Memory Safe Computations with XLA Compiler

    Authors: Artem Artemev, Tilman Roeder, Mark van der Wilk

    Abstract: Software packages like TensorFlow and PyTorch are designed to support linear algebra operations, and their speed and usability determine their success. However, by prioritising speed, they often neglect memory requirements. As a consequence, the implementations of memory-intensive algorithms that are convenient in terms of software design can often not be run for large problems due to memory overf…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Preprint

  16. arXiv:2204.07178  [pdf, other]

    cs.LG

    Relaxing Equivariance Constraints with Non-stationary Continuous Filters

    Authors: Tycho F. A. van der Ouderaa, David W. Romero, Mark van der Wilk

    Abstract: Equivariances provide useful inductive biases in neural network modeling, with the translation equivariance of convolutional neural networks being a canonical example. Equivariances can be embedded in architectures through weight-sharing and place symmetry constraints on the functions a neural network can represent. The type of symmetry is typically fixed and has to be chosen in advance. Although…

    Submitted 13 November, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  17. arXiv:2202.12439  [pdf, other]

    stat.ML cs.LG

    Learning Invariant Weights in Neural Networks

    Authors: Tycho F. A. van der Ouderaa, Mark van der Wilk

    Abstract: Assumptions about invariances or symmetries in data can significantly increase the predictive power of statistical models. Many commonly used models in machine learning are constraint to respect certain symmetries in the data, such as translation equivariance in convolutional neural networks, and incorporation of new symmetry types is actively being studied. Yet, efforts to learn such invariances…

    Submitted 2 August, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  18. arXiv:2202.10638  [pdf, other]

    stat.ML cs.LG

    Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

    Authors: Alexander Immer, Tycho F. A. van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk

    Abstract: Data augmentation is commonly applied to improve performance of deep learning by enforcing the knowledge that certain transformations on the input preserve the output. Currently, the data augmentation parameters are chosen by human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentatio…

    Submitted 13 October, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  19. arXiv:2202.00060  [pdf, other]

    cs.LG math.OC

    SnAKe: Bayesian Optimization with Pathwise Exploration

    Authors: Jose Pablo Folch, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener

    Abstract: Bayesian Optimization is a very effective tool for optimizing expensive black-box functions. Inspired by applications developing and characterizing reaction chemistry using droplet microfluidic reactors, we consider a novel setting where the expense of evaluating the function can increase significantly when making large input changes between iterations. We further assume we are working asynchronou…

    Submitted 11 January, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: 10 main pages, 39 with appendix, 30 figures, 10 tables. Final camera-ready version for NeurIPS, with supplementary material included

  20. arXiv:2109.09417  [pdf, other]

    stat.ML cs.LG

    Barely Biased Learning for Gaussian Process Regression

    Authors: David R. Burt, Artem Artemev, Mark van der Wilk

    Abstract: Recent work in scalable approximate Gaussian process regression has discussed a bias-variance-computation trade-off when estimating the log marginal likelihood. We suggest a method that adaptively selects the amount of computation to use when estimating the log marginal likelihood so that the bias of the objective function is guaranteed to be small. While simple in principle, our current implement…

    Submitted 20 September, 2021; originally announced September 2021.

  21. arXiv:2107.09301  [pdf, other]

    stat.ML cs.LG

    A Bayesian Approach to Invariant Deep Neural Networks

    Authors: Nikolaos Mourdoukoutas, Marco Federici, Georges Pantalos, Mark van der Wilk, Vincent Fortuin

    Abstract: We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures, when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

    Submitted 2 November, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 8 pages, 3 figures, To be published in ICML UDL 2021

  22. arXiv:2106.07512  [pdf, other]

    stat.ML cs.LG

    Last Layer Marginal Likelihood for Invariance Learning

    Authors: Pola Schwöbel, Martin Jørgensen, Sebastian W. Ober, Mark van der Wilk

    Abstract: Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. Computing the marginal likelihood is hard for neural networks, but success with tr…

    Submitted 1 March, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: AISTATS '22

  23. arXiv:2106.05586  [pdf, other]

    stat.ML cs.LG

    Data augmentation in Bayesian neural networks and the cold posterior effect

    Authors: Seth Nabarro, Stoil Ganev, Adrià Garriga-Alonso, Vincent Fortuin, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks that incorporate data augmentation implicitly use a ``randomly perturbed log-likelihood [which] does not have a clean interpretation as a valid likelihood function'' (Izmailov et al. 2021). Here, we provide several approaches to developing principled Bayesian neural networks incorporating data augmentation. We introduce a ``finite orbit'' setting which allows likelihoods t…

    Submitted 9 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  24. BNNpriors: A library for Bayesian neural network inference with different prior distributions

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs w…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at Software Impacts

  25. arXiv:2105.04504  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks as Point Estimates for Deep Gaussian Processes

    Authors: Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

    Abstract: Neural networks and Gaussian processes are complementary in their strengths and weaknesses. Having a better understanding of their relationship comes with the promise to make each method benefit from the strengths of the other. In this work, we establish an equivalence between the forward passes of neural networks and (deep) sparse Gaussian process models. The theory we develop is based on interpr…

    Submitted 9 December, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  26. arXiv:2104.05674  [pdf, ps, other]

    stat.ML cs.LG

    GPflux: A Library for Deep Gaussian Processes

    Authors: Vincent Dutordoir, Hugh Salimbeni, Eric Hambro, John McLeod, Felix Leibfried, Artem Artemev, Mark van der Wilk, James Hensman, Marc P. Deisenroth, ST John

    Abstract: We introduce GPflux, a Python library for Bayesian deep learning with a strong emphasis on deep Gaussian processes (DGPs). Implementing DGPs is a challenging endeavour due to the various mathematical subtleties that arise when dealing with multivariate Gaussian distributions and the complex bookkeeping of indices. To date, there are no actively maintained, open-sourced and extendable libraries ava…

    Submitted 12 April, 2021; originally announced April 2021.

  27. arXiv:2102.12108  [pdf, other]

    stat.ML cs.LG

    The Promises and Pitfalls of Deep Kernel Learning

    Authors: Sebastian W. Ober, Carl E. Rasmussen, Mark van der Wilk

    Abstract: Deep kernel learning (DKL) and related techniques aim to combine the representational power of neural networks with the reliable uncertainty estimates of Gaussian processes. One crucial aspect of these models is an expectation that, because they are treated as Gaussian process models optimized using the marginal likelihood, they are protected from overfitting. However, we identify situations where…

    Submitted 7 July, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), 20 pages

  28. arXiv:2102.08314  [pdf, other]

    stat.ML cs.LG

    Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

    Authors: Artem Artemev, David R. Burt, Mark van der Wilk

    Abstract: We propose a lower bound on the log marginal likelihood of Gaussian process regression models that can be computed without matrix factorisation of the full kernel matrix. We show that approximate maximum likelihood learning of model parameters by maximising our lower bound retains many of the sparse variational approach benefits while reducing the bias introduced into parameter learning. The basis…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Preprint

  29. arXiv:2102.06571  [pdf, other]

    stat.ML cs.LG

    Bayesian Neural Network Priors Revisited

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison

    Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, it is unclear whether these priors accurately reflect our true beliefs about the weight distributions or give optimal performance. To find better priors, we study summary statistics of neural network weights in networks trained using stochastic gradient descent (SGD). We find that convolution…

    Submitted 16 March, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2022

  30. arXiv:2101.04097  [pdf, other]

    stat.ML cs.LG

    Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks

    Authors: Adrià Garriga-Alonso, Mark van der Wilk

    Abstract: Infinite width limits of deep neural networks often have tractable forms. They have been used to analyse the behaviour of finite networks, as well as being useful methods in their own right. When investigating infinitely wide convolutional neural networks (CNNs), it was observed that the correlations arising from spatial weight sharing disappear in the infinite limit. This is undesirable, as spati…

    Submitted 13 June, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)

  31. arXiv:2011.10575  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Design of Experiments for Verifying Biomolecular Networks

    Authors: Ruby Sedgwick, John Goertz, Molly Stevens, Ruth Misener, Mark van der Wilk

    Abstract: There is a growing trend in molecular and synthetic biology of using mechanistic (non machine learning) models to design biomolecular networks. Once designed, these networks need to be validated by experimental results to ensure the theoretical network correctly models the true system. However, these experiments can be expensive and time consuming. We propose a design of experiments approach for v…

    Submitted 25 November, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Comment: Updated to correct typo "that that" => "that"

  32. arXiv:2011.09421  [pdf, other]

    stat.ML cs.LG

    Understanding Variational Inference in Function-Space

    Authors: David R. Burt, Sebastian W. Ober, Adrià Garriga-Alonso, Mark van der Wilk

    Abstract: Recent work has attempted to directly approximate the `function-space' or predictive posterior distribution of Bayesian models, without approximating the posterior distribution over the parameters. This is appealing in e.g. Bayesian neural networks, where we only need the former, and the latter is hard to represent. In this work, we highlight some advantages and limitations of employing the Kullba…

    Submitted 18 November, 2020; originally announced November 2020.

    Comments: 19 pages

  33. arXiv:2010.14499  [pdf, other]

    cs.LG

    A Bayesian Perspective on Training Speed and Model Selection

    Authors: Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk

    Abstract: We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its marginal likelihood. Second, that this measure, under certain conditions, predicts the relative weighting of models in linear model combinations trained to minim…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: To be presented at NeurIPS 2020

  34. arXiv:2008.00323  [pdf, other]

    stat.ML cs.LG

    Convergence of Sparse Variational Inference in Gaussian Processes Regression

    Authors: David R. Burt, Carl Edward Rasmussen, Mark van der Wilk

    Abstract: Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a…

    Submitted 1 August, 2020; originally announced August 2020.

    Comments: Extended version of http://proceedings.mlr.press/v97/burt19a.html (arxiv version: arXiv:1903.03571 ). Published in Journal of Machine Learning Research: http://jmlr.org/papers/v21/19-1015.html. Code available at: https://github.com/markvdw/RobustGP

    Journal ref: Journal of Machine Learning Research, 21(131), 1-63 (2020)

  35. arXiv:2006.13170  [pdf, other]

    stat.ML cs.LG

    Variational Orthogonal Features

    Authors: David R. Burt, Carl Edward Rasmussen, Mark van der Wilk

    Abstract: Sparse stochastic variational inference allows Gaussian process models to be applied to large datasets. The per iteration computational cost of inference with this method is $\mathcal{O}(\tilde{N}M^2+M^3),$ where $\tilde{N}$ is the number of points in a minibatch and $M$ is the number of `inducing features', which determine the expressiveness of the variational family. Several recent works have sh…

    Submitted 23 June, 2020; originally announced June 2020.

  36. arXiv:2006.06015  [pdf, other]

    cs.CV cs.LG

    Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty

    Authors: Miguel Monteiro, Loïc Le Folgoc, Daniel Coelho de Castro, Nick Pawlowski, Bernardo Marques, Konstantinos Kamnitsas, Mark van der Wilk, Ben Glocker

    Abstract: In image segmentation, there is often more than one plausible solution for a given input. In medical imaging, for example, experts will often disagree about the exact location of object boundaries. Estimating this inherent uncertainty and predicting multiple plausible hypotheses is of great interest in many applications, yet this ability is lacking in most current deep learning methods. In this pa…

    Submitted 22 December, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Published at Neurips2020. 17 pages, 11 figures, 2 tables

  37. arXiv:2006.04492  [pdf, other]

    stat.ML cs.LG

    Speedy Performance Estimation for Neural Architecture Search

    Authors: Binxin Ru, Clare Lyle, Lisa Schut, Miroslav Fil, Mark van der Wilk, Yarin Gal

    Abstract: Reliable yet efficient evaluation of generalisation performance of a proposed architecture is crucial to the success of neural architecture search (NAS). Traditional approaches face a variety of limitations: training each architecture to completion is prohibitively expensive, early stopped validation accuracy may correlate poorly with fully trained performance, and model-based estimators require l…

    Submitted 7 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 23 pages, 14 figures

  38. arXiv:2005.00178  [pdf, other]

    cs.LG stat.ML

    On the Benefits of Invariance in Neural Networks

    Authors: Clare Lyle, Mark van der Wilk, Marta Kwiatkowska, Yarin Gal, Benjamin Bloem-Reddy

    Abstract: Many real world data analysis problems exhibit invariant structure, and models that take advantage of this structure have shown impressive empirical performance, particularly in deep learning. While the literature contains a variety of methods to incorporate invariance into models, theoretical understanding is poor and there is no way to assess when one method should be preferred over another. In…

    Submitted 30 April, 2020; originally announced May 2020.

  39. arXiv:2004.03553  [pdf, other]

    cs.LG stat.ML

    Capsule Networks -- A Probabilistic Perspective

    Authors: Lewis Smith, Lisa Schut, Yarin Gal, Mark van der Wilk

    Abstract: 'Capsule' models try to explicitly represent the poses of objects, enforcing a linear relationship between an object's pose and that of its constituent parts. This modelling assumption should lead to robustness to viewpoint changes since the sub-object/super-object relationships are invariant to the poses of the object. We describe a probabilistic generative model which encodes such capsule assump…

    Submitted 6 January, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

  40. arXiv:2003.01115  [pdf, other]

    stat.ML cs.LG

    A Framework for Interdomain and Multioutput Gaussian Processes

    Authors: Mark van der Wilk, Vincent Dutordoir, ST John, Artem Artemev, Vincent Adam, James Hensman

    Abstract: One obstacle to the use of Gaussian processes (GPs) in large-scale problems, and as a component in deep learning system, is the need for bespoke derivations and implementations for small variations in the model or inference. In order to improve the utility of GPs we need a modular system that allows rapid implementation and testing, as seen in the neural network community. We present a mathematica…

    Submitted 2 March, 2020; originally announced March 2020.

  41. arXiv:1906.09360  [pdf, other]

    stat.ML cs.LG

    Scalable Bayesian dynamic covariance modeling with variational Wishart and inverse Wishart processes

    Authors: Creighton Heaukulani, Mark van der Wilk

    Abstract: We implement gradient-based variational inference routines for Wishart and inverse Wishart processes, which we apply as Bayesian models for the dynamic, heteroskedastic covariance matrix of a multivariate time series. The Wishart and inverse Wishart processes are constructed from i.i.d. Gaussian processes, existing variational inference algorithms for which form the basis of our approach. These me…

    Submitted 4 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  42. arXiv:1906.05828  [pdf, other]

    stat.ML cs.LG

    Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models

    Authors: Alessandro Davide Ialongo, Mark van der Wilk, James Hensman, Carl Edward Rasmussen

    Abstract: We identify a new variational inference scheme for dynamical systems whose transition function is modelled by a Gaussian process. Inference in this setting has either employed computationally intensive MCMC methods, or relied on factorisations of the variational posterior. As we demonstrate in our experiments, the factorisation between latent system states and transition function can lead to a mis…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: 10 pages, 4 figures, 3 tables. Published in the proceedings of the Thirty-sixth International Conference on Machine Learning (ICML), 2019

    Journal ref: PMLR 97:2931-2940 (2019)

  43. arXiv:1903.03571  [pdf, other]

    stat.ML cs.LG

    Rates of Convergence for Sparse Variational Gaussian Process Regression

    Authors: David R. Burt, Carl E. Rasmussen, Mark van der Wilk

    Abstract: Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the…

    Submitted 3 September, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: International Conference on Machine Learning (ICML 2019)

  44. arXiv:1902.05888  [pdf, other]

    stat.ML cs.LG

    Bayesian Image Classification with Deep Convolutional Gaussian Processes

    Authors: Vincent Dutordoir, Mark van der Wilk, Artem Artemev, James Hensman

    Abstract: In decision-making systems, it is important to have classifiers that have calibrated uncertainties, with an optimisation objective that can be used for automated model selection and training. Gaussian processes (GPs) provide uncertainty estimates and a marginal likelihood objective, but their weak inductive biases lead to inferior accuracy. This has limited their applicability in certain tasks (e.…

    Submitted 4 March, 2020; v1 submitted 15 February, 2019; originally announced February 2019.

    Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, PMLR: Volume 108

  45. arXiv:1812.06067  [pdf, other]

    stat.ML cs.LG

    Non-Factorised Variational Inference in Dynamical Systems

    Authors: Alessandro Davide Ialongo, Mark van der Wilk, James Hensman, Carl Edward Rasmussen

    Abstract: We focus on variational inference in dynamical systems where the discrete time transition function (or evolution rule) is modelled by a Gaussian process. The dominant approach so far has been to use a factorised posterior distribution, decoupling the transition function from the system states. This is not exact in general and can lead to an overconfident posterior over the transition function as w…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 6 pages, 1 figure, 1 table

  46. arXiv:1812.03973  [pdf, other]

    cs.LG cs.PL stat.ML

    Bayesian Layers: A Module for Neural Network Uncertainty

    Authors: Dustin Tran, Michael W. Dusenberry, Mark van der Wilk, Danijar Hafner

    Abstract: We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with drop-in replacements for common layers. This enables composition via a unified abstraction over deterministic and stochastic functions and allows for scalability via the underlying system. These layers capture uncertainty over weights (Bayesian neural ne…

    Submitted 5 March, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

    Comments: Code available at https://github.com/tensorflow/tensor2tensor

  47. arXiv:1812.03580  [pdf, other]

    stat.ML cs.LG

    Closed-form Inference and Prediction in Gaussian Process State-Space Models

    Authors: Alessandro Davide Ialongo, Mark van der Wilk, Carl Edward Rasmussen

    Abstract: We examine an analytic variational inference scheme for the Gaussian Process State Space Model (GPSSM) - a probabilistic model for system identification and time-series modelling. Our approach performs variational inference over both the system states and the transition function. We exploit Markov structure in the true posterior, as well as an inducing point approximation to achieve linear time co…

    Submitted 9 December, 2018; originally announced December 2018.

    Comments: 7 pages, 6 figures

  48. arXiv:1808.05563  [pdf, other]

    cs.LG stat.ML

    Learning Invariances using the Marginal Likelihood

    Authors: Mark van der Wilk, Matthias Bauer, ST John, James Hensman

    Abstract: Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations on the input that are known to be irrelevant (e.g. translation). Commonly, this is done through data augmentation, where the training set is enlarged by applying hand-craft…

    Submitted 16 August, 2018; originally announced August 2018.

  49. arXiv:1709.01894  [pdf, other]

    stat.ML cs.LG

    Convolutional Gaussian Processes

    Authors: Mark van der Wilk, Carl Edward Rasmussen, James Hensman

    Abstract: We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with f…

    Submitted 6 September, 2017; originally announced September 2017.

    Comments: To appear in Advances in Neural Information Processing Systems 30 (NIPS 2017)

  50. arXiv:1610.08733  [pdf, other]

    stat.ML

    GPflow: A Gaussian process library using TensorFlow

    Authors: Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, James Hensman

    Abstract: GPflow is a Gaussian process library that uses TensorFlow for its core computations and Python for its front end. The distinguishing features of GPflow are that it uses variational inference as the primary approximation method, provides concise code through the use of automatic differentiation, has been engineered with a particular emphasis on software testing and is able to exploit GPU hardware.

    Submitted 27 October, 2016; originally announced October 2016.