Showing 1–9 of 9 results for author: Key, O

  1. arXiv:2404.12968  [pdf, other]

    cs.LG cs.DC stat.AP

    Scalable Data Assimilation with Message Passing

    Authors: Oscar Key, So Takao, Daniel Giles, Marc Peter Deisenroth

    Abstract: Data assimilation is a core component of numerical weather prediction systems. The large quantity of data processed during assimilation requires the computation to be distributed across increasingly many compute nodes, yet existing approaches suffer from synchronisation overhead in this setting. In this paper, we exploit the formulation of data assimilation as a Bayesian inference problem and appl…

    Submitted 19 April, 2024; originally announced April 2024.

  2. arXiv:2307.06440  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE cs.PF

    No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

    Authors: Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner

    Abstract: The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch sel…

    Submitted 14 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  3. arXiv:2301.11674  [pdf, other]

    stat.ME stat.CO stat.ML

    Optimally-Weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference

    Authors: Ayush Bharti, Masha Naslidnyk, Oscar Key, Samuel Kaski, François-Xavier Briol

    Abstract: Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is th…

    Submitted 10 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  4. arXiv:2209.07396  [pdf, other]

    stat.ML cs.LG

    Towards Healing the Blindness of Score Matching

    Authors: Mingtian Zhang, Oscar Key, Peter Hayes, David Barber, Brooks Paige, François-Xavier Briol

    Abstract: Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of de…

    Submitted 15 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  5. arXiv:2111.10275  [pdf, other]

    stat.ML cs.LG stat.ME

    Composite Goodness-of-fit Tests with Kernels

    Authors: Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

    Abstract: Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more involved methods are required will depend on whether the model is really misspecified, and there is a lack of generally applicable methods to answer this question. In…

    Submitted 27 February, 2024; v1 submitted 19 November, 2021; originally announced November 2021.

  6. arXiv:2103.08951  [pdf, other]

    cs.LG stat.AP

    Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties

    Authors: Lisa Schut, Oscar Key, Rory McGrath, Luca Costabello, Bogdan Sacaleanu, Medb Corcoran, Yarin Gal

    Abstract: Counterfactual explanations (CEs) are a practical tool for demonstrating why machine learning classifiers make particular decisions. For CEs to be useful, it is important that they are easy for users to interpret. Existing methods for generating interpretable CEs rely on auxiliary generative models, which may not be suitable for complex datasets, and incur engineering overhead. We introduce a simp…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: 21 pages, 13 figures

    Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  7. arXiv:2102.11409  [pdf, other]

    cs.LG stat.ML

    On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty

    Authors: Joost van Amersfoort, Lewis Smith, Andrew Jesson, Oscar Key, Yarin Gal

    Abstract: Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs over which an inducing poin…

    Submitted 7 March, 2022; v1 submitted 22 February, 2021; originally announced February 2021.

  8. arXiv:2011.00515  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    On Signal-to-Noise Ratio Issues in Variational Inference for Deep Gaussian Processes

    Authors: Tim G. J. Rudner, Oscar Key, Yarin Gal, Tom Rainforth

    Abstract: We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues. Specifically, we show both theoretically and via an extensive empirical evaluation that the SNR of the gradient estimates for the latent variable's variational parameters decreases as the number of importance sampl…

    Submitted 21 July, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: Published in Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

  9. arXiv:2010.04116  [pdf, other]

    cs.LG cs.AI

    Interlocking Backpropagation: Improving depthwise model-parallelism

    Authors: Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal

    Abstract: The number of parameters in state of the art neural networks has drastically increased in recent years. This surge of interest in large scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism can suffer from poor resource utilisation, which leads to wa…

    Submitted 7 July, 2022; v1 submitted 8 October, 2020; originally announced October 2020.