Showing 1–50 of 62 results for author: Lakshminarayanan, B

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2312.09300  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Evaluation Improves Selective Generation in Large Language Models

    Authors: Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan

    Abstract: Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, recent research has demonstrated the limitations of using sequence-level probability estimates given by LLMs as reliable indicators of generation quali…

    Submitted 14 December, 2023; originally announced December 2023.

  4. arXiv:2311.11644  [pdf, ps, other]

    eess.SY cs.LG

    Unraveling the Control Engineer's Craft with Neural Networks

    Authors: Braghadeesh Lakshminarayanan, Federico Dettù, Cristian R. Rojas, Simone Formentin

    Abstract: Many industrial processes require suitable controllers to meet their performance requirements. More often, a sophisticated digital twin is available, which is a highly complex model that is a virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven contr…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 6 pages

  5. arXiv:2307.00667  [pdf, other]

    stat.ML cs.AI cs.LG

    Morse Neural Networks for Uncertainty Quantification

    Authors: Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan

    Abstract: We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes the unnormalized Gaussian densities to have modes of high-dimensional submanifolds instead of just discrete points. Fitting the Morse neural network via a KL-divergence loss yields 1) a (unnormalized) generative density, 2) an OOD detector, 3) a calibration temperature, 4) a…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML workshop on Structured Probabilistic Inference & Generative Modeling 2023

  6. arXiv:2305.17207  [pdf, other]

    cs.CV

    Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models

    Authors: Yunhao Ge, Jie Ren, Jiaping Zhao, Kaifeng Chen, Andrew Gallagher, Laurent Itti, Balaji Lakshminarayanan

    Abstract: We focus on the challenge of out-of-distribution (OOD) detection in deep learning models, a crucial aspect in ensuring reliability. Despite considerable effort, the problem remains significantly challenging in deep learning models due to their propensity to output over-confident predictions for OOD inputs. We propose a novel one-class open-set OOD detector that leverages text-image pre-trained mod…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 16 pages (including appendix and references), 3 figures

  7. arXiv:2302.11188  [pdf, other]

    cs.LG

    What Are Effective Labels for Augmented Data? Improving Calibration and Robustness with AutoLabel

    Authors: Yao Qin, Xuezhi Wang, Balaji Lakshminarayanan, Ed H. Chi, Alex Beutel

    Abstract: A wide breadth of research has devised data augmentation approaches that can improve both accuracy and generalization performance for neural networks. However, augmented data can end up being far from the clean training data and what is the appropriate label is less clear. Despite this, most existing work simply uses one-hot labels for augmented data. In this paper, we show re-using one-hot labels…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted to SaTML-2023

  8. arXiv:2302.06235  [pdf, other]

    cs.LG cs.CV stat.ML

    A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

    Authors: James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan

    Abstract: Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual…

    Submitted 15 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. 23 pages, 10 tables, 3 figures

  9. arXiv:2302.05807  [pdf, other]

    cs.LG stat.ML

    Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play

    Authors: Jeremiah Zhe Liu, Krishnamurthy Dj Dvijotham, Jihyeon Lee, Quan Yuan, Martin Strobel, Balaji Lakshminarayanan, Deepak Ramachandran

    Abstract: Standard empirical risk minimization (ERM) training can produce deep neural network (DNN) models that are accurate on average but under-perform in under-represented population subgroups, especially when there are imbalanced group distributions in the long-tailed training data. Therefore, approaches that improve the accuracy-group robustness trade-off frontier of a DNN model (i.e. improving worst-g…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted to ICLR 2023. Included additional contribution from Martin Strobel

  10. arXiv:2212.09928  [pdf, other]

    cs.CL cs.LG

    Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

    Authors: Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

    Abstract: The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical…

    Submitted 4 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: EMNLP Findings 2023 Camera Ready

  11. arXiv:2212.01758  [pdf, other]

    cs.CV

    Improving Zero-shot Generalization and Robustness of Multi-modal Models

    Authors: Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao

    Abstract: Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks and their zero-shot generalization ability is particularly exciting. While the top-5 zero-shot accuracies of these models are very high, the top-1 accuracies are much lower (over 25% gap in some cases). We investigate the reasons for this performance gap and find that many…

    Submitted 25 May, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: CVPR 2023

  12. arXiv:2209.15558  [pdf, other]

    cs.CL

    Out-of-Distribution Detection and Selective Generation for Conditional Language Models

    Authors: Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu

    Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the nex…

    Submitted 7 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Published in ICLR 2023

  13. arXiv:2207.07411  [pdf, other]

    cs.LG stat.ML

    Plex: Towards Reliability using Pretrained Large Model Extensions

    Authors: Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, et al. (1 additional author not shown)

    Abstract: A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive per…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Code available at https://goo.gle/plex-code

  14. arXiv:2205.00403  [pdf, other]

    cs.LG stat.ML

    A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness

    Authors: Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zack Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan

    Abstract: Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ens…

    Submitted 30 December, 2022; v1 submitted 1 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2006.10108

  15. arXiv:2111.12951  [pdf, other]

    cs.LG cs.AI

    Reliable Graph Neural Networks for Drug Discovery Under Distributional Shift

    Authors: Kehang Han, Balaji Lakshminarayanan, Jeremiah Liu

    Abstract: The concern of overconfident mis-predictions under distributional shift demands extensive reliability research on Graph Neural Networks used in critical tasks in drug discovery. Here we first introduce CardioTox, a real-world benchmark on drug cardio-toxicity to facilitate such efforts. Our exploratory study shows overconfident mis-predictions are often distant from training data. That leads us to…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: 5 page main body, 5 page appendix. Accepted by NeurIPS DistShift Workshop 2021

  16. arXiv:2110.07858  [pdf, other]

    cs.LG cs.CV

    Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation

    Authors: Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang

    Abstract: We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable by humans. This indicates…

    Submitted 22 February, 2023; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS-2022

  17. arXiv:2110.03360  [pdf, other]

    cs.LG cs.CV stat.ML

    Sparse MoEs meet Efficient Ensembles

    Authors: James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

    Abstract: Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that the two approaches have complementary features whose combinatio…

    Submitted 9 July, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: 59 pages, 26 figures, 36 tables. Accepted at TMLR

  18. arXiv:2110.02609  [pdf, other]

    stat.ML cs.LG

    Deep Classifiers with Label Noise Modeling and Distance Awareness

    Authors: Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou

    Abstract: Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While there have been many proposed methods that either focus on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, both of these types of uncert…

    Submitted 8 August, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Published in TMLR

  19. arXiv:2108.00106  [pdf, other]

    cs.LG cs.AI

    Soft Calibration Objectives for Neural Networks

    Authors: Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

    Abstract: Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty both during training and post-hoc. In this work, we propose differentiable losses to impro…

    Submitted 7 December, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: 17 pages total, 10 page main paper, 5 page appendix, 10 figures total, 8 figures in main paper, 2 figures in appendix

  20. arXiv:2107.11413  [pdf, other]

    cs.LG cs.HC

    An Instance-Dependent Simulation Framework for Learning with Label Noise

    Authors: Keren Gu, Xander Masotto, Vandana Bachani, Balaji Lakshminarayanan, Jack Nikodem, Dong Yin

    Abstract: We propose a simulation framework for generating instance-dependent noisy labels via a pseudo-labeling paradigm. We show that the distribution of the synthetic noisy labels generated with our framework is closer to human labels compared to independent and class-conditional random flipping. Equipped with controllable label noise, we study the negative impact of noisy labels across a few practical s…

    Submitted 17 October, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: Datasets released at https://github.com/deepmind/deepmind-research/tree/master/noisy_label

  21. arXiv:2107.10492  [pdf, other]

    cs.LG cs.IT stat.ML

    Bandit Quickest Changepoint Detection

    Authors: Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan

    Abstract: Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of sensors informative. Continuous monitoring of every sensor can be expensive due to resource constraints, and serves as a motivation for the bandit quickest changepoint detection problem, whe…

    Submitted 13 June, 2023; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Some typos fixed in the NeurIPS 2021 version

  22. arXiv:2107.08189  [pdf, other]

    cs.LG cs.CY

    BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark

    Authors: Anand Avati, Martin Seneviratne, Emily Xue, Zhen Xu, Balaji Lakshminarayanan, Andrew M. Dai

    Abstract: Machine learning has recently demonstrated impressive progress in predictive accuracy across a wide array of tasks. Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND). However, real world applications and deployments of ML rarely enjoy the comfort of encountering examples that are always IND. In such situations, mos…

    Submitted 17 July, 2021; originally announced July 2021.

  23. arXiv:2106.12772  [pdf, other]

    cs.LG stat.ML

    Task-agnostic Continual Learning with Hybrid Probabilistic Models

    Authors: Polina Kirichenko, Mehrdad Farajtabar, Dushyant Rao, Balaji Lakshminarayanan, Nir Levine, Ang Li, Huiyi Hu, Andrew Gordon Wilson, Razvan Pascanu

    Abstract: Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to lea…

    Submitted 24 June, 2021; originally announced June 2021.

  24. arXiv:2106.09022  [pdf, other]

    cs.LG

    A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection

    Authors: Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, Balaji Lakshminarayanan

    Abstract: Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice. On a wide selection of challenging vision, language, and biology OOD…

    Submitted 16 June, 2021; originally announced June 2021.

  25. arXiv:2106.08365  [pdf, other]

    cs.LG cs.AI stat.ML

    Test Sample Accuracy Scales with Training Sample Density in Neural Networks

    Authors: Xu Ji, Razvan Pascanu, Devon Hjelm, Balaji Lakshminarayanan, Andrea Vedaldi

    Abstract: Intuitively, one would expect accuracy of a trained neural network's prediction on test samples to correlate with how densely the samples are surrounded by seen training samples in representation space. We find that a bound on empirical training error smoothed across linear activation regions scales inversely with training sample density in representation space. Empirically, we verify this bound i…

    Submitted 28 July, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: CoLLAs 2022 oral

  26. arXiv:2106.04015  [pdf, other]

    cs.LG

    Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning

    Authors: Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W. Dusenberry, Sebastian Farquhar, Qixuan Feng, Angelos Filos, Marton Havasi, Rodolphe Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas Padhy, Jie Ren, Tim G. J. Rudner, Faris Sbahi, Yeming Wen, Florian Wenzel, Kevin Murphy, D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, et al. (1 additional author not shown)

    Abstract: High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compu…

    Submitted 5 January, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  27. arXiv:2106.03004  [pdf, other]

    cs.LG

    Exploring the Limits of Out-of-Distribution Detection

    Authors: Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

    Abstract: Near out-of-distribution detection (OOD) is a major challenge for deep neural networks. We demonstrate that large-scale pre-trained transformers can significantly improve the state-of-the-art (SOTA) on a range of near OOD tasks across different data modalities. For instance, on CIFAR-100 vs CIFAR-10 OOD detection, we improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transform…

    Submitted 28 July, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

    Comments: S.F. and J.R. contributed equally

  28. Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions

    Authors: Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens

    Abstract: We develop and rigorously evaluate a deep learning based system that can accurately classify skin conditions while detecting rare conditions for which there is not enough data available for training a confident classifier. We frame this task as an out-of-distribution (OOD) detection problem. Our novel approach, hierarchical outlier detection (HOD) assigns multiple abstention classes for each train…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: Under Review, 19 Pages

    Journal ref: Medical Image Analysis (2022)

  29. arXiv:2010.09875  [pdf, other]

    cs.LG stat.ML

    Combining Ensembles and Data Augmentation can Harm your Calibration

    Authors: Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

    Abstract: Ensemble methods which average over multiple neural network predictions are a simple approach to improve a model's calibration and robustness. Similarly, data augmentation techniques, which encode prior information in the form of invariant feature transformations, are effective for improving calibration and robustness. In this paper, we show a surprising pathology: combining ensembles and data aug…

    Submitted 22 March, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

  30. arXiv:2010.06610  [pdf, other]

    cs.LG cs.CV stat.ML

    Training independent subnetworks for robust prediction

    Authors: Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran

    Abstract: Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network. However, these methods still require multiple forward passes for prediction, leading to a significant computational cost. In this work, we show a surprising result: the benefits of using multiple pred…

    Submitted 4 August, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: Updated to the ICLR camera ready version, added reference to Soflaei et al. 2020

  31. arXiv:2007.05864  [pdf, other]

    stat.ML cs.LG

    Bayesian Deep Ensembles via the Neural Tangent Kernel

    Authors: Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

    Abstract: We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We intro…

    Submitted 24 October, 2020; v1 submitted 11 July, 2020; originally announced July 2020.

  32. arXiv:2007.05134  [pdf, other]

    cs.LG stat.ML

    Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

    Authors: Shreyas Padhy, Zachary Nado, Jie Ren, Jeremiah Liu, Jasper Snoek, Balaji Lakshminarayanan

    Abstract: Accurate estimation of predictive uncertainty in modern neural networks is critical to achieve well calibrated predictions and detect out-of-distribution (OOD) inputs. The most promising approaches have been predominantly focused on improving model uncertainty (e.g. deep ensembles and Bayesian neural networks) and post-processing techniques for OOD detection (e.g. ODIN and Mahalanobis distance). H…

    Submitted 9 July, 2020; originally announced July 2020.

  33. arXiv:2006.10963  [pdf, other]

    cs.LG stat.ML

    Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

    Authors: Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek

    Abstract: Covariate shift has been shown to sharply degrade both predictive accuracy and the calibration of uncertainty estimates for deep learning models. This is worrying, because covariate shift is prevalent in a wide range of real world deployment settings. However, in this paper, we note that frequently there exists the potential to access small unlabeled batches of the shifted data just before predict…

    Submitted 14 January, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

  34. arXiv:2006.10108  [pdf, other]

    cs.LG stat.ML

    Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

    Authors: Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

    Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However their practicality in real-time, industrial-scale applications are limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural net…

    Submitted 25 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  35. arXiv:2006.09273  [pdf, other]

    cs.LG stat.ML

    Density of States Estimation for Out-of-Distribution Detection

    Authors: Warren R. Morningstar, Cusuh Ham, Andrew G. Gallagher, Balaji Lakshminarayanan, Alexander A. Alemi, Joshua V. Dillon

    Abstract: Perhaps surprisingly, recent studies have shown probabilistic model likelihoods have poor specificity for out-of-distribution (OOD) detection and often assign higher likelihoods to OOD data than in-distribution data. To ameliorate this issue we propose DoSE, the density of states estimator. Drawing on the statistical physics notion of ``density of states,'' the DoSE decision rule avoids direct com…

    Submitted 22 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Submitted to NeurIPS. Corrected footnote from: "34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada" to "Preprint. Under review."

  36. arXiv:2005.07186  [pdf, other]

    cs.LG stat.ML

    Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

    Authors: Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

    Abstract: Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from effic…

    Submitted 14 August, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: Published in the International Conference on Machine Learning (ICML) 2020. Code available at https://github.com/google/edward2

  37. arXiv:1912.02781  [pdf, other]

    stat.ML cs.CV cs.LG

    AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

    Authors: Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan

    Abstract: Modern deep neural networks can achieve high accuracy when the training distribution and test distribution are identically distributed, but this assumption is frequently violated in practice. When the train and test distributions are mismatched, accuracy can plummet. Currently there are few techniques that improve robustness to unforeseen data shifts encountered during deployment. In this work, we…

    Submitted 17 February, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Code available at https://github.com/google-research/augmix

  38. arXiv:1912.02762  [pdf, other]

    stat.ML cs.LG

    Normalizing Flows for Probabilistic Modeling and Inference

    Authors: George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, Balaji Lakshminarayanan

    Abstract: Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of…

    Submitted 8 April, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Review article, 64 pages, 9 figures. Published in the Journal of Machine Learning Research (see https://jmlr.org/papers/v22/19-1028.html)

    Journal ref: Journal of Machine Learning Research, 22(57):1-64, 2021

  39. arXiv:1912.02757  [pdf, other]

    stat.ML cs.LG

    Deep Ensembles: A Loss Landscape Perspective

    Authors: Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan

    Abstract: Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ense…

    Submitted 24 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

  40. arXiv:1906.02994  [pdf, other]

    stat.ML cs.LG

    Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality

    Authors: Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Balaji Lakshminarayanan

    Abstract: Recent work has shown that deep generative models can assign higher likelihood to out-of-distribution data sets than to their training data (Nalisnick et al., 2019; Choi et al., 2019). We posit that this phenomenon is caused by a mismatch between the model's typical set and its areas of high probability density. In-distribution inputs should reside in the former but not necessarily in the latter,…

    Submitted 16 October, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  41. arXiv:1906.02845  [pdf, other]

    stat.ML cs.LG

    Likelihood Ratios for Out-of-Distribution Detection

    Authors: Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan

    Abstract: Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based…

    Submitted 5 December, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted to NeurIPS 2019

  42. arXiv:1906.02530  [pdf, other]

    stat.ML cs.LG

    Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

    Authors: Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D Sculley, Sebastian Nowozin, Joshua V. Dillon, Balaji Lakshminarayanan, Jasper Snoek

    Abstract: Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a var…

    Submitted 17 December, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Advances in Neural Information Processing Systems, 2019

  43. arXiv:1902.02767  [pdf, other]

    cs.LG stat.ML

    Hybrid Models with Deep and Invertible Features

    Authors: Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan

    Abstract: We propose a neural hybrid model consisting of a linear model defined on a set of features computed by a deep, invertible transformation (i.e. a normalizing flow). An attractive property of our model is that both p(features), the density of the features, and p(targets | features), the predictive distribution, can be computed exactly in a single feed-forward pass. We show that our hybrid model, des…

    Submitted 29 May, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

    Comments: ICML 2019

  44. arXiv:1812.02224  [pdf, other]

    stat.ML cs.LG

    Adapting Auxiliary Losses Using Gradient Similarity

    Authors: Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan

    Abstract: One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main task and when it could start hurting. We propose to use the cosine similarity between gradients of tasks as an adaptive weight to detect when an auxiliary loss…

    Submitted 25 November, 2020; v1 submitted 5 December, 2018; originally announced December 2018.

  45. arXiv:1810.09136  [pdf, other]

    stat.ML cs.LG

    Do Deep Generative Models Know What They Don't Know?

    Authors: Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan

    Abstract: A neural network deployed in the wild may be asked to make predictions for inputs that were drawn from a different distribution than that of the training data. A plethora of work has demonstrated that it is easy to find or synthesize inputs for which a neural network is highly confident yet wrong. Generative models are widely viewed to be robust to such mistaken confidence as modeling the density…

    Submitted 24 February, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: ICLR 2019

  46. arXiv:1807.09387  [pdf, other]

    cs.LG stat.ML

    Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

    Authors: Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Prav Srinivasan

    Abstract: Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook). We formalize the problem as an adversarial, delayed online learning problem and consider how a proxy for the delayed outcome (e.g., if customers read a third of the book in 24 hours) can help minimize regret, even though the proxy is not available when making a prediction.…

    Submitted 15 October, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

  47. arXiv:1802.06847  [pdf, other]

    stat.ML cs.LG

    Distribution Matching in Variational Inference

    Authors: Mihaela Rosca, Balaji Lakshminarayanan, Shakir Mohamed

    Abstract: With the increasingly widespread deployment of generative models, there is a mounting need for a deeper understanding of their behaviors and limitations. In this paper, we expose the limitations of Variational Autoencoders (VAEs), which consistently fail to learn marginal distributions in both latent and visible spaces. We show this to be a consequence of learning by matching conditional distribut…

    Submitted 10 June, 2019; v1 submitted 19 February, 2018; originally announced February 2018.

  48. arXiv:1710.08446  [pdf, other]

    stat.ML cs.LG

    Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step

    Authors: William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow

    Abstract: Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each playe…

    Submitted 20 February, 2018; v1 submitted 23 October, 2017; originally announced October 2017.

    Comments: 18 pages

  49. arXiv:1706.04987  [pdf, other]

    stat.ML cs.LG

    Variational Approaches for Auto-Encoding Generative Adversarial Networks

    Authors: Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed

    Abstract: Auto-encoding generative adversarial networks (GANs) combine the standard GAN algorithm, which discriminates between real and model-generated data, with a reconstruction loss given by an auto-encoder. Such models aim to prevent mode collapse in the learned generative model by ensuring that it is grounded in all the available training data. In this paper, we develop a principle upon which auto-enco…

    Submitted 21 October, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

  50. arXiv:1705.10743  [pdf, other]

    cs.LG stat.ML

    The Cramer Distance as a Solution to Biased Wasserstein Gradients

    Authors: Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

    Abstract: The Wasserstein probability metric has received much attention from the machine learning community. Unlike the Kullback-Leibler divergence, which strictly measures change in probability, the Wasserstein metric reflects the underlying geometry between outcomes. The value of being sensitive to this geometry has been demonstrated, among others, in ordinal regression and generative modelling. In this…

    Submitted 30 May, 2017; originally announced May 2017.