
Multicalibration for Confidence Scoring in LLMs

Authors: Gianluca Detommaso, Martin Bertran, Riccardo Fogliato, Aaron Roth

Abstract: This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groups of the data. We show how to form groups for prompt/completion pairs that are correlated with the probability of correctness via two techniques: clustering within an embedding space, and "self-annotation" - querying the LLM by asking it various yes-or-no questions about the prompt. We also develop novel variants of multicalibration algorithms that offer performance improvements by reducing their tendency to overfit. Through systematic benchmarking across various question answering datasets and LLMs, we show how our techniques can yield confidence scores that provide substantial improvements in fine-grained measures of both calibration and accuracy compared to existing methods.

Submitted 6 April, 2024; originally announced April 2024.
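
To illustrate the difference between marginal calibration and calibration across groups described in the abstract above, here is a minimal sketch in plain Python. It is not the paper's algorithm: it only measures a binned calibration error inside each group, and the function names, the binning scheme, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error: size-weighted |mean confidence - accuracy|."""
    bins = defaultdict(list)
    for c, y in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    n = len(confidences)
    err = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        err += len(items) / n * abs(mean_conf - accuracy)
    return err


def groupwise_calibration_error(confidences, correct, groups, n_bins=10):
    """Calibration error inside each (possibly overlapping) group.

    `groups` maps a hypothetical group name to a list of example indices.
    """
    return {
        name: calibration_error(
            [confidences[i] for i in idx], [correct[i] for i in idx], n_bins
        )
        for name, idx in groups.items()
    }


# Toy data (made up): every answer gets confidence 0.5 and half are correct,
# so the scores look calibrated overall but not within group "a" or "b".
conf = [0.5, 0.5, 0.5, 0.5]
correct = [1, 1, 0, 0]
groups = {"all": [0, 1, 2, 3], "a": [0, 1], "b": [2, 3]}
errors = groupwise_calibration_error(conf, correct, groups)
```

In this toy case the error on "all" is zero while the error on "a" and on "b" is 0.5, which is the kind of gap that a marginal calibration check cannot see.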

arXiv:2302.04019

Fortuna: A Library for Uncertainty Quantification in Deep Learning

Authors: Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau

Abstract: We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2207.08200

Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

Authors: Gianluca Detommaso, Alberto Gasparin, Andrew Wilson, Cedric Archambeau

Abstract: As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method, and it can be performed as a post-processing step. We demonstrate its effectiveness against several baselines in a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2106.09762

Causal Bias Quantification for Continuous Treatments

Authors: Gianluca Detommaso, Michael Brückner, Philip Schulz, Victor Chernozhukov

Abstract: We extend the definition of the marginal causal effect to the continuous treatment setting and develop a novel characterization of causal bias in the framework of structural causal models. We prove that our derived bias expression is zero if, and only if, the causal effect is identifiable via covariate adjustment. We show that under some restrictions on the structural equations, the causal bias can be estimated efficiently and allows for causal regularization of predictive probabilistic models. We demonstrate the effectiveness of our method for causal bias quantification in various settings where (not) controlling for certain covariates would introduce causal bias.

Submitted 30 January, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:1910.12431

Multilevel Dimension-Independent Likelihood-Informed MCMC for Large-Scale Inverse Problems

Authors: Tiangang Cui, Gianluca Detommaso, Robert Scheichl

Abstract: We present a non-trivial integration of dimension-independent likelihood-informed (DILI) MCMC (Cui, Law, Marzouk, 2016) and the multilevel MCMC (Dodwell et al., 2015) to explore the hierarchy of posterior distributions. This integration offers several advantages: First, DILI-MCMC employs an intrinsic likelihood-informed subspace (LIS) (Cui et al., 2014) -- which involves a number of forward and adjoint model simulations -- to design accelerated operator-weighted proposals. By exploiting the multilevel structure of the discretised parameters and discretised forward models, we design a Rayleigh-Ritz procedure to significantly reduce the computational effort in building the LIS and operating with DILI proposals. Second, the resulting DILI-MCMC can drastically improve the sampling efficiency of MCMC at each level, and hence reduce the integration error of the multilevel algorithm for fixed CPU time. Numerical results confirm the improved computational efficiency of the multilevel DILI approach.

Submitted 29 November, 2023; v1 submitted 28 October, 2019; originally announced October 2019.

arXiv:1905.10687

HINT: Hierarchical Invertible Neural Transport for Density Estimation and Bayesian Inference

Authors: Jakob Kruse, Gianluca Detommaso, Ullrich Köthe, Robert Scheichl

Abstract: Many recent invertible neural architectures are based on coupling block designs where variables are divided in two subsets which serve as inputs of an easily invertible (usually affine) triangular transformation. While such a transformation is invertible, its Jacobian is very sparse and thus may lack expressiveness. This work presents a simple remedy by noting that subdivision and (affine) coupling can be repeated recursively within the resulting subsets, leading to an efficiently invertible block with dense, triangular Jacobian. By formulating our recursive coupling scheme via a hierarchical architecture, HINT allows sampling from a joint distribution p(y,x) and the corresponding posterior p(x|y) using a single invertible network. We evaluate our method on some standard data sets and benchmark its full power for density estimation and Bayesian inference on a novel data set of 2D shapes in Fourier parameterization, which enables consistent visualization of samples for different dimensionalities.

Submitted 25 May, 2021; v1 submitted 25 May, 2019; originally announced May 2019.

Comments: Published at AAAI 2021

arXiv:1901.07987

Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks

Authors: Gianluca Detommaso, Hanne Hoitzing, Tiangang Cui, Ardavan Alamir

Abstract: Bayesian online changepoint detection (BOCPD) (Adams & MacKay, 2007) offers a rigorous and viable way to identify changepoints in complex systems. In this work, we introduce a Stein variational online changepoint detection (SVOCD) method to provide a computationally tractable generalization of BOCPD beyond the exponential family of probability distributions. We integrate the recently developed Stein variational Newton (SVN) method (Detommaso et al., 2018) and BOCPD to offer a full online Bayesian treatment for a large number of situations with significant importance in practice. We apply the resulting method to two challenging and novel applications: Hawkes processes and long short-term memory (LSTM) neural networks. In both cases, we successfully demonstrate the efficacy of our method on real data.

Submitted 25 May, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: 14 pages, 6 figures

arXiv:1806.03085

A Stein variational Newton method

Authors: Gianluca Detommaso, Tiangang Cui, Alessio Spantini, Youssef Marzouk, Robert Scheichl

Abstract: Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.

Submitted 29 October, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: 18 pages, 7 figures

Journal ref: NIPS 2018
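
For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal one-dimensional sketch of the original first-order SVGD update, not the Newton variant proposed in the paper. The Gaussian (RBF) kernel with a fixed bandwidth, the step size, the standard normal target, and all function names are assumptions chosen for illustration.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One first-order SVGD update for scalar particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * bandwidth ** 2))
            # Derivative of k(xj, xi) with respect to xj: the repulsive term
            # that keeps particles from collapsing onto a single mode.
            grad_k = -(xj - xi) / bandwidth ** 2 * k
            # Kernel-weighted score (attraction towards high density) + repulsion.
            phi += k * grad_log_p(xj) + grad_k
        updated.append(xi + step_size * phi / n)
    return updated


# Assumed target: standard normal, whose score is d/dx log p(x) = -x.
particles = [3.0, 3.5, 4.0, 4.5]
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)

mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
```

Starting far from the target, the particles drift towards zero while the kernel gradient term keeps them spread out. The paper's contribution, per the abstract, is to replace this first-order direction with one that uses second-order information, which this sketch does not attempt.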

arXiv:1802.07539

Continuous Level Monte Carlo and Sample-Adaptive Model Hierarchies

Authors: Gianluca Detommaso, Tim Dodwell, Rob Scheichl

Abstract: In this paper, we present a generalisation of the Multilevel Monte Carlo (MLMC) method to a setting where the level parameter is a continuous variable. This Continuous Level Monte Carlo (CLMC) estimator provides a natural framework in PDE applications to adapt the model hierarchy to each sample. In addition, it can be made unbiased with respect to the expected value of the true quantity of interest provided the quantity of interest converges sufficiently fast. The practical implementation of the CLMC estimator is based on interpolating actual evaluations of the quantity of interest at a finite number of resolutions. As our new level parameter, we use the logarithm of a goal-oriented finite element error estimator for the accuracy of the quantity of interest. We prove the unbiasedness, as well as a complexity theorem that shows the same rate of complexity for CLMC as for MLMC. Finally, we provide some numerical evidence to support our theoretical results, by successfully testing CLMC on a standard PDE test problem. The numerical experiments demonstrate clear gains for sample-wise adaptive refinement strategies over uniform refinements.

Submitted 21 February, 2018; originally announced February 2018.

Comments: 22 pages, 4 figures

Showing 1–9 of 9 results for author: Detommaso, G