-
Empirical Evidence That There Is No Such Thing As A Validated Prediction Model
Authors:
Florian D. van Leeuwen,
Ewout W. Steyerberg,
David van Klaveren,
Ben Wessler,
David M. Kent,
Erik W. van Zwet
Abstract:
Background: External validations are essential to assess clinical prediction models (CPMs) before deployment. Apart from model misspecification, differences in patient population and other factors influence a model's AUC (c-statistic). We aimed to quantify variation in AUCs across external validation studies and adjust expectations of a model's performance in a new setting.
Methods: The Tufts-PACE CPM Registry contains CPMs for cardiovascular disease prognosis. We analyzed the AUCs of 469 CPMs with a total of 1,603 external validations. For each CPM, we performed a random effects meta-analysis to estimate the between-study standard deviation $τ$ among the AUCs. Since the majority of these meta-analyses have only a handful of validations, this leads to very poor estimates of $τ$. So, we estimated a log-normal distribution of $τ$ across all CPMs and used this as an empirical prior. We compared this empirical Bayesian approach with frequentist meta-analyses using cross-validation.
Results: The 469 CPMs had a median of 2 external validations (IQR: [1-3]). The estimated distribution of $τ$ had a mean of 0.055 and a standard deviation of 0.015. If $τ$ = 0.05, the 95% prediction interval for the AUC in a new setting is at least ±0.1, regardless of the number of validations. Frequentist methods underestimate the uncertainty about the AUC in a new setting. Accounting for $τ$ in a Bayesian approach achieved near nominal coverage.
Conclusion: Due to large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
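The "at least ±0.1" figure in the Results can be checked with a few lines of arithmetic. The sketch below is not the paper's empirical Bayes procedure; it assumes a plain normal-approximation prediction interval, where the half-width is z times sqrt($τ$² + SE²) and SE is the standard error of the pooled AUC. The function name and its parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def auc_prediction_interval(pooled_auc, se_pooled, tau, level=0.95):
    """Normal-approximation prediction interval for the AUC in a new setting.

    Assumption (not from the abstract): the new-setting AUC is treated as
    normal around the pooled AUC with variance tau^2 + se_pooled^2.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(tau ** 2 + se_pooled ** 2)
    # Clip to the valid AUC range [0, 1].
    lower = max(0.0, pooled_auc - half_width)
    upper = min(1.0, pooled_auc + half_width)
    return lower, upper, half_width


# With se_pooled = 0 (the limit of infinitely many validations),
# tau alone determines the width of the interval.
_, _, floor_width = auc_prediction_interval(0.75, se_pooled=0.0, tau=0.05)
# A nonzero standard error (illustrative value) can only widen it.
_, _, wider = auc_prediction_interval(0.75, se_pooled=0.03, tau=0.05)
print(round(floor_width, 3), round(wider, 3))
```

Because the SE term can only add to the variance, more validations shrink the interval toward the floor of about 1.96 × 0.05 but never below it.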
Submitted 12 June, 2024;
originally announced June 2024.
-
Calibration of Phone Likelihoods in Automatic Speech Recognition
Authors:
David A. van Leeuwen,
Joost van Doremalen
Abstract:
In this paper we study the probabilistic properties of the posteriors in a speech recognition system that uses a deep neural network (DNN) for acoustic modeling. We do this by reducing Kaldi's DNN shared pdf-id posteriors to phone likelihoods, and using test set forced alignments to evaluate these with a calibration-sensitive metric. Individual frame posteriors are in principle well-calibrated, because the DNN is trained using cross entropy as the objective function, which is a proper scoring rule. When entire phones are assessed, we observe that it is best to average the log likelihoods over the duration of the phone. Further scaling of the average log likelihoods by the logarithm of the duration slightly improves the calibration, and this improvement is retained when tested on independent test data.
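A minimal sketch of the averaging step, assuming a phone segment is given as a list of per-frame log likelihoods (the function names and the toy values are illustrative). The additional log-duration scaling is left out, since the abstract does not state its exact form.

```python
from math import fsum


def phone_score_sum(frame_loglikes):
    """Sum of frame log likelihoods: grows in magnitude with phone duration."""
    return fsum(frame_loglikes)


def phone_score_mean(frame_loglikes):
    """Average frame log likelihood over the duration of the phone."""
    if not frame_loglikes:
        raise ValueError("a phone segment needs at least one frame")
    return fsum(frame_loglikes) / len(frame_loglikes)


# Two hypothetical segments with the same per-frame evidence but
# different durations (3 frames versus 30 frames).
short_phone = [-1.0] * 3
long_phone = [-1.0] * 30
print(phone_score_sum(short_phone), phone_score_sum(long_phone))
print(phone_score_mean(short_phone), phone_score_mean(long_phone))
```

The summed score differs by a factor of ten between the two segments, while the averaged score is the same, which makes scores comparable across phones of different length.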
Submitted 14 June, 2016;
originally announced June 2016.
-
The "Sprekend Nederland" project and its application to accent location
Authors:
David A. van Leeuwen,
Rosemary Orr
Abstract:
This paper describes the data collection effort that is part of the project Sprekend Nederland (The Netherlands Talking), and discusses its potential use in Automatic Accent Location. We define Automatic Accent Location as the task to describe the accent of a speaker in terms of the location of the speaker and its history. We discuss possible ways of describing accent location, the consequences these have for the task of automatic accent location, and potential evaluation metrics.
Submitted 8 April, 2016; v1 submitted 8 February, 2016;
originally announced February 2016.
-
Constrained speaker linking
Authors:
David A. van Leeuwen,
Niko Brümmer
Abstract:
In this paper we study speaker linking (a.k.a. partitioning) given constraints on the distribution of speaker identities over speech recordings. Specifically, we show that the intractable partitioning problem becomes tractable when the constraints pre-partition the data into smaller cliques with non-overlapping speakers. The surprisingly common case where speakers in telephone conversations are known, but the assignment of channels to identities is unspecified, is treated in a Bayesian way. We show that for the Dutch CGN database, where this channel assignment task is at hand, a lightweight speaker recognition system can quite effectively solve the channel assignment problem, with 93% of the cliques solved. We further show that the posterior distribution over channel assignment configurations is well calibrated.
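To illustrate what a Bayesian treatment of channel assignment can look like in the smallest case, the sketch below considers one conversation with two known speakers A and B and two channels. It assumes the recognizer yields, per channel, a calibrated log-likelihood ratio of "speaker A" versus "speaker B", and that the channels are conditionally independent. These assumptions, the function name, and the parameters are illustrative and not taken from the paper.

```python
from math import exp, log


def channel_assignment_posterior(llr_ch1, llr_ch2, prior_straight=0.5):
    """Posterior probability that channel 1 is speaker A and channel 2 is B.

    llr_chK = log p(x_K | A) - log p(x_K | B) for the audio x_K on channel K.
    The alternative configuration is the swap (channel 1 is B, channel 2 is A).
    """
    # Likelihood ratio of "straight" versus "swapped":
    # p(x1|A) p(x2|B) / (p(x1|B) p(x2|A)) = exp(llr_ch1 - llr_ch2)
    log_odds = llr_ch1 - llr_ch2 + log(prior_straight / (1.0 - prior_straight))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


# Channel 1 sounds like A (llr = +2), channel 2 sounds like B (llr = -2).
print(channel_assignment_posterior(2.0, -2.0))
```

With more speakers per clique the same idea extends to a softmax over all permissible configurations. Whether the resulting posteriors are well calibrated depends on the calibration of the underlying scores.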
Submitted 2 April, 2014; v1 submitted 26 March, 2014;
originally announced March 2014.
-
A comparison of linear and non-linear calibrations for speaker recognition
Authors:
Niko Brümmer,
Albert Swart,
David van Leeuwen
Abstract:
In recent work on both generative and discriminative score to log-likelihood-ratio calibration, it was shown that linear transforms give good accuracy only for a limited range of operating points. Moreover, these methods required tailoring of the calibration training objective functions in order to target the desired region of best accuracy. Here, we generalize the linear recipes to non-linear ones. We experiment with a non-linear, non-parametric, discriminative PAV solution, as well as parametric, generative, maximum-likelihood solutions that use Gaussian, Student's T and normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores suggest that the non-linear methods provide wider ranges of optimal accuracy and can be trained without having to resort to objective function tailoring.
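For readers unfamiliar with PAV (pool adjacent violators), the following is a minimal pure-Python sketch of how it can map raw scores to log-likelihood ratios. It is not the authors' implementation: the clipping constant `eps`, the simple handling of tied scores, and the function names are assumptions made for illustration.

```python
from math import log


def pav(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the current block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def pav_calibrate(scores, labels, eps=1e-6):
    """Map scores to LLRs: monotone posterior from PAV minus the prior log odds.

    labels: 1 for target trials, 0 for non-target trials (both must occur).
    eps clips posteriors away from 0 and 1 so the logit stays finite.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    posteriors = pav([float(labels[i]) for i in order])
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    prior_log_odds = log(n_tar / n_non)
    llrs = [0.0] * len(scores)
    for rank, i in enumerate(order):
        p = min(max(posteriors[rank], eps), 1.0 - eps)
        llrs[i] = log(p / (1.0 - p)) - prior_log_odds
    return llrs


# Hypothetical scores and trial labels, for illustration only.
scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1, 1]
print(pav_calibrate(scores, labels))
```

The resulting mapping is a non-decreasing step function of the score, which is what makes it non-linear and non-parametric in contrast to an affine transform with a fitted scale and offset.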
Submitted 9 April, 2014; v1 submitted 11 February, 2014;
originally announced February 2014.
-
The distribution of calibrated likelihood-ratios in speaker recognition
Authors:
David A. van Leeuwen,
Niko Brümmer
Abstract:
This paper studies properties of the score distributions of calibrated log-likelihood-ratios that are used in automatic speaker recognition. We derive the essential condition for calibration that the log likelihood ratio of the log-likelihood-ratio is the log-likelihood-ratio. We then investigate what the consequence of this condition is to the probability density functions (PDFs) of the log-likelihood-ratio score. We show that if the PDF of the non-target distribution is Gaussian, then the PDF of the target distribution must be Gaussian as well. The means and variances of these two PDFs are interrelated, and determined completely by the discrimination performance of the recognizer characterized by the equal error rate. These relations allow for a new way of computing the offset and scaling parameters for linear calibration, and we derive closed-form expressions for these and show that for modern i-vector systems with PLDA scoring this leads to good calibration, comparable to traditional logistic regression, over a wide range of system performance.
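The Gaussian case can be checked numerically. Requiring log p(x | target) - log p(x | non-target) = x for two Gaussians forces equal variances σ² and means +σ²/2 and -σ²/2; with the threshold at 0 the equal error rate is then Φ(-σ/2). The sketch below works under exactly these relations, which follow from the stated calibration condition; it does not reproduce the paper's closed-form expressions for the calibration offset and scale, and the function name is illustrative.

```python
from math import log
from statistics import NormalDist


def calibrated_gaussian_llr(eer):
    """Target and non-target LLR distributions implied by a given EER.

    Uses sigma = -2 * Phi^{-1}(EER) and means +/- sigma^2 / 2.
    """
    if not 0.0 < eer < 0.5:
        raise ValueError("eer must lie strictly between 0 and 0.5")
    sigma = -2.0 * NormalDist().inv_cdf(eer)
    mu = sigma ** 2 / 2.0
    return NormalDist(mu, sigma), NormalDist(-mu, sigma)


target, non_target = calibrated_gaussian_llr(0.05)
for x in (-3.0, 0.0, 2.5):
    # The log likelihood ratio of the LLR value x should reproduce x.
    llr_of_llr = log(target.pdf(x)) - log(non_target.pdf(x))
    print(x, round(llr_of_llr, 6))
```

A single number, the equal error rate, fixes both distributions here, which is the sense in which discrimination performance determines the means and variances.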
Submitted 8 June, 2013; v1 submitted 3 April, 2013;
originally announced April 2013.