-
Building Trees for Probabilistic Prediction via Scoring Rules
Authors:
Sara Shashaani,
Ozge Surer,
Matthew Plumlee,
Seth Guikema
Abstract:
Decision trees built with data remain in widespread use for nonparametric prediction. Predicting probability distributions is preferred over point predictions when uncertainty plays a prominent role in analysis and decision-making. We study modifying a tree to produce nonparametric predictive distributions. We find the standard method for building trees may not result in good predictive distributions and propose changing the splitting criteria for trees to one based on proper scoring rules. Analysis of both simulated data and several real datasets demonstrates that using these new splitting criteria results in trees with improved predictive properties considering the entire predictive distribution.
Submitted 16 February, 2024;
originally announced February 2024.
-
Measuring sampling plan utility in post-market surveillance of medical product supply chains
Authors:
Eugene Wickett,
Matthew Plumlee,
Karen Smilowitz,
Souly Phanouvong,
Timothy Nwogu
Abstract:
Ensuring product quality is critical to combating the global challenge of substandard and falsified medical products. Post-market surveillance is a quality-assurance activity in which products are tested at consumer-facing locations. Regulators in low-resource settings use post-market surveillance to evaluate product quality across locations and determine corrective actions. Part of post-market surveillance is developing a sampling plan, which specifies the number of tests to conduct at each location. With limited resources, it is important to base decisions for test allocations on the utility of the samples tested. We propose a Bayesian approach to generate a comprehensive utility for sampling plans. This sampling plan utility integrates regulatory risk assessments with prior testing data, available supply-chain information, and valuations of regulatory objectives. We illustrate the value of this metric with a case study based on de-identified post-market surveillance data from a low-resource setting. To do so, we develop a fast method for calculating utility that is used in a greedy heuristic to form sampling plans. The resulting plans focus surveillance on locations more likely to impact regulator decisions. These locations can be distinct from locations with higher assessed risk.
Submitted 20 December, 2023; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Sequential Bayesian experimental design for calibration of expensive simulation models
Authors:
Özge Sürer,
Matthew Plumlee,
Stefan M. Wild
Abstract:
Simulation models of critical systems often have parameters that need to be calibrated using observed data. For expensive simulation models, calibration is done using an emulator of the simulation model built on simulation output at different parameter settings. Using intelligent and adaptive selection of parameters to build the emulator can drastically improve the efficiency of the calibration process. The article proposes a sequential framework with a novel criterion for parameter selection that targets learning the posterior density of the parameters. The emergent behavior of this criterion is that exploration occurs by selecting parameters in uncertain posterior regions while exploitation simultaneously occurs by selecting parameters in regions of high posterior density. The advantages of the proposed method are illustrated using several simulation experiments and a nuclear physics reaction model.
Submitted 25 May, 2023;
originally announced May 2023.
-
Constructing a simulation surrogate with partially observed output
Authors:
Moses Y-H. Chan,
Matthew Plumlee,
Stefan M. Wild
Abstract:
Gaussian process surrogates are a popular alternative to directly using computationally expensive simulation models. When the simulation output consists of many responses, dimension-reduction techniques are often employed to construct these surrogates. However, surrogate methods with dimension reduction generally rely on complete output training data. This article proposes a new Gaussian process surrogate method that permits the use of partially observed output while remaining computationally efficient. The new method involves the imputation of missing values and the adjustment of the covariance matrix used for Gaussian process inference. The resulting surrogate represents the available responses, disregards the missing responses, and provides meaningful uncertainty quantification. The proposed approach is shown to offer sharper inference than alternatives in a simulation study and a case study where an energy density functional model that frequently returns incomplete output is calibrated.
Submitted 18 April, 2023;
originally announced April 2023.
-
Bayesian calibration of viscous anisotropic hydrodynamic simulations of heavy-ion collisions
Authors:
Dananjaya Liyanage,
Özge Sürer,
Matthew Plumlee,
Stefan M. Wild,
Ulrich Heinz
Abstract:
Due to large pressure gradients at early times, standard hydrodynamic model simulations of relativistic heavy-ion collisions do not become reliable until $O(1)$\,fm/$c$ after the collision. To address this one often introduces a pre-hydrodynamic stage that models the early evolution microscopically, typically as a conformal, weakly interacting gas. In such an approach the transition from the pre-hydrodynamic to the hydrodynamic stage is discontinuous, introducing considerable theoretical model ambiguity. Alternatively, fluids with large anisotropic pressure gradients can be handled macroscopically using the recently developed Viscous Anisotropic Hydrodynamics (VAH). In high-energy heavy-ion collisions VAH is applicable already at very early times, and at later times transitions smoothly into conventional second-order viscous hydrodynamics (VH). We present a Bayesian calibration of the VAH model with experimental data for Pb--Pb collisions at the LHC at $\sqrt{s_\textrm{NN}}=2.76$\,TeV. We find that the VAH model has the unique capability of constraining the specific viscosities of the QGP at higher temperatures than other previously used models.
Submitted 9 March, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Inferring sources of substandard and falsified products in pharmaceutical supply chains
Authors:
Eugene Wickett,
Matthew Plumlee,
Karen Smilowitz,
Souly Phanouvong,
Victor Pribluda
Abstract:
Substandard and falsified pharmaceuticals, prevalent in low- and middle-income countries, substantially increase levels of morbidity, mortality and drug resistance. Regulatory agencies combat this problem using post-market surveillance by collecting and testing samples where consumers purchase products. Existing analysis tools for post-market surveillance data focus attention on the locations of positive samples. This paper looks to expand such analysis through underutilized supply-chain information to provide inference on sources of substandard and falsified products. We first establish the presence of unidentifiability issues when integrating this supply-chain information with surveillance data. We then develop a Bayesian methodology for evaluating substandard and falsified sources that extracts utility from supply-chain information and mitigates unidentifiability while accounting for multiple sources of uncertainty. Using de-identified surveillance data, we show the proposed methodology to be effective in providing valuable inference.
Submitted 1 February, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Towards Precise and Accurate Calculations of Neutrinoless Double-Beta Decay: Project Scoping Workshop Report
Authors:
V. Cirigliano,
Z. Davoudi,
J. Engel,
R. J. Furnstahl,
G. Hagen,
U. Heinz,
H. Hergert,
M. Horoi,
C. W. Johnson,
A. Lovato,
E. Mereghetti,
W. Nazarewicz,
A. Nicholson,
T. Papenbrock,
S. Pastore,
M. Plumlee,
D. R. Phillips,
P. E. Shanahan,
S. R. Stroberg,
F. Viens,
A. Walker-Loud,
K. A. Wendt,
S. M. Wild
Abstract:
We present the results of a National Science Foundation (NSF) Project Scoping Workshop, the purpose of which was to assess the current status of calculations for the nuclear matrix elements governing neutrinoless double-beta decay and determine if more work on them is required. After reviewing important recent progress in the application of effective field theory, lattice quantum chromodynamics, and ab initio nuclear-structure theory to double-beta decay, we discuss the state of the art in nuclear-physics uncertainty quantification and then construct a road map for work in all these areas to fully complement the increasingly sensitive experiments in operation and under development. The road map contains specific projects in theoretical and computational physics as well as an uncertainty-quantification plan that employs Bayesian Model Mixing and an analysis of correlations between double-beta-decay rates and other observables. The goal of this program is a set of accurate and precise matrix elements, in all nuclei of interest to experimentalists, delivered together with carefully assessed uncertainties. Such calculations will allow crisp conclusions from the observation or non-observation of neutrinoless double-beta decay, no matter what new physics is at play.
Submitted 3 July, 2022;
originally announced July 2022.
-
Uncertainty Quantification in Breakup Reactions
Authors:
Özge Sürer,
Filomena M. Nunes,
Matthew Plumlee,
Stefan M. Wild
Abstract:
Breakup reactions are one of the favored probes to study loosely bound nuclei, particularly those in the limit of stability forming a halo. In order to interpret such breakup experiments, the continuum discretized coupled channel method is typically used. In this study, the first Bayesian analysis of a breakup reaction model is performed. We use a combination of statistical methods together with a three-body reaction model (the continuum discretized coupled channel method) to quantify the uncertainties on the breakup observables due to the parameters in the effective potential describing the loosely bound projectile of interest. The combination of tools we develop opens the path for a Bayesian analysis of not only breakup processes, but also a wide array of complex processes that require computationally intensive reaction models.
Submitted 14 May, 2022;
originally announced May 2022.
-
Get on the BAND Wagon: A Bayesian Framework for Quantifying Model Uncertainties in Nuclear Dynamics
Authors:
D. R. Phillips,
R. J. Furnstahl,
U. Heinz,
T. Maiti,
W. Nazarewicz,
F. M. Nunes,
M. Plumlee,
M. T. Pratola,
S. Pratt,
F. G. Viens,
S. M. Wild
Abstract:
We describe the Bayesian Analysis of Nuclear Dynamics (BAND) framework, a cyberinfrastructure that we are developing which will unify the treatment of nuclear models, experimental data, and associated uncertainties. We overview the statistical principles and nuclear-physics contexts underlying the BAND toolset, with an emphasis on Bayesian methodology's ability to leverage insight from multiple models. In order to facilitate understanding of these tools we provide a simple and accessible example of the BAND framework's application. Four case studies are presented to highlight how elements of the framework will enable progress on complex, far-ranging problems in nuclear physics. By collecting notation and terminology, providing illustrative examples, and giving an overview of the associated techniques, this paper aims to open paths through which the nuclear physics and statistics communities can contribute to and build upon the BAND framework.
Submitted 21 May, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments
Authors:
Chih-Li Sung,
Wenjia Wang,
Matthew Plumlee,
Benjamin Haaland
Abstract:
The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, application of Gaussian process models is greatly limited in practice, particularly for large-scale and many-input computer experiments that have become typical. We propose a multi-resolution functional ANOVA model as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input non-linear regression problems. An overlapping group lasso approach is used for estimation, ensuring computational feasibility in a large-scale and many-input setting. New results on consistency and inference for the (potentially overlapping) group lasso in a high-dimensional setting are developed and applied to the proposed multi-resolution functional ANOVA model. Importantly, these results allow us to quantify the uncertainty in our predictions. Numerical examples demonstrate that the proposed model enjoys marked computational advantages. Data capabilities, both in terms of sample size and dimension, meet or exceed best available emulation tools while meeting or exceeding emulation accuracy.
Submitted 8 January, 2019; v1 submitted 20 September, 2017;
originally announced September 2017.
-
An Uncertainty Quantification Method for Inexact Simulation Models
Authors:
Matthew Plumlee,
Henry Lam
Abstract:
The vast majority of stochastic simulation models are imperfect in that they fail to exactly emulate real system dynamics. The inexactness of the simulation model, or model discrepancy, can impact the predictive accuracy and usefulness of the simulation for decision-making. This paper proposes a systematic framework to integrate data from both the simulation responses and the real system responses to learn this discrepancy and quantify the resulting uncertainty. Our framework addresses the theoretical and computational requirements for stochastic estimation in a Bayesian setting. It involves an optimization-based procedure to compute confidence bounds on the target outputs that elicit desirable large-sample statistical properties. We illustrate the practical value of our framework with a call center example and a manufacturing line case study.
Submitted 20 July, 2017;
originally announced July 2017.
-
Orthogonal Gaussian process models
Authors:
Matthew Plumlee,
V. Roshan Joseph
Abstract:
Gaussian process models are widely adopted for nonparametric/semi-parametric modeling. Identifiability issues occur when the mean model contains polynomials with unknown coefficients. Though the resulting prediction is unaffected, this leads to poor estimation of the coefficients in the mean model, and thus the estimated mean model loses interpretability. This paper introduces a new Gaussian process model whose stochastic part is orthogonal to the mean part to address this issue. This paper also discusses applications to multi-fidelity simulations using data examples.
Submitted 1 November, 2016;
originally announced November 2016.
-
Fast prediction of deterministic functions using sparse grid experimental designs
Authors:
Matthew Plumlee
Abstract:
Random field models have been widely employed to develop a predictor of an expensive function based on observations from an experiment. The traditional framework for developing a predictor with random field models can fail due to the computational burden it requires. This problem is often seen in cases where the input of the expensive function is high dimensional. While many previous works have focused on developing an approximative predictor to resolve these issues, this article investigates a different solution mechanism. We demonstrate that when a general set of designs is employed, the resulting predictor is quick to compute and has reasonable accuracy. The fast computation of the predictor is made possible through an algorithm proposed by this work. This paper also demonstrates methods to quickly evaluate the likelihood of the observations and describes some fast maximum likelihood estimates for unknown parameters of the random field. The computational savings can be several orders of magnitude when the input is located in a high dimensional space. Beyond the fast computation of the predictor, existing research has demonstrated that a subset of these designs generate predictors that are asymptotically efficient. This work details some empirical comparisons to the more common space-filling designs that verify the designs are competitive in terms of resulting prediction accuracy.
Submitted 3 December, 2014; v1 submitted 25 February, 2014;
originally announced February 2014.