Search | arXiv e-print repository

Robust Distributed Learning of Functional Data From Simulators through Data Sketching

Authors: R. Jacob Andros, Rajarshi Guhaniyogi, Devin Francom, Donatella Pasqualini

Abstract: In environmental studies, realistic simulations are essential for understanding complex systems. Statistical emulation with Gaussian processes (GPs) in functional data models has become a standard tool for this purpose. Traditional centralized processing of such models requires substantial computational and storage resources, leading to emerging distributed Bayesian learning algorithms that partition data into shards for distributed computations. However, concerns about the sensitivity of distributed inference to shard selection arise. Instead of using data shards, our approach employs multiple random matrices to create random linear projections, or sketches, of the dataset. Posterior inference on functional data models is conducted using random data sketches on various machines in parallel. These individual inferences are combined across machines at a central server. The aggregation of inference across random matrices makes our approach resilient to the selection of data sketches, resulting in robust distributed Bayesian learning. An important advantage is its ability to maintain the privacy of sampling units, as random sketches prevent the recovery of raw data. We highlight the significance of our approach through simulation examples and showcase the performance of our approach as an emulator using surrogates of the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) simulator - an important simulator for government agencies.

Submitted 26 June, 2024; originally announced June 2024.
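
Illustration (not from the paper): the abstract above describes replacing data shards with random linear projections of the dataset, one per machine. The sketch below shows only that projection step on toy data, assuming a Gaussian sketching matrix with N(0, 1/m) entries applied across sampling units. The matrix distribution, the dimensions, and the function names `gaussian_sketch_matrix` and `sketch` are assumptions made for illustration; the abstract does not specify them, and the posterior inference and aggregation steps are not shown.

```python
import random


def gaussian_sketch_matrix(m, n, seed):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries (an assumed choice)."""
    rng = random.Random(seed)
    scale = (1.0 / m) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def sketch(phi, data):
    """Compute phi @ data, compressing the n rows of data into m rows."""
    n_rows = len(data)
    n_cols = len(data[0])
    return [
        [sum(phi_row[i] * data[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for phi_row in phi
    ]


# Toy dataset: n = 200 sampling units, each observed at d = 5 grid points.
rng = random.Random(0)
n, d, m, n_machines = 200, 5, 20, 3
data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Each hypothetical machine draws its own random matrix and sees only its sketch.
sketches = [
    sketch(gaussian_sketch_matrix(m, n, seed=k), data) for k in range(n_machines)
]
print(len(sketches), len(sketches[0]), len(sketches[0][0]))
```

Each machine here works with an m x d sketch rather than the n x d raw data, and different seeds give different sketches of the same dataset.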

arXiv:2406.07555 [pdf, other]

Sequential Monte Carlo for Cut-Bayesian Posterior Computation

Authors: Joseph Mathews, Giri Gopalan, James Gattiker, Sean Smith, Devin Francom

Abstract: We propose a sequential Monte Carlo (SMC) method to efficiently and accurately compute cut-Bayesian posterior quantities of interest, variations of standard Bayesian approaches constructed primarily to account for model misspecification. We prove finite sample concentration bounds for estimators derived from the proposed method along with a linear tempering extension and apply these results to a realistic setting where a computer model is misspecified. We then illustrate the SMC method for inference in a modular chemical reactor example that includes submodels for reaction kinetics, turbulence, mass transfer, and diffusion. The samples obtained are commensurate with a direct-sampling approach that consists of running multiple Markov chains, with computational efficiency gains using the SMC method. Overall, the SMC method presented yields a novel, rigorous approach to computing with cut-Bayesian posterior distributions.

Submitted 8 March, 2024; originally announced June 2024.

Report number: LA-UR-23-31546

arXiv:2307.11241 [pdf, other]

Discovering Active Subspaces for High-Dimensional Computer Models

Authors: Kellin N. Rumsey, Devin Francom, Scott Vander Wiel

Abstract: Dimension reduction techniques have long been an important topic in statistics, and active subspaces (AS) have received much attention this past decade in the computer experiments literature. The most common approach towards estimating the AS is to use Monte Carlo with numerical gradient evaluation. While sensible in some settings, this approach has obvious drawbacks. Recent research has demonstrated that active subspace calculations can be obtained in closed form, conditional on a Gaussian process (GP) surrogate, which can be limiting in high-dimensional settings for computational reasons. In this paper, we produce the relevant calculations for a more general case when the model of interest is a linear combination of tensor products. These general equations can be applied to the GP, recovering previous results as a special case, or applied to the models constructed by other regression techniques including multivariate adaptive regression splines (MARS). Using a MARS surrogate has many advantages including improved scaling, better estimation of active subspaces in high dimensions and the ability to handle a large number of prior distributions in closed form. In one real-world example, we obtain the active subspace of a radiation-transport code with 240 inputs and 9,372 model runs in under half an hour.

Submitted 20 July, 2023; originally announced July 2023.
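
Illustration (not from the paper): the abstract above mentions that the most common way to estimate an active subspace is Monte Carlo with numerical gradient evaluation. The sketch below shows that baseline only, not the closed-form MARS or GP calculations the paper proposes. It estimates C = E[grad f(x) grad f(x)^T] by central finite differences and extracts the leading direction by power iteration. The toy model `toy_model`, the uniform input distribution on [-1, 1], the step size, and all function names are assumptions made for illustration.

```python
import math
import random


def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference gradient of f at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def estimate_c_matrix(f, dim, n_samples, seed=0):
    """Monte Carlo estimate of E[grad f grad f^T] with x ~ Uniform(-1, 1)^dim."""
    rng = random.Random(seed)
    c = [[0.0] * dim for _ in range(dim)]
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        g = numerical_gradient(f, x)
        for i in range(dim):
            for j in range(dim):
                c[i][j] += g[i] * g[j] / n_samples
    return c


def leading_eigenvector(c, n_iter=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    dim = len(c)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(n_iter):
        w = [sum(c[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


# Toy model that varies only along the direction a, so the active subspace is span(a).
a = [3.0, 1.0, 0.0, 0.0]


def toy_model(x):
    return math.sin(sum(ai * xi for ai, xi in zip(a, x)))


c_hat = estimate_c_matrix(toy_model, dim=4, n_samples=500)
direction = leading_eigenvector(c_hat)
print([round(di, 3) for di in direction])
```

Every model gradient costs two model evaluations per input dimension in this scheme, which hints at why the approach becomes expensive for simulators with hundreds of inputs.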

arXiv:2306.01911 [pdf, other]

Generalized Bayesian MARS: Tools for Emulating Stochastic Computer Models

Authors: Kellin Rumsey, Devin Francom, Andy Shen

Abstract: The multivariate adaptive regression spline (MARS) approach of Friedman (1991) and its Bayesian counterpart (Francom et al. 2018) are effective approaches for the emulation of computer models. The traditional assumption of Gaussian errors limits the usefulness of MARS, and many popular alternatives, when dealing with stochastic computer models. We propose a generalized Bayesian MARS (GBMARS) framework which admits the broad class of generalized hyperbolic distributions as the induced likelihood function. This allows us to develop tools for the emulation of stochastic simulators which are parsimonious, scalable, interpretable and require minimal tuning, while providing powerful predictive and uncertainty quantification capabilities. GBMARS is capable of robust regression with t distributions, quantile regression with asymmetric Laplace distributions and a general form of "Normal-Wald" regression in which the shape of the error distribution and the structure of the mean function are learned simultaneously. We demonstrate the effectiveness of GBMARS on various stochastic computer models and we show that it compares favorably to several popular alternatives.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.08834 [pdf, other]

Elastic Bayesian Model Calibration

Authors: Devin Francom, J. Derek Tucker, Gabriel Huerta, Kurtis Shuler, Daniel Ries

Abstract: Functional data are ubiquitous in scientific modeling. For instance, quantities of interest are modeled as functions of time, space, energy, density, etc. Uncertainty quantification methods for computer models with functional response have resulted in tools for emulation, sensitivity analysis, and calibration that are widely used. However, many of these tools do not perform well when the model's parameters control both the amplitude variation of the functional output and its alignment (or phase variation). This paper introduces a framework for Bayesian model calibration when the model responses are misaligned functional data. The approach generates two types of data out of the misaligned functional responses: one that isolates the amplitude variation and one that isolates the phase variation. These two types of data are created for the computer simulation data (both of which may be emulated) and the experimental data. The calibration approach uses both types so that it seeks to match both the amplitude and phase of the experimental data. The framework is careful to respect constraints that arise especially when modeling phase variation, but also in a way that it can be done with readily available calibration software. We demonstrate the techniques on a simulated data example and on two dynamic material science problems: a strength model calibration using flyer plate experiments and an equation of state model calibration using experiments performed on the Sandia National Laboratories' Z-machine.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 45 pages, 21 figures

arXiv:2210.09181 [pdf, ps, other]

Bayesian Projection Pursuit Regression

Authors: Gavin Collins, Devin Francom, Kellin Rumsey

Abstract: In projection pursuit regression (PPR), an unknown response function is approximated by the sum of M "ridge functions," which are flexible functions of one-dimensional projections of a multivariate input space. Traditionally, optimization routines are used to estimate the projection directions and ridge functions via a sequential algorithm, and M is typically chosen via cross-validation. We introduce the first Bayesian version of PPR, which has the benefit of accurate uncertainty quantification. To learn the projection directions and ridge functions, we apply novel adaptations of methods used for the single ridge function case (M=1), called the Single Index Model, for which Bayesian implementations do exist; then use reversible jump MCMC to learn the number of ridge functions M. We evaluate the predictive ability of our model in 20 simulation scenarios and for 23 real datasets, in a bake-off against an array of state-of-the-art regression methods. Its effective performance indicates that Bayesian Projection Pursuit Regression is a valuable addition to the existing regression toolbox.

Submitted 17 October, 2022; originally announced October 2022.

Comments: 30 pages, 14 figures, supplemental material

MSC Class: 62F15; 62G08

Showing 1–6 of 6 results for author: Francom, D