Robust Distributed Learning of Functional Data From Simulators through Data Sketching
Authors:
R. Jacob Andros,
Rajarshi Guhaniyogi,
Devin Francom,
Donatella Pasqualini
Abstract:
In environmental studies, realistic simulations are essential for understanding complex systems. Statistical emulation with Gaussian processes (GPs) in functional data models have become a standard tool for this purpose. Traditional centralized processing of such models requires substantial computational and storage resources, leading to emerging distributed Bayesian learning algorithms that parti…
▽ More
In environmental studies, realistic simulations are essential for understanding complex systems. Statistical emulation with Gaussian processes (GPs) in functional data models have become a standard tool for this purpose. Traditional centralized processing of such models requires substantial computational and storage resources, leading to emerging distributed Bayesian learning algorithms that partition data into shards for distributed computations. However, concerns about the sensitivity of distributed inference to shard selection arise. Instead of using data shards, our approach employs multiple random matrices to create random linear projections, or sketches, of the dataset. Posterior inference on functional data models is conducted using random data sketches on various machines in parallel. These individual inferences are combined across machines at a central server. The aggregation of inference across random matrices makes our approach resilient to the selection of data sketches, resulting in robust distributed Bayesian learning. An important advantage is its ability to maintain the privacy of sampling units, as random sketches prevent the recovery of raw data. We highlight the significance of our approach through simulation examples and showcase the performance of our approach as an emulator using surrogates of the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) simulator - an important simulator for government agencies.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
Comparison and Bayesian Estimation of Feature Allocations
Authors:
David B. Dahl,
Devin J. Johnson,
R. Jacob Andros
Abstract:
Feature allocation models postulate a sampling distribution whose parameters are derived from shared features. Bayesian models place a prior distribution on the feature allocation, and Markov chain Monte Carlo is typically used for model fitting, which results in thousands of feature allocations sampled from the posterior distribution. Based on these samples, we propose a method to provide a point…
▽ More
Feature allocation models postulate a sampling distribution whose parameters are derived from shared features. Bayesian models place a prior distribution on the feature allocation, and Markov chain Monte Carlo is typically used for model fitting, which results in thousands of feature allocations sampled from the posterior distribution. Based on these samples, we propose a method to provide a point estimate of a latent feature allocation. First, we introduce FARO loss, a function between feature allocations which satisfies quasi-metric properties and allows for comparing feature allocations with differing numbers of features. The loss involves finding the optimal feature ordering among all possible, but computational feasibility is achieved by framing this task as a linear assignment problem. We also introduce the FANGS algorithm to obtain a Bayes estimate by minimizing the Monte Carlo estimate of the posterior expected FARO loss using the available samples. FANGS can produce an estimate other than those visited in the Markov chain. We provide an investigation of existing methods and our proposed methods. Our loss function and search algorithm are implemented in the fangs package in R.
△ Less
Submitted 27 July, 2022;
originally announced July 2022.