Bayesian calibration of stochastic agent based model via random forest

Connor Robertson
Sandia National Laboratories
Livermore, CA
[email protected]
Cosmin Safta
Sandia National Laboratories
Livermore, CA
Nicholson Collier
Argonne National Laboratory
Chicago, IL
Jonathan Ozik
Argonne National Laboratory
Chicago, IL
Jaideep Ray
Sandia National Laboratories
Livermore, CA

Abstract

Agent-based models (ABM) provide an excellent framework for modeling outbreaks and interventions in epidemiology by explicitly accounting for diverse individual interactions and environments. However, these models are usually stochastic and highly parametrized, requiring precise calibration for predictive performance. When considering realistic numbers of agents and properly accounting for stochasticity, this high dimensional calibration can be computationally prohibitive. This paper presents a random forest based surrogate modeling technique to accelerate the evaluation of ABMs and demonstrates its use to calibrate an epidemiological ABM named CityCOVID via Markov chain Monte Carlo (MCMC). The technique is first outlined in the context of CityCOVID’s quantities of interest, namely hospitalizations and deaths, by exploring dimensionality reduction via temporal decomposition with principal component analysis (PCA) and via sensitivity analysis. The calibration problem is then presented and samples are generated to best match COVID-19 hospitalization and death numbers in Chicago from March to June in 2020. These results are compared with previous approximate Bayesian calibration (IMABC) results and their predictive performance is analyzed showing improved performance with a reduction in computation.

Keywords agent-based modeling, epidemiology, machine learning surrogate, Bayesian calibration, MCMC

1 Introduction

Agent-based models (ABMs) are powerful tools for simulating complex systems that have found use across diverse domains, from traffic flow and ecology to economics and epidemiology. These "bottom-up" computational frameworks represent systems as a collection of autonomous agents that interact with each other and with their environment. This decentralized, microscopic perspective allows ABMs to capture small scale or emergent phenomena that traditional “top-down,” or population level, approaches often miss.

ABMs find applications in a plethora of fields. In traffic flow, they have been used to identify transportation bottlenecks and explore conditions that reduce the efficiency of infrastructure [28, 5, 36, 51]. In ecology, they can accurately model the spread of invasive species and predict ecosystem tipping points [31, 32, 48]. In economics, they simulate market dynamics and assess the impact of policy interventions [13, 8, 54]. Despite their versatility, ABMs have a fundamental challenge: parameter calibration. While ABMs have been shown to effectively replicate historical data and trends, they often include a wide range of possible individual and environmental characteristics, making their calibration a high-dimensional problem. Further, ABM simulations often scale poorly due to agent interactions, making each run of the model computationally expensive. To compound these challenges, ABMs are often inherently stochastic. Unlike deterministic models, ABMs incorporate randomness in agent behaviors and in environmental factors. Though this stochasticity is crucial for capturing real-world complexity, it introduces significant uncertainty, making precise calibration elusive and increasing the computational expense of calibration.

Various calibration techniques for ABMs have been proposed to align simulations with empirical data. These include Approximate Bayesian Computation (ABC) [15, 53, 50], variational inference [16, 35], Markov chain Monte Carlo (MCMC) [24], and evolutionary algorithms [41, 49]. These parameter estimation approaches are commonly combined with an emulator or surrogate model of the ABM such as Gaussian processes [14, 1, 19, 44], decision trees (or forests) [33, 2, 34, 43], or ordinary differential equations (ODEs) [7] to reduce computational cost. However, each of these existing calibration approaches faces limitations, including its ability to address stochasticity in the underlying ABM, guarantees on its convergence, and its computational efficiency.

This paper proposes a novel approach to calibrating ABMs by introducing a random forest based global surrogate model which can connect the nonlinear dependence of population level outputs to the ABM parameters over long temporal stretches. This approach includes decomposing quantities of interest via principal component analysis (PCA) and using the built-in sensitivity measures of the random forest to reduce the dimensionality. This surrogate is combined with Bayesian sampling, in the form of MCMC, to produce approximate posterior distributions for the ABM parameters of interest in a fraction of the time it would take using repeated ABM evaluations. Rigorous validation metrics are used to quantify the success of the calibration and the resulting posterior distributions are sampled to produce an ABM generated “pushforward” comparison. Though generally applicable to any ABM calibrated to population-level observations, we will present the calibration approach in the context of the epidemiological ABM CityCOVID [44, 29], which was used to model the spread of COVID-19 in the greater Chicago area during 2020 and supported city and state public health decision making.

The remainder of the paper is structured as follows. Section 2 outlines the CityCOVID ABM, its use, and respective calibration and surrogate training data. Section 3 outlines the surrogate construction procedure and details the formulation of the calibration problem for this ABM. Sections 4 and 5 discuss the results of the calibration procedure for CityCOVID and conclude the paper with key findings, future directions, and the broader implications of our work for advancing ABM calibration.

1.1 Literature review

There are a variety of approaches which have been explored to calibrate epidemiological ABMs. Among these is the approach by Fadikar et al. [19] who used Gaussian process surrogates to model the mean evolution of an epidemiological ABM and its quantile evolution to capture stochasticity. Further, Anirudh et al. [4] presented a surrogate modeling approach which first decomposed data into temporal modes using PCA and then modeled the mapping from ABM parameters to PCA weights using a neural network. In each of these cases, the parameter estimation was performed with either rejection ABC or MCMC.

Calibration without the use of surrogate accelerators has been attempted by way of genetic algorithms, which can identify successful parameter values in search spaces of moderate dimension but they do not provide uncertainty quantification [10]. This approach allows for global exploration of the parameter space of ABMs but requires specifying fitness functions which are often problem specific and do not have any convergence guarantees. Given the stochasticity of most ABMs and the often limited data used for calibration, having reliable approaches that include estimated uncertainty on the parameter calibration is a necessity.

In the last decade, work has focused on leveraging more advanced machine learning approaches to improve calibration or to accelerate calibration with improved surrogate models. For example, LightGBM gradient boosted forests of decision trees have been used to filter proposed parameters in ABC sampling [45]. Alternatively, causal structure has been prescribed to fit surrogates made up of systems of ordinary differential equations or recurrent neural networks and capture ABM outputs in a latent space [17]. As a summary of surrogate approaches, Angione et al. [3] provided a comparison of various common machine learning algorithms applied as surrogate models for an example ABM. In this, neural networks were shown to be the most effective at replicating ABM outputs, but random forests were identified as a close competitor and far more computationally efficient. Additionally, the nonlinearity and stochasticity of the mapping from ABM parameters to outputs was cited as a key challenge and PCA decomposition identified as a potential aid for surrogate accuracy over timeseries outputs. Across previous work, the stochasticity of ABMs was incorporated into surrogate modeling by including the random seed as an input parameter, reducing the individual seeded runs to deterministic models [45, 17, 3].

The most recent work for ABM calibration has begun exploring the possibility of differentiable ABMs, which can automatically provide information for calibration. New frameworks for ABM construction built on computational tools for deep learning can provide gradients for optimization automatically at evaluation of ABMs [11]. Fundamentally, this new approach can be extended even through discrete randomness, a characteristic feature of most ABMs [6]. These new approaches hold the potential of accelerating the calibration procedure using Hamiltonian-based MCMC approaches, which require gradients of the posterior distributions with respect to the parameters. This opens the possibility of calibrating high dimensional systems while maintaining the convergence guarantees of MCMC approaches. However, incorporating these automatic gradient computations requires fundamental reconstruction of the software for ABM modeling. The approach presented in this paper instead takes a black-box perspective which requires no intrusive modification of the model.

Properly assessing the accuracy of stochastic model calibration is often fraught with nuance and delicacy. However, one effective approach is the use of strictly proper scoring metrics [22], which allow for unique maximizers while still providing the flexibility to match the problem at hand. Most common among these metrics is the continuous ranked probability score (CRPS [23]), which generalizes mean absolute error to predictive cumulative distribution functions: a CRPS of 0 indicates a perfect match between an ensemble of stochastic simulations and the observation, and positive values represent mean absolute error between each ensemble run and the observation. This scoring technique has been successfully used to evaluate the efficacy of ABMs previously [55]. Further validation can be found by considering the verification rank histogram [26] (VRH) of the ABM outputs, which can identify over- or under-dispersion or, equivalently, tendencies to over- or under-predict compared to observational data.
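As an illustration of these two diagnostics, the following is a minimal sketch of an empirical CRPS and a verification rank for a single observation. It assumes the common ensemble estimator CRPS = E|X - y| - 0.5 E|X - X'|; the function names and the synthetic ensemble are hypothetical and are not taken from the tools used in this paper.

```python
import numpy as np

def crps_ensemble(ensemble, observation):
    """Empirical CRPS of a 1-D ensemble against a scalar observation.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations
    replaced by averages over the ensemble members.
    """
    x = np.asarray(ensemble, dtype=float)
    term_obs = np.mean(np.abs(x - observation))
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - term_spread

def verification_rank(ensemble, observation):
    """Number of ensemble members below the observation (0 .. len(ensemble))."""
    return int(np.sum(np.asarray(ensemble) < observation))

# Hypothetical example: 50 simulated values for one day and one observed value.
rng = np.random.default_rng(0)
ensemble = rng.normal(loc=40.0, scale=5.0, size=50)
score = crps_ensemble(ensemble, observation=42.0)
rank = verification_rank(ensemble, observation=42.0)
```

Collecting `rank` over all observation days and plotting its histogram gives a VRH; for an ensemble with a single member the score reduces to the absolute error.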

2 CityCOVID

CityCOVID is an ABM developed during the COVID-19 pandemic to quantify the outcomes of behavioral and policy interventions in the Chicago, IL metropolitan area. Based on the epidemiological ABM framework ChiSIM [37] and the Repast HPC distributed ABM toolkit [12], CityCOVID includes an age-stratified population of 2.7 million agents which occupy 1.2 million distinct locations including households, workplaces, schools, nursing homes, hospitals, and gyms. The agents move between these locations according to a variety of schedules based on their demographics [42] and can be exposed to infection when present at a location with infected agents. Individual agents transition between epidemiological states: susceptible, exposed, presymptomatic, infected (asymptomatic), infected (symptomatic), hospitalized, hospitalized (ICU), recovered, and deceased. The transitions are governed by probabilities of exposure to infected, hospitalization, or death and the durations of agent infections and hospitalizations are Gamma distributed.

In order to capture the heterogeneity and complexity of epidemiological dynamics between individuals, agents and their interactions are governed by draws from parametrized probability distributions. This high-dimensional parametrization gives flexible control over the impacts of different policies and interventions but also yields a wide range of possible outcomes and introduces enormous stochasticity, making the model a challenge to calibrate.

Given this challenge and its dimensionality, full Bayesian calibration is computationally infeasible. However, previous work by Ozik et al.[44] used a sequential approximate Bayesian calibration (IMABC) approach to iteratively sample from a prior distribution which evolves over time[50]. At each iteration, this distribution is updated by comparing the averaged hospitalization and death trajectories of stochastic realizations of CityCOVID parameter combinations with the empirical data observed in Chicago during 2020. This approach requires a large number of runs but is more efficient than comparable rejection ABC methods. Its efficiency was further improved by performing a Morris global sensitivity analysis [40], which allowed for a reduction to only 9 CityCOVID parameters that strongly influence the output hospitalization and death trajectories of the model. These parameters are listed in Table 1.

Even with parameter reduction and IMABC optimization, the model calibration required just over 32,000 runs which amounted to a total of 420,000 core hours on the Argonne Leadership Computing Facility Theta supercomputer. Relevant figures and details on CityCOVID and its calibration can be found in Ozik et al.[44] and specifics on the IMABC algorithm in Rutter et al.[50]. The mean outcomes of the calibrated model were a good match for the data and the predictive accuracy of these outcomes was sufficient to use as forecasting support for city and state public health response. This paper furthers this work by targeting a reduction in the computational burden of the calibration while also reducing the uncertainty in the posterior estimations.

Category	Parameter
Rates	* Exposure to infected
Probabilities	* Stay at home
Probabilities	* Protective behaviors
Proportions	Isolating in home
Proportions	Isolating in nursing home
Multipliers	Seasonality
Other	* Time of initial seeding of infections
Other	Number of initially infected
Other	Shielding by other susceptible

Table 1: The most influential parameters on the census outputs of hospitalizations and deaths in the CityCOVID ABM based on the Morris global sensitivity analysis. Parameters marked with stars represent those most influential to the surrogate model.

2.1 Data

In this section we describe the observational data used in model calibration as well as the process of generating data suitable for use in training a surrogate model to approximate CityCOVID.

Observational data: The daily census numbers of occupied hospitalization beds and cumulative deaths caused by COVID-19 were collected by the Illinois National Electronic Disease Surveillance System in Chicago from March to June of 2020. Due to the lack of COVID-19 testing capability early in 2020, case counts of COVID-19 were unreliable for use with calibration.

Surrogate training dataset: In order to accurately reproduce CityCOVID results with a surrogate model, a representative sample of the model outputs in a parameter range of interest was needed. The IMABC calibration effort (see §2) provided a 9 dimensional parameter space on which the model generates realistic rates of hospitalizations and deaths. By first training a surrogate model (described in § 3.1) on these previous simulations, feature importance could be recalculated to narrow the parameter space down to the top 4 parameters using the sensitivity metrics of the surrogate (random forest [30]). The sensitivities of the parameters for the surrogate are listed in Table 6 and the 4 retained parameters ( $\vec{\theta}$ ) are marked with stars in Table 1. The prior beliefs for these parameters were chosen to be non-informative within a region empirically chosen by comparison of the IMABC calibration outputs and the observational data and are shown in Table 2.

Parameter	Description	Prior Belief
$\theta_{1}$	Rate of exposure to infected	$\mathcal{U}(0.046,0.069)$
$\theta_{2}$	Time of initial seeding of infections	$\mathcal{U}(31,59)$
$\theta_{3}$	Probability of stay at home	$\mathcal{U}(0.939,0.981)$
$\theta_{4}$	Probability of protective behaviors	$\mathcal{U}(0.407,0.492)$

Table 2: Prior beliefs for 4 retained parameters for surrogate training and calibration. The range of the uniform distributions was taken from preliminary runs of CityCOVID [44].

To give the surrogate adequate training information in these prior domains, 700 quasi-random samples of $\vec{\theta}$ were taken using Halton sampling [58] in the 4 dimensional space. Each of the 700 parameter sets was simulated with 50 different random seeds, allowing for a robust characterization of stochasticity to be represented in the outcome. These seeds alter the selection of who is initially infected in the simulation, random draws related to infectious state duration and transmission, and location movements within agent schedules. The resulting hospitalization and death projections for each simulation were then averaged across random seeds to produce a “mean-model” estimate of the parameter outputs.
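The sampling design described above can be sketched as follows. This is a minimal, unscrambled Halton sequence written from scratch for illustration; the paper does not state which implementation or scrambling was used, and the array layout assumed by `mean_model` is hypothetical. The bounds are the uniform prior ranges from Table 2.

```python
import numpy as np

def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_samples, bases=(2, 3, 5, 7)):
    """Unscrambled Halton points in the unit hypercube, one column per base."""
    return np.array([[radical_inverse(i, b) for b in bases]
                     for i in range(1, n_samples + 1)])

# Prior bounds from Table 2 (theta_1 .. theta_4).
lower = np.array([0.046, 31.0, 0.939, 0.407])
upper = np.array([0.069, 59.0, 0.981, 0.492])

unit_points = halton(700)
theta_samples = lower + unit_points * (upper - lower)   # shape (700, 4)

def mean_model(runs):
    """Average trajectories over the random-seed axis.

    `runs` is assumed to have shape (n_parameter_sets, n_seeds, n_days).
    """
    return runs.mean(axis=1)
```

Each row of `theta_samples` would be passed to the ABM once per random seed, and `mean_model` would then collapse the 50 seeded runs into one trajectory per parameter set.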

This approach does not fully account for the stochasticity of the ABM but reflects the most common approach used for ABM-based forecasting. Comparisons of the complete dataset with the observed hospitalizations ( $\hat{h}$ ) and deaths ( $\hat{d}$ ) are shown in Figure 1. Note that although the hospitalization and death information in CityCOVID is tied to specific agents and thus spatially distributed, here we only consider comparisons of the Chicago city-wide census quantities.

Figure 1: Range of hospitalization and death trajectories for the observed data in Chicago from March-June of 2020 (black) and the CityCOVID simulations using parameter values from the 4 dimensional quasi-random hypercube used to train and test the surrogate method (blue). CityCOVID outputs are averaged across random seeds.

3 Methods

3.1 Surrogate model

Prior work with ABMs has demonstrated the highly nonlinear nature of their dynamics [38] and as such, reductions which characterize the temporal behavior into smooth modes have been effective [19, 4]. Following these approaches, this surrogate approach begins by decomposing the temporally concatenated hospitalization and death series using PCA:

$$
\begin{bmatrix}
h_{1,1}&\ldots&h_{1,n}&d_{1,1}&\ldots&d_{1,n}\\
h_{2,1}&\ldots&h_{2,n}&d_{2,1}&\ldots&d_{2,n}\\
\vdots&\ldots&\vdots&\vdots&\ldots&\vdots\\
h_{m,1}&\ldots&h_{m,n}&d_{m,1}&\ldots&d_{m,n}
\end{bmatrix}
\rightarrow
\begin{bmatrix}\alpha_{1,1}\\ \alpha_{2,1}\\ \vdots\\ \alpha_{m,1}\end{bmatrix}\odot\vec{c}_{1}
+\ldots+
\begin{bmatrix}\alpha_{1,2n}\\ \alpha_{2,2n}\\ \vdots\\ \alpha_{m,2n}\end{bmatrix}\odot\vec{c}_{2n}
\tag{1}
$$

where $h_{i,j},d_{i,j}$ are hospitalizations and deaths for parameter set $i$ at time step $j$ , $\vec{c}_{j}\in\mathbb{R}^{2n}$ are the PCA components, and $\alpha_{i,j}$ are the coefficients multiplying components $\vec{c}_{j}$ for parameter set $i$ . To reduce the dimensionality of the surrogate mapping from ABM parameters to component coefficients, the PCA decomposition is truncated to only the most dominant components $\vec{c}_{j}$ in order to capture 95% of the temporal variance.
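A minimal sketch of the truncated decomposition in Equation 1, computed with a singular value decomposition. The mean-centering step is the standard PCA convention and is an assumption here, as is the synthetic data; the function names are hypothetical.

```python
import numpy as np

def truncated_pca(trajectories, variance_target=0.95):
    """PCA via SVD, keeping the fewest components that reach the variance target.

    `trajectories` has shape (m, 2n): one row per parameter set, holding the
    hospitalization series followed by the death series.
    """
    mean = trajectories.mean(axis=0)
    centered = trajectories - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, variance_target) + 1)
    components = vt[:k]                      # rows are the retained c_j
    coefficients = centered @ components.T   # alpha_{i,j}
    return mean, components, coefficients

def reconstruct(mean, components, coefficients):
    """Map coefficients back to concatenated trajectories."""
    return mean + coefficients @ components

# Synthetic stand-in for the (m, 2n) matrix of mean-model trajectories.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
weights = rng.uniform(0.5, 1.5, size=(100, 2))
hosp = weights[:, :1] * np.exp(-((t - 0.5) / 0.2) ** 2)
deaths = weights[:, 1:] * t**2
trajectories = np.hstack([hosp, deaths])

mean, components, coefficients = truncated_pca(trajectories)
approx = reconstruct(mean, components, coefficients)
```

The regressor then only needs to predict one row of `coefficients` per parameter set, and `reconstruct` recovers the full hospitalization and death series.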

Given this decomposition, a regressor $R^{s}_{\vec{\gamma}}$ is trained to map the Morris sensitivity-reduced CityCOVID ABM parameter set (9 parameters) to the output coefficients $\vec{\alpha}_{i}=\{\alpha_{i,j}\}$ for all time steps $j=1,\ldots,n$ with regressor hyperparameters $\vec{\gamma}$ . Although a wide range of classical or machine learning surrogate approaches could be used for regressor $R^{s}_{\vec{\gamma}}$ [4], random forests were selected due to their computational efficiency and because their structure naturally captures both nonlinear and discontinuous behaviors, both of which are characteristic in ABMs [38].

To improve the computational efficiency of the surrogate-based calibration, the dimensionality of the ABM parameter set was reduced by analyzing the sensitivity of the random forest with Gini impurity [27], permutation importance [27], and Sobol indices [52]. These sensitivity metrics measure how important each input parameter is to the structure of the trees (Gini), the scale of the outputs (permutation), and the variance of the output (Sobol). They are discussed in detail in Appendix A and the exact sensitivity values are listed in Table 3. As can be observed, the rate of exposure to infected, i.e., the hourly probability of getting exposed from an infected individual, and the time of initial seeding of infections play an outsized role in the hospitalization and death trajectories, according to the random forest.

Feature	Gini Importance	Permutation Importance	Sobol (first)	Sobol (total)
Rate of exposure to infected	0.41	0.59	0.52	0.56
Time of initial seeding of infections	0.32	0.61	0.29	0.33
Probability of stay at home	0.18	0.29	0.08	0.09
Probability of protective behaviors	0.08	0.05	0.01	0.01

Table 3: Random forest feature importance metrics for CityCOVID parameters

After reducing the input dimensionality, a final random forest $R_{\vec{\gamma}}$ was trained to map only the surrogate sensitivity-reduced CityCOVID ABM parameter set $\vec{\theta}_{i}=\{\theta_{1},\theta_{2},\theta_{3},\theta_{4}\}_{i}$ to the output coefficients $\vec{\alpha}_{i}=\{\alpha_{i,j}\}$ . This random forest $R_{\vec{\gamma}}$ has several hyperparameters $\vec{\gamma}$ which can dramatically affect its performance including the number of trees, the criterion used to split trees, and constraints on the number of samples needed for splits and leaves. In order to maximize accuracy, the hyperparameters were tuned via 5-fold cross validation brute-force search. Given the optimally selected hyperparameters, the random forest is trained with the full hypercube of surrogate sensitivity-reduced data.
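The tuning and training procedure above could look roughly like the following sketch, which assumes scikit-learn (the paper does not name its random forest library). The data are synthetic stand-ins, the grid values are illustrative rather than the ones searched in the paper, and Sobol indices are omitted. For regression forests, `feature_importances_` is an impurity-based importance, used here as a proxy for the Gini importance in Table 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 120 parameter sets, 4 inputs, 3 PCA coefficients.
theta = rng.uniform(size=(120, 4))
alpha = np.column_stack([
    np.sin(3.0 * theta[:, 0]) + theta[:, 1],
    theta[:, 0] * theta[:, 1],
    0.1 * theta[:, 2],
])

# Brute-force search over a small, illustrative hyperparameter grid.
param_grid = {
    "n_estimators": [25, 50],
    "min_samples_leaf": [1, 3],
    "max_features": [2, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                               # 5-fold cross validation
    scoring="neg_mean_absolute_error",
)
search.fit(theta, alpha)                # refits the best model on all data
forest = search.best_estimator_

impurity_importance = forest.feature_importances_
perm = permutation_importance(forest, theta, alpha, n_repeats=5, random_state=0)
```

In this toy setup the fourth input never enters `alpha`, so both importance measures should rank it low, which mirrors how a weakly influential parameter would be flagged for removal.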

3.2 Formulation of the estimation problem

Bayesian methods are a desirable approach for calibration due to their control over posterior form and convergence guarantees but are only practical for low dimensional calibrations. Our problem easily fits this requirement after reduction of the parameter space using surrogate sensitivity.

Though the surrogate model was created to estimate the census trajectories of hospitalizations and deaths, the non-stationarity of these features is prohibitive for convergence of Bayesian sampling via MCMC. Because the census values of hospitalizations and deaths are of different scales and fluctuate over large ranges, the likelihood calculation can be biased toward approximating death curves for later dates. As a result, calibration is performed using rolling averages over 1-week windows of the daily counts of hospitalizations ( $\vec{h}^{\circ}$ ) and deaths ( $\vec{d}^{\circ}$ ), which are computed via finite differences. The surrogate outputs were also translated into daily counts by forward finite differences.
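A minimal sketch of this preprocessing, assuming a forward difference between consecutive days and a trailing 7-day window that only keeps full windows. The window alignment and boundary handling are assumptions, since the text does not specify them.

```python
import numpy as np

def daily_counts(series):
    """Forward finite difference: count[j] = series[j + 1] - series[j]."""
    return np.diff(np.asarray(series, dtype=float))

def rolling_mean(values, window=7):
    """Average over each full window of `window` consecutive days."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical cumulative series growing by 2 per day over 30 days.
cumulative = 2.0 * np.arange(30)
smoothed = rolling_mean(daily_counts(cumulative))
```

With 30 days of input, the difference yields 29 daily values and the 7-day window leaves 23 smoothed values.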

Let $(\vec{h},\vec{d})=\mathcal{M}(\vec{\theta})$ be the daily predictions of the CityCOVID model, conditional on input parameters $\vec{\theta}=\{\theta_{1},\theta_{2},\theta_{3},\theta_{4}\}$ , as defined in Table 2. Here $\vec{h}=\{h_{j}\}$ and $\vec{d}=\{d_{j}\},j=1\ldots n$ are the daily counts of hospitalizations and deaths produced by the model. Let $\vec{h}^{\circ}$ and $\vec{d}^{\circ}$ be their observed counterparts, linked to the model predictions by a zero-mean Gaussian error, i.e.,

$$
h_{j}^{\circ}=h_{j}(\vec{\theta})+\epsilon_{h},\quad\epsilon_{h}\sim\mathcal{N}(0,\sigma_{h}^{2})\quad\text{and}\quad d_{j}^{\circ}=d_{j}(\vec{\theta})+\epsilon_{d},\quad\epsilon_{d}\sim\mathcal{N}(0,\sigma_{d}^{2}).
$$

The likelihood of observing $(\vec{h}^{\circ},\vec{d}^{\circ})$ , conditional on $\vec{\theta}$ , is

$$
\begin{aligned}
\mathcal{L}(\vec{h}^{\circ},\vec{d}^{\circ}\mid\vec{\theta})
&=\frac{1}{(2\pi)^{n/2}\sigma_{h}^{n}}\prod_{j=1}^{n}\exp{\left[-\frac{1}{2}\left(\frac{h_{j}^{\circ}-h_{j}(\vec{\theta})}{\sigma_{h}}\right)^{2}\right]}\times\frac{1}{(2\pi)^{n/2}\sigma_{d}^{n}}\prod_{j=1}^{n}\exp{\left[-\frac{1}{2}\left(\frac{d_{j}^{\circ}-d_{j}(\vec{\theta})}{\sigma_{d}}\right)^{2}\right]}\\
&=\frac{1}{(2\pi\sigma_{d}\sigma_{h})^{n}}\exp{\left[-\frac{S_{h}}{2\sigma_{h}^{2}}-\frac{S_{d}}{2\sigma_{d}^{2}}\right]},
\end{aligned}
$$

where $S_{h}=\sum_{j=1}^{n}\left(h_{j}^{\circ}-h_{j}(\vec{\theta})\right)^{2}$ , $S_{d}=\sum_{j=1}^{n}\left(d_{j}^{\circ}-d_{j}(\vec{\theta})\right)^{2}$ and $\left(h_{j}(\vec{\theta}),d_{j}(\vec{\theta})\right)$ are the model predictions corresponding to parameters $\vec{\theta}$ .
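For numerical work it is usually more convenient to evaluate the logarithm of this likelihood, which follows directly from the closed form above (this restatement is added for illustration and introduces no new symbols):

$$
\log\mathcal{L}(\vec{h}^{\circ},\vec{d}^{\circ}\mid\vec{\theta})=-n\log(2\pi)-n\log\sigma_{h}-n\log\sigma_{d}-\frac{S_{h}}{2\sigma_{h}^{2}}-\frac{S_{d}}{2\sigma_{d}^{2}}.
$$

For fixed $(\sigma_{h},\sigma_{d})$ , only the last two terms depend on $\vec{\theta}$ , so a Metropolis-type acceptance ratio between two parameter values needs only the change in $S_{h}/(2\sigma_{h}^{2})+S_{d}/(2\sigma_{d}^{2})$ .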

Let $\pi(\vec{\theta})$ be the prior belief of $\vec{\theta}$ as listed in Table 2, i.e., $\vec{\theta}\sim\mathcal{U}\left(\vec{\theta}^{l},\vec{\theta}^{u}\right)$ , where $\left(\vec{\theta}^{l},\vec{\theta}^{u}\right)$ are the lower and upper bounds in Table 2. The error variances $(\sigma_{h}^{2},\sigma_{d}^{2})$ are modeled with conjugate priors, i.e., with an inverse Gamma prior, or

$$
\sigma_{h}^{-2}\sim\mathcal{G}\left(\frac{n_{s}+n}{2},\frac{n_{s}\zeta_{h}^{2}+S_{h}}{2}\right)\quad\text{and}\quad\sigma_{d}^{-2}\sim\mathcal{G}\left(\frac{n_{s}+n}{2},\frac{n_{s}\zeta_{d}^{2}+S_{d}}{2}\right),
$$

where $\frac{n_{s}\zeta_{h}^{2}+S_{h}}{2}$ and $\frac{n_{s}\zeta_{d}^{2}+S_{d}}{2}$ are the rate parameters of the Gamma ( $\mathcal{G}$ ) distributions and $n$ is the number of observations. This distribution leads to $\left(\zeta_{h}^{2},\zeta_{d}^{2}\right)$ being the prior means of $\left(\sigma_{h}^{2},\sigma_{d}^{2}\right)$ and $n_{s}$ is a user-defined value that, together with $\left(\zeta_{h}^{2},\zeta_{d}^{2}\right)$ , defines their prior variance. In this paper we use $n_{s}=1$ , which implies a noninformative prior, and $\zeta_{h}^{2}=\frac{S_{h}^{\text{OLS}}}{n-p},\zeta_{d}^{2}=\frac{S_{d}^{\text{OLS}}}{n-p}$ , where $S_{h}^{\text{OLS}},S_{d}^{\text{OLS}}$ are determined using the ordinary least squares (OLS) optimal parameter set from the dataset used for surrogate training, and $p=4$ is the number of parameters in $\vec{\theta}$ .
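Drawing an error variance from the Gamma distribution above can be sketched as follows. The function name and the numerical values are hypothetical; the only library fact relied on is that NumPy parameterizes the Gamma distribution by shape and scale, so the rate must be inverted.

```python
import numpy as np

def sample_error_variance(rng, sum_sq, n_obs, n_s, zeta_sq):
    """Draw sigma^2 by sampling the precision sigma^{-2} from its Gamma form.

    shape = (n_s + n) / 2 and rate = (n_s * zeta^2 + S) / 2.
    NumPy's gamma takes a scale argument, so scale = 1 / rate.
    """
    shape = 0.5 * (n_s + n_obs)
    rate = 0.5 * (n_s * zeta_sq + sum_sq)
    precision = rng.gamma(shape, 1.0 / rate)
    return 1.0 / precision

# Hypothetical values: n = 100 observations with residual sum of squares 200.
rng = np.random.default_rng(0)
draws = np.array([
    sample_error_variance(rng, sum_sq=200.0, n_obs=100, n_s=1.0, zeta_sq=2.0)
    for _ in range(20000)
])
```

Because the precision is Gamma distributed, the variance draws follow an inverse Gamma distribution with mean rate / (shape - 1), which is 101 / 49.5 for these values.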

By Bayes' rule, the likelihood $\mathcal{L}$ and the priors can be combined into an expression for the posterior distribution for $\vec{\theta}$ ,

$$
P\left(\vec{\theta}\mid\vec{h}^{\circ},\vec{d}^{\circ}\right)\propto\frac{1}{(2\pi\sigma_{d}\sigma_{h})^{n}}\exp{\left[-\frac{S_{h}}{2\sigma_{h}^{2}}-\frac{S_{d}}{2\sigma_{d}^{2}}\right]}\times\sigma_{h}^{n_{s}/2-1}\sigma_{d}^{n_{s}/2-1}\exp{\left(-\frac{n_{s}\zeta_{h}^{2}}{2}-\frac{n_{s}\zeta_{d}^{2}}{2}\right)}\times\pi\left(\vec{\theta}\right).
\tag{2}
$$

Delayed rejection adaptive Metropolis-Hastings sampling (DRAM) [25] was used to draw samples from the posterior in Equation 2. Strictly speaking, each step of the algorithm consists of a DRAM update of $\vec{\theta}$ followed by a Gibbs update of $\left(\sigma_{h}^{2},\sigma_{d}^{2}\right)$ , if the proposal for $\vec{\theta}$ is accepted by DRAM. The implementation of DRAM is available in the pymcmcstat Python package [39]. Since each iteration of DRAM requires an evaluation of CityCOVID, the use of the surrogate model described in § 3.1 makes the algorithm feasible. DRAM yields a Markov chain of samples $\vec{\theta}_{k},\ k=\{1,\ldots,K\}$ and in our study, $K=50,000$ steps. This sequence is checked for stationarity as a stopping criterion, using the method by Raftery and Warnes [57], which builds on an older method by Raftery and Lewis [47]. This method is implemented in the mcgibbsit package [56] for R (version 4.3.3, 2024-02-29) [46]. The package computes the minimum run length $N_{min}$ , the required burn-in $M$ , and the number of samples required to meet an estimation accuracy criterion for each component of $\vec{\theta}$ .
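The overall structure of the sampler can be illustrated with a much simpler stand-in. The sketch below is a plain random-walk Metropolis update with a Gibbs update of the error variance; it is not DRAM (no delayed rejection, no proposal adaptation) and does not use the pymcmcstat API. The toy surrogate, the single output series, the prior settings, and the choice to refresh the variance at every iteration are all assumptions made for brevity.

```python
import numpy as np

def toy_surrogate(theta, t):
    """Hypothetical stand-in for the surrogate: a two-parameter growth curve."""
    return theta[0] * np.exp(theta[1] * t)

def run_chain(obs, t, lower, upper, n_steps=5000, step=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n = obs.size
    n_s, zeta_sq = 1.0, 1.0            # assumed prior settings for sigma^2
    theta = 0.5 * (lower + upper)      # start at the centre of the uniform prior
    s_cur = np.sum((obs - toy_surrogate(theta, t)) ** 2)
    sigma_sq = zeta_sq
    chain = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        proposal = theta + step * (upper - lower) * rng.standard_normal(theta.size)
        # Proposals outside the uniform prior support have zero posterior mass.
        if np.all((proposal >= lower) & (proposal <= upper)):
            s_new = np.sum((obs - toy_surrogate(proposal, t)) ** 2)
            log_ratio = -(s_new - s_cur) / (2.0 * sigma_sq)
            if np.log(rng.uniform()) < log_ratio:
                theta, s_cur = proposal, s_new
        # Gibbs update: precision ~ Gamma(shape, rate), so sigma^2 = rate / Gamma(shape, 1).
        rate = 0.5 * (n_s * zeta_sq + s_cur)
        sigma_sq = rate / rng.gamma(0.5 * (n_s + n))
        chain[k] = theta
    return chain

# Synthetic observations from known parameters (2.0, 1.0) with noise sd 0.1.
t = np.linspace(0.0, 1.0, 40)
data_rng = np.random.default_rng(1)
obs = toy_surrogate(np.array([2.0, 1.0]), t) + 0.1 * data_rng.standard_normal(t.size)
lower, upper = np.array([1.0, 0.0]), np.array([3.0, 2.0])
chain = run_chain(obs, t, lower, upper)
posterior_mean = chain[1000:].mean(axis=0)   # discard an assumed burn-in of 1000
```

Each iteration costs one surrogate evaluation, which is why replacing the ABM with a fast surrogate makes chains of tens of thousands of steps practical.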

4 Results

4.1 Surrogate performance

The reconstruction of the data from 4 PCA components yields a median absolute relative error of 2%. Figure 2(a) shows a scree plot demonstrating the variance explained as the number of principal components is increased. As can be observed, the temporal dynamics present in the hospitalizations and death trajectories from the “mean-model” of CityCOVID are smooth and fairly simple, allowing for efficient encoding in the principal components. An example comparison of a CityCOVID trajectory with its respective PCA compression is shown in Figure 3. Note that the relative error is significantly higher at early times, when the numbers of hospitalizations and deaths are low, due to division by small numbers in the relative error calculation. These small numbers do not affect the likelihood estimation in Equation 2 and are not discussed further.

The random forest trained to reconstruct the original trajectories was able to achieve a median absolute relative error of less than 5% over 5-fold cross validation. The distribution of the median absolute relative errors for a random testing set of holdout trajectories is shown in Figure 2(b) and several random examples demonstrating the surrogate predicted hospitalization and death curves as compared with CityCOVID trajectories are shown in Appendix A.

Hyperparameter	Description	Value
$\gamma_{1}$	number of trees	500
$\gamma_{2}$	split quality criterion	absolute error
$\gamma_{3}$	minimum number of samples per leaf	3
$\gamma_{4}$	max number of features per split	5

Table 4: Descriptions and values for random forest hyperparameters after brute force search. Values were selected using 5 fold cross validation.

4.2 Parameter estimation

Using the surrogate described in Section 3.1, approximate hospitalization and death trajectories were sampled for use with the MCMC sampling of $P(\vec{\theta}\ |\ h,d)$ as described in Section 3.2. Samples from the posterior distribution after 50,000 sampling steps are shown in Figure 4 split into pairwise and marginal representations.

The posterior samples illustrate a more pronounced peak for two of the variables: $\theta_{1}$ (rate of exposure to infected) and $\theta_{3}$ (probability of stay at home). It is also notable that these two parameters show a strong positive correlation. Namely, if one of the probabilities is increased in CityCOVID, the other must also be increased in order to reasonably match the data. This result aligns with our understanding of CityCOVID as well as with epidemiological systems. It can also be observed that the approximated posterior distribution for $\theta_{4}$ (probability of protective behaviors) is almost uniform in shape. Though this may be true for CityCOVID as well, it aligns closely with the sensitivities of the random forest shown in Table 3. Specifically, the lack of importance of this parameter to the random forest allows for almost uniform sampling of its value without significant impact to the model outputs.

The marginal posterior distributions of this surrogate-based calibration alongside its prior distribution and the IMABC posterior distribution previously computed for CityCOVID[44] are shown in Figure 5. We see that the posterior distributions are peaked and very different from the corresponding priors, implying a significant gain of information regarding parameter values, post calibration, vis-à-vis the prior distribution. For $\theta_{4}$ (probability of protective behavior) we see that the probability density functions (PDFs) from MCMC and IMABC calibration somewhat agree but for the rest, the PDFs computed by MCMC are sharper than those obtained from IMABC.

The posterior distributions are checked via “pushforwards” and posterior predictive distributions. In the former, $N_{p}$ parameters are sampled from the posterior distribution and evaluated (in this case, using the surrogate model) to yield $(\vec{h},\vec{d})$ . In the latter, $\epsilon_{h}$ and $\epsilon_{d}$ , sampled from their posterior distribution, are added to the pushforward results. For $N_{p}=500$ , the trajectories are shown in Figure 6. The pushforward plots demonstrate that the surrogate is well converged to a narrow band of possible outcomes, which generally follow the trends of daily observed data computed via finite difference. The predictive posterior, which incorporates the calibrated uncertainties $\sigma_{h}$ and $\sigma_{d}$ from Equation 2, demonstrates almost complete coverage of the observations. This indicates that the uncertainty in the parameter estimates alone explains very little of the variability of the observations, which is instead captured by the noise estimates $\sigma_{h}$ and $\sigma_{d}$ .
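The distinction between the two checks can be made concrete with a short sketch. The surrogate, the posterior samples, and the noise levels below are synthetic placeholders, and the function names are hypothetical; only the structure (evaluate, then optionally add sampled Gaussian noise) follows the description above.

```python
import numpy as np

def pushforward(theta_samples, surrogate):
    """Evaluate the surrogate at each posterior sample of theta."""
    return np.array([surrogate(theta) for theta in theta_samples])

def posterior_predictive(push, sigma_samples, rng):
    """Add zero-mean Gaussian noise, one sampled standard deviation per trajectory."""
    noise = rng.standard_normal(push.shape) * np.asarray(sigma_samples)[:, None]
    return push + noise

rng = np.random.default_rng(0)
days = np.arange(30)

def toy_surrogate(theta):
    """Hypothetical stand-in for the surrogate: a scaled bell-shaped daily curve."""
    return theta[0] * np.exp(-((days - theta[1]) / 8.0) ** 2)

# Placeholder posterior samples: N_p = 500 draws of (amplitude, peak day) and sigma.
theta_samples = np.column_stack([
    rng.normal(100.0, 2.0, size=500),
    rng.normal(15.0, 0.5, size=500),
])
sigma_samples = np.abs(rng.normal(10.0, 1.0, size=500))

push = pushforward(theta_samples, toy_surrogate)
predictive = posterior_predictive(push, sigma_samples, rng)
```

When the parameter posterior is narrow, the spread of `push` is small and most of the spread in `predictive` comes from the sampled noise, which is the pattern described for Figure 6.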

Figure 7 plots the VRHs from the surrogate-based calibration (left) and the CityCOVID push-forwards (right, described in more detail in § 4.3). Ideally, a VRH would show a uniform distribution, demonstrating that our uncertainty bounds give full and balanced coverage of the true values. The left subfigure demonstrates that the surrogate-based calibration is generally balanced between over- and underprediction. The VRHs on the right, from the CityCOVID push-forward runs, are far more skewed, showing that while using an approximate surrogate may make the calibration feasible, it incurs an error.
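
For readers unfamiliar with the construction, a rank histogram can be sketched as follows. The function name `rank_histogram`, the ensemble sizes, and the synthetic data are assumptions for illustration, and ties between members and observations are ignored.

```python
import numpy as np

def rank_histogram(ensemble, observations):
    """Histogram of the rank of each observation within its ensemble.

    ensemble: array of shape (n_members, n_times)
    observations: array of shape (n_times,)
    Returns counts for ranks 0..n_members.
    """
    ensemble = np.asarray(ensemble)
    observations = np.asarray(observations)
    # Rank = number of ensemble members strictly below the observation.
    ranks = np.sum(ensemble < observations[None, :], axis=0)
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)

rng = np.random.default_rng(0)
n_members, n_times = 9, 2000
observations = rng.normal(0.0, 1.0, n_times)
calibrated = rng.normal(0.0, 1.0, (n_members, n_times))  # same law as the data
overpredicting = calibrated + 2.0                         # biased high

hist_calibrated = rank_histogram(calibrated, observations)
hist_biased = rank_histogram(overpredicting, observations)
print(hist_calibrated)
print(hist_biased)
```

An ensemble drawn from the same distribution as the observations gives roughly equal counts in every bin, while an ensemble that is biased high piles the counts into the lowest ranks, which is the signature of overprediction.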

4.3 Parameter assessment

To fully evaluate the quality of the surrogate-based calibration, 100 parameter values were sampled from the approximated posterior distribution and were subsequently run through CityCOVID (using 50 random seeds for each parameter set as was done for the surrogate training set). The resulting posterior pushforward distribution is shown in Figure 8(a) alongside the pushforward produced in the IMABC calibration [44] in Figure 8(b).

These distributions demonstrate the accuracy of the DRAM and surrogate method when compared to the native IMABC calibration. Though significantly more efficient in computation, the surrogate approach produced similar results. It can be seen that the surrogate-based calibration somewhat overpredicted during early times and did not capture the full uncertainty. This can be more precisely observed by considering a proper scoring rule such as the continuous ranked probability score (CRPS), which is shown in brown in Figure 8 and was calculated with the scoringutils R library [9]. In hospitalizations, the surrogate-based calibration is seen to be less accurate for early times, but is roughly equivalent for the remainder. In contrast, the deaths from the surrogate-based calibration are slightly more accurate for late times.
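
As a point of reference, the CRPS of an ensemble against a single observation can be estimated with the standard sample form $\mathbb{E}|X-y|-\tfrac{1}{2}\mathbb{E}|X-X^{\prime}|$ . The sketch below is an illustrative Python version of that estimator and is not the scoringutils implementation; the function name `crps_ensemble` is an assumption.

```python
import numpy as np

def crps_ensemble(members, y):
    """Sample-based CRPS of an ensemble for one scalar observation y."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    accuracy = np.mean(np.abs(members - y))
    # Half the mean absolute difference between all pairs of members.
    spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return accuracy - spread

# With a single member the score reduces to the absolute error.
single = crps_ensemble([3.0], 5.0)
small = crps_ensemble([1.0, 2.0, 3.0], 2.0)
print(single, small)
```

Lower values are better: the first term rewards ensembles that sit close to the observation, and the second term credits ensemble spread so that an overly narrow ensemble far from the data is penalized.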

Another consideration in this comparison is the disparate number of parameters used for each calibration. Specifically, the IMABC calibration made use of the 9 parameters in Table 1 while the surrogate-based calibration presented here used only the 4 marked with stars. To compare these disparate measures, the deviance information criterion (DIC [20]) was used to measure the distance between each posterior pushforward and the observed data while taking into account the number of parameters. This metric is a generalization of the Akaike Information Criterion (AIC) for Bayesian model comparisons and can be written as:

	DIC	$\displaystyle=-2\log p(y\mid\hat{\theta})+2p_{\text{DIC}},$
	$\displaystyle p_{\text{DIC}}$	$\displaystyle=2\left(\log p(y\mid\hat{\theta})-\mathbb{E}_{\text{post}}\log p(y\mid\theta)\right).$		(3)

DIC balances the accuracy of the Bayes estimate $\hat{\theta}$ and the effective number of parameters $p_{\text{DIC}}$ . Given the empirical distribution of posterior samples used for our pushforward distribution, the DIC was estimated with sample means:

	$\displaystyle\hat{\theta}$	$\displaystyle\approx\frac{1}{N_{p}}\sum_{i=1}^{N_{p}}\theta_{i},$
	$\displaystyle\mathbb{E}_{\text{post}}\log p(y\mid\theta)$	$\displaystyle\approx\frac{1}{N_{p}}\sum_{i=1}^{N_{p}}\log p(y\mid\theta_{i}),$		(4)

where $\theta_{i}$ is the $i^{\text{th}}$ sample from the calibrated posterior distribution used to compute the pushforward distribution. Accordingly, for the MCMC calibrated posterior $N_{p}=100$ and for the IMABC calibrated posterior $N_{p}=1158$ .
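
The sample-mean estimate can be sketched in a few lines. The sketch uses the standard form of the effective number of parameters, $p_{\text{DIC}}=2\left(\log p(y\mid\hat{\theta})-\mathbb{E}_{\text{post}}\log p(y\mid\theta)\right)$ , together with assumptions that are not taken from the calibration itself: a Gaussian likelihood with known standard deviation, a hypothetical linear model `line_model`, and synthetic posterior samples.

```python
import numpy as np

def gaussian_loglik(y, prediction, sigma):
    """Log-likelihood of y under independent Gaussian errors with std sigma."""
    resid = y - prediction
    return (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

def estimate_dic(y, samples, model, sigma):
    """Estimate DIC and p_DIC from posterior samples using sample means."""
    theta_hat = samples.mean(axis=0)  # Bayes estimate (posterior mean)
    loglik_at_mean = gaussian_loglik(y, model(theta_hat), sigma)
    mean_loglik = np.mean([gaussian_loglik(y, model(th), sigma) for th in samples])
    p_dic = 2.0 * (loglik_at_mean - mean_loglik)
    return -2.0 * loglik_at_mean + 2.0 * p_dic, p_dic

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 40)

def line_model(theta):
    """Hypothetical two-parameter model used only for this illustration."""
    return theta[0] + theta[1] * t

y_obs = line_model(np.array([1.0, 2.0])) + rng.normal(0.0, 0.1, t.size)
samples = rng.normal([1.0, 2.0], [0.03, 0.05], size=(100, 2))  # synthetic posterior
dic_value, p_dic = estimate_dic(y_obs, samples, line_model, sigma=0.1)
print(dic_value, p_dic)
```

For a model whose log-likelihood is concave in the parameters, as in this linear example, the estimated $p_{\text{DIC}}$ is nonnegative and grows with the spread of the posterior samples.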

Comparing the DIC calculated for the MCMC and IMABC calibrated pushforward ensembles shows that although the CRPS of the MCMC pushforward is higher than that of the IMABC pushforward for hospitalizations, the predictive accuracy of the two approaches was close after accounting for the number of effective parameters. The numerical comparisons of these quantities averaged over time can be seen in Table 5. The DIC, which penalizes overly parametrized models, favors the surrogate-based calibration.

An additional layer of comparison can be achieved via comparison of the VRHs of the IMABC and MCMC calibrated pushforwards. It is agreed that a uniform distribution is ideal for an ensemble forecast [21]. We thus constructed density normalized empirical distributions for the VRHs of the IMABC and MCMC pushforward trajectories and compared them with the corresponding discrete uniform distribution using KL divergence, Chi-squared distance, and Wasserstein distance. These measurements each provide a unique comparison between the empirical VRH ( $V$ ) and the corresponding discrete uniform distribution ( $U$ ). Specifically, measurements can be written as:

$\displaystyle D_{\text{KL}}(V,U)$	$\displaystyle=\sum_{x\in\mathcal{X}}V(x)\log\left(\frac{V(x)}{U(x)}\right)$	(5)
$\displaystyle D_{\chi^{2}}(V,U)$	$\displaystyle=\sum_{x\in\mathcal{X}}\frac{\left(V(x)-U(x)\right)^{2}}{U(x)}$	(6)
$\displaystyle D_{\text{wass}}(V,U)$	$\displaystyle=\frac{\sum_{x_{1}\in\mathcal{X}_{1}}\sum_{x_{2}\in\mathcal{X}_{2}}\left(V(x_{1})-U(x_{2})\right)\|x_{1}-x_{2}\|}{\sum_{x_{1}\in\mathcal{X}_{1}}\sum_{x_{2}\in\mathcal{X}_{2}}(V(x_{1})-U(x_{2}))}$	(7)

where $\mathcal{X}$ is the space of bins in our histogram and $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ encode the optimal paths to move mass from $V$ to $U$ (computed via linear optimization). These can be simply interpreted as follows:

KL Divergence (Eq. 5): Measurement of the lost information from using $V$ in place of $U$ .
Chi-Sq Distance (Eq. 6): Measurement of the difference in frequencies in the histograms.
Wasserstein Distance (Eq. 7): Measurement of the histogram density to be moved to align the VRH with the discrete uniform.
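
A minimal numerical sketch of the three comparisons is given below. It rests on assumptions that may differ from the computation behind Table 5: the histogram is normalized to sum to one, the bins are placed at unit spacing, and the Wasserstein distance is evaluated with the one-dimensional cumulative-sum identity rather than by linear optimization, so the resulting values are not directly comparable to the tabulated ones.

```python
import numpy as np

def vrh_distances(counts):
    """KL divergence, chi-squared distance and 1D Wasserstein-1 distance
    between a normalized rank histogram V and the discrete uniform U."""
    counts = np.asarray(counts, dtype=float)
    v = counts / counts.sum()
    u = np.full(v.size, 1.0 / v.size)
    # Empty bins contribute zero to the KL sum (0 * log 0 taken as 0).
    nonzero = v > 0
    kl = np.sum(v[nonzero] * np.log(v[nonzero] / u[nonzero]))
    chi2 = np.sum((v - u) ** 2 / u)
    # In one dimension the optimal transport cost equals the summed absolute
    # difference of the cumulative distributions (unit bin spacing assumed).
    wass = np.sum(np.abs(np.cumsum(v) - np.cumsum(u)))
    return kl, chi2, wass

flat = vrh_distances([5, 5, 5, 5])       # perfectly uniform histogram
skewed = vrh_distances([10, 0, 0, 0])    # all mass in the lowest rank
print(flat)
print(skewed)
```

All three quantities vanish for a perfectly flat histogram and grow as the mass concentrates in a few bins, so smaller values indicate a VRH closer to the ideal.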

All three measures show that the surrogate-based calibration provides a VRH that is closer to a uniform distribution than the one arising from IMABC. The tabulated results illustrate the competing effects of approximations in ABM calibration. While IMABC resulted in overly-wide marginal PDFs (see Figure 5), the surrogate-based calibration is not without its flaws. While the comparison was not evident pictorially in Figure 8, the tabulated summary in Table 5 shows that the MCMC calibration is slightly superior, despite the use of surrogate models.

However, for both calibrations, the verification rank histogram of these pushforward results seen in Figure 7 illustrates a general overprediction for hospitalizations and underprediction for deaths. In fact, the number of deaths is always underpredicted showing an imbalanced calibration. This holds true for both the calibrations, which were conducted using independent formulations of the estimation problem as well as using different algorithms. This error indicates a model-form error in CityCOVID that causes hospitalized people to die at a rate lower than what is observed.

	Hospitalizations		Deaths
Metric (lower values are better)	IMABC	MCMC	IMABC	MCMC
CRPS	39.74	47.85	42.53	34.96
DIC	685.79	596.50	16.40	9.95
KL Divergence (VRH)	0.33	0.21	0.88	0.79
Chi-Sq Distance (VRH)	41.00	24.13	114.75	95.06
Wasserstein Distance (VRH)	0.07	0.05	0.11	0.10

Table 5: Comparison of pushforward results of samples from surrogate-based MCMC calibration with previous approximate Bayesian calibration [44]. Rows show the continuous ranked probability score (CRPS), the deviance information criterion (DIC), and the KL divergence, Chi-squared distance, and Wasserstein distance of the verification rank histogram (VRH) from a uniform distribution.

5 Conclusion

We have described an accelerated approach to calibrate agent-based models for epidemiology in which the quantities of interest are population level metrics such as hospitalizations and deaths. In order to overcome the inherent stochasticity of these models, we considered a mean model which was averaged over random seeds. The temporal dynamics of the quantities of interest were then decomposed via PCA and a random forest was trained to reconstruct the data using these principal components and the input parameters for the ABM. This combination yielded a surrogate which could be used in place of the model for accelerated sampling.
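
The first two ingredients, the seed-averaged mean model and the temporal decomposition, can be sketched with NumPy alone. The synthetic trajectories, the array sizes, and the 95% retained-variance threshold are assumptions made for illustration; the random forest regression from parameters to component scores is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_params, n_seeds, n_days = 40, 50, 100
t = np.linspace(0.0, 1.0, n_days)

# Synthetic stochastic model output, shape (n_params, n_seeds, n_days):
# a bell-shaped curve per parameter set plus seed-dependent noise.
amplitude = rng.uniform(50.0, 150.0, size=(n_params, 1, 1))
peak = rng.uniform(0.3, 0.7, size=(n_params, 1, 1))
runs = amplitude * np.exp(-0.5 * ((t - peak) / 0.1) ** 2)
runs = runs + rng.normal(0.0, 5.0, size=(n_params, n_seeds, n_days))

# Mean model: average over the random seeds for each parameter set.
mean_model = runs.mean(axis=1)                      # (n_params, n_days)

# PCA via SVD of the centered trajectories.
center = mean_model.mean(axis=0)
_, s, vt = np.linalg.svd(mean_model - center, full_matrices=False)
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
n_comp = int(np.searchsorted(explained, 0.95) + 1)  # smallest count reaching 95%

# Component scores are the low-dimensional targets a regressor would learn.
scores = (mean_model - center) @ vt[:n_comp].T      # (n_params, n_comp)
reconstruction = scores @ vt[:n_comp] + center
rel_err = np.linalg.norm(reconstruction - mean_model) / np.linalg.norm(mean_model)
print(n_comp, rel_err)
```

A surrogate built this way only needs to predict `n_comp` scores per parameter set instead of a full daily trajectory, which is what makes the regression problem tractable.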

In order to effectively use this surrogate on the model of interest, the Gini impurity of the random forest trained on some preliminary model outputs was used to reduce the parameter space to only 4 dimensions. A hypercube was then sampled in this lower dimensional space to provide training information for the surrogate. An empirical prior was then constructed which combined samples of the hypercube yielding hospitalization and death trajectories near those observed.

Equipped with the surrogate model and an adequate empirical prior, Markov chain Monte Carlo sampling was used with a Gaussian error model to sample from the posterior distribution. The samples, along with their respective pushforward and posterior predictive trajectories, were analyzed in comparison with a previous IMABC calibration of the model[44]. Ultimately, this surrogate accelerated approach yielded similar results to the original approach at a fraction of the computational cost. True posterior pushforward samples in combination with verification rank histograms and proper scoring rules such as CRPS demonstrated that the loss in accuracy of the accelerated surrogate-based calibration was almost negligible.
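
The structure of such a sampler can be illustrated with a plain random-walk Metropolis algorithm, which is swapped in here as a simplification and is not the DRAM scheme used for the calibration. The one-parameter model, the uniform prior bounds, the flat prior on the noise standard deviation, and the step sizes are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 50)
y_obs = 2.0 * t + rng.normal(0.0, 0.1, t.size)   # synthetic data, true slope 2

def log_posterior(theta):
    """Gaussian error model with unknown std; uniform prior on the slope."""
    slope, sigma = theta
    if sigma <= 0.0 or not (0.0 <= slope <= 5.0):
        return -np.inf
    resid = y_obs - slope * t                      # stand-in for the surrogate
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 - t.size * np.log(sigma)

def metropolis(log_post, theta0, step, n_steps, rng):
    """Random-walk Metropolis with independent Gaussian proposals."""
    current = np.asarray(theta0, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_steps, current.size))
    accepted = 0
    for i in range(n_steps):
        proposal = current + step * rng.normal(size=current.size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain[i] = current
    return chain, accepted / n_steps

chain, acceptance = metropolis(log_posterior, [1.0, 0.5],
                               np.array([0.05, 0.02]), 5000, rng)
posterior_slope = chain[1000:, 0].mean()           # discard burn-in
print(posterior_slope, acceptance)
```

Because every posterior evaluation calls the model once, replacing the ABM with a fast surrogate inside `log_posterior` is what makes chains of this length affordable.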

However, the final calibration using either IMABC or the full Bayesian inference approach tends to over- or underpredict the values of interest. Future work will aim to correct this inaccuracy through a more complete incorporation of the stochasticity of the model. Several approaches have been proposed for this more complete analysis, including fitting surrogates to approximate both the mean/median output across random seeds as well as the variance/quantiles across the random seeds [19] and fitting surrogates using the random seeds themselves [18].

Additionally, the surrogate used for this calibration was constructed in a global fashion on few dimensions and its limits are not yet well understood. We plan future work to compare this construction with alternative calibration approaches which can scale to larger dimensional problems or for which local surrogate construction can be combined with native model outputs to reduce the impact of surrogate model form error.

Author contributions

Connor Robertson formulated the problem, wrote the software to solve it, generated the figures and wrote the paper. Cosmin Safta assisted with software development, interpretation of results, and contributed to writing the paper. Nicholson Collier produced the CityCOVID training data. Jonathan Ozik provided guidance on CityCOVID and IMABC calibration and contributed to writing the paper. Jaideep Ray posed the problem, assisted with the epidemiological interpretation, and suggested the calibration approach and metrics.

Acknowledgments

We thank Arindam Fadikar and Chick Macal at Argonne National Laboratories for various useful discussions on surrogates and their application to epidemiological modeling. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. This article has been authored by an employee of National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE). The employee owns all right, title and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan https://www.energy.gov/downloads/doe-public-access-plan. This material is based upon work supported by the National Science Foundation under Grant 2200234, the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357 and the Bio-preparedness Research Virtual Environment (BRaVE) initiative. This research was completed with resources provided by the Laboratory Computing Resource Center at Argonne National Laboratory.

Financial disclosure

None reported.

Conflict of interest

The authors declare no potential conflict of interests.

References

[1] Abdulrahman A Ahmed, M Amin Rahimian, and Mark S Roberts. Inferring epidemic dynamics using gaussian process emulation of agent-based simulations. In 2023 Winter Simulation Conference (WSC), pages 770–780. IEEE, 2023.
[2] Claudio Angione, Eric Silverman, and Elisabeth Yaneske. Using machine learning as a surrogate model for agent-based simulations. Plos one, 17(2):e0263150, 2022.
[3] Claudio Angione, Eric Silverman, and Elisabeth Yaneske. Using machine learning as a surrogate model for agent-based simulations. Plos one, 17(2):e0263150, 2022.
[4] Rushil Anirudh, Jayaraman J Thiagarajan, Peer-Timo Bremer, Timothy Germann, Sara Del Valle, and Frederick Streitz. Accurate calibration of agent-based epidemiological models with neural network surrogates. In Workshop on Healthcare AI and COVID-19, pages 54–62. PMLR, 2022.
[5] Georges M Arnaout, Mahmoud T Khasawneh, Jun Zhang, and Shannon R Bowling. An intellidrive application for reducing traffic congestions using agent-based approach. In 2010 IEEE Systems and Information Engineering Design Symposium, pages 221–224. IEEE, 2010.
[6] Gaurav Arya, Moritz Schauer, Frank Schäfer, and Christopher Rackauckas. Automatic differentiation of programs with discrete randomness. Advances in Neural Information Processing Systems, 35:10435–10447, 2022.
[7] Priscilla Avegliano and Jaime Simão Sichman. Equation-based versus agent-based models: Why not embrace both for an efficient parameter calibration? Journal of Artificial Societies and Social Simulation, 26(4), 2023.
[8] Robert L Axtell and J Doyne Farmer. Agent-based modeling in economics and finance: Past, present, and future. Journal of Economic Literature, pages 1–101, 2022.
[9] Nikos I Bosse, Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. Evaluating forecasts with scoringutils in r. arXiv preprint arXiv:2205.07090, 2022.
[10] Benoît Calvez and Guillaume Hutzler. Automatic tuning of agent-based models using genetic algorithms. In International workshop on multi-agent systems and agent-based simulation, pages 41–57. Springer, 2005.
[11] Ayush Chopra, Alexander Rodríguez, Jayakumar Subramanian, Arnau Quera-Bofarull, Balaji Krishnamurthy, B Aditya Prakash, and Ramesh Raskar. Differentiable agent-based epidemiology. arXiv preprint arXiv:2207.09714, 2022.
[12] Nicholson Collier and Michael North. Parallel agent-based simulation with Repast for High Performance Computing. SIMULATION, 89(10):1215–1235, October 2013.
[13] Herbert Dawid, Giorgio Fagiolo, et al. Agent-based models for economic policy design: Introduction to the special issue. Journal of Economic Behavior & Organization, 67(2):351–354, 2008.
[14] Wim De Mulder, Bernhard Rengs, Geert Molenberghs, Thomas Fent, and Geert Verbeke. Statistical emulation applied to a very large data set generated by an agent-based model. In Proceedings of the Seventh International Conference on Advances in System Simulation, pages 43–48, 2015.
[15] Lander De Visscher, Bernard De Baets, and Jan M Baetens. A critical review of common pitfalls and guidelines to effectively infer parameters of agent-based models using approximate bayesian computation. Environmental Modelling & Software, page 105905, 2023.
[16] Wen Dong. Variational inference with agent-based models. arXiv preprint arXiv:1605.04360, 2016.
[17] Joel Dyer, Nicholas Bishop, Yorgos Felekis, Fabio Massimo Zennaro, Anisoara Calinescu, Theodoros Damoulas, and Michael Wooldridge. Interventionally consistent surrogates for agent-based simulators. arXiv preprint arXiv:2312.11158, 2023.
[18] Arindam Fadikar, Nicholson Collier, Abby Stevens, Jonathan Ozik, Mickaël Binois, and Kok Ben Toh. Trajectory-Oriented Optimization of Stochastic Epidemiological Models. In 2023 Winter Simulation Conference (WSC), pages 1244–1255, San Antonio, TX, USA, December 2023. IEEE.
[19] Arindam Fadikar, Dave Higdon, Jiangzhuo Chen, Bryan Lewis, Srinivasan Venkatramanan, and Madhav Marathe. Calibrating a stochastic, agent-based model using quantile-based emulation. SIAM/ASA Journal on Uncertainty Quantification, 6(4):1685–1706, 2018.
[20] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Texts in Statistical Science. Chapman & Hall / CRC press, 2 edition, 2003.
[21] Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E Raftery. Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(2):243–268, 2007.
[22] Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007.
[23] Tilmann Gneiting, Adrian E. Raftery, Anton H. Westveld, and Tom Goldman. Calibrated probabilistic forecasting using ensemble model output statistics and minimum crps estimation. Monthly Weather Review, 133(5):1098 – 1118, 2005.
[24] Jakob Grazzini, Matteo G Richiardi, and Mike Tsionas. Bayesian estimation of agent-based models. Journal of Economic Dynamics and Control, 77:26–47, 2017.
[25] Heikki Haario, Marko Laine, Antonietta Mira, and Eero Saksman. Dram: efficient adaptive mcmc. Statistics and Computing, 16:339–354, 2006.
[26] Thomas M Hamill. Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3):550–560, 2001.
[27] Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
[28] Christian Hofer, Georg Jäger, and Manfred Füllsack. Including traffic jam avoidance in an agent-based network model. Computational social networks, 5:1–12, 2018.
[29] Anna L. Hotton, Jonathan Ozik, Chaitanya Kaligotla, Nick Collier, Abby Stevens, Aditya S. Khanna, Margaret M. MacDonell, Cheng Wang, David J. LePoire, Young-Soo Chang, Ignacio J. Martinez-Moyano, Bogdan Mucenic, Harold A. Pollack, John A. Schneider, and Charles Macal. Impact of changes in protective behaviors and out-of-household activities by age on COVID-19 transmission and hospitalization in chicago, illinois. Annals of Epidemiology, page S1047279722001053, 2022.
[30] Barbara FF Huang and Paul C Boutros. The parameter sensitivity of random forests. BMC bioinformatics, 17:1–13, 2016.
[31] Junjie Jiang, Zi-Gang Huang, Thomas P Seager, Wei Lin, Celso Grebogi, Alan Hastings, and Ying-Cheng Lai. Predicting tipping points in mutualistic networks through dimension reduction. Proceedings of the National Academy of Sciences, 115(4):E639–E647, 2018.
[32] Jonathan M Keith and Daniel Spring. Agent-based bayesian approach to monitoring the progress of invasive species eradication programs. Proceedings of the National Academy of Sciences, 110(33):13428–13433, 2013.
[33] Minh Kieu, Hoang Nguyen, Jonathan A Ward, and Nick Malleson. Towards real-time predictions using emulators of agent-based models. Journal of Simulation, 18(1):29–46, 2024.
[34] Francesco Lamperti, Andrea Roventini, and Amir Sani. Agent-based model calibration using machine learning surrogates. Journal of Economic Dynamics and Control, 90:366–389, 2018.
[35] Jacopo Lenti, Fabrizio Silvestri, and Gianmarco De Francisci Morales. Variational inference of parameters in opinion dynamics models. arXiv preprint arXiv:2403.05358, 2024.
[36] Vedran Ljubović. Traffic simulation using agent-based models. In 2009 XXII International Symposium on Information, Communication and Automation Technologies, pages 1–6. IEEE, 2009.
[37] Charles M Macal, Nicholson T Collier, Jonathan Ozik, Eric R Tatara, and John T Murphy. Chisim: An agent-based simulation model of social interactions in a large urban area. In 2018 winter simulation conference (WSC), pages 810–820. IEEE, 2018.
[38] Steven M Manson, Shipeng Sun, and Dudley Bonsal. Agent-based modeling and complexity. Agent-based models of geographical systems, pages 125–139, 2012.
[39] Paul R. Miles. pymcmcstat: A python package for bayesian inference using delayed rejection adaptive metropolis. Journal of Open Source Software, 4(38):1417, 2019. https://github.com/prmiles/pymcmcstat/wiki#citing-pymcmcstat.
[40] Max D Morris. Factorial sampling plans for preliminary computational experiments. Technometrics, 33(2):161–174, 1991.
[41] Ignacio Moya, Manuel Chica, and Oscar Cordon. Evolutionary multiobjective optimization for automatic agent-based model calibration: A comparative study. Ieee Access, 9:55284–55299, 2021.
[42] United States. Bureau of Labor Statistics. American time use survey (atus): Arts activities, [united states], 2003-2021, Jul 2023.
[43] Jonathan Ozik, Nicholson T. Collier, Justin M. Wozniak, Charles M. Macal, and Gary An. Extreme-Scale Dynamic Exploration of a Distributed Agent-Based Model With the EMEWS Framework. IEEE Transactions on Computational Social Systems, 5(3):884–895, September 2018.
[44] Jonathan Ozik, Justin M Wozniak, Nicholson Collier, Charles M Macal, and Mickaël Binois. A population data-driven workflow for covid-19 modeling and learning. The International Journal of High Performance Computing Applications, 35(5):483–499, 2021.
[45] Jasmina Panovska-Griffiths, Thomas Bayley, Tony Ward, Akashaditya Das, Luca Imeneo, Cliff Kerr, and Simon Maskell. Machine learning assisted calibration of stochastic agent-based models for pandemic outbreak analysis. https://www.researchsquare.com/article/rs-2773605/v1, 2023.
[46] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2024.
[47] Adrian E. Raftery and Steven M. Lewis. Comment: One long run with diagnostics: Implementation strategies for markov chain. Statistical Science, 7(4):493–7, 1992.
[48] François Rebaudo, Verónica Crespo-Pérez, Jean-François Silvain, and Olivier Dangles. Agent-based modeling of human-induced spread of invasive species in agricultural landscapes: insights from the potato moth in ecuador. Journal of Artificial Societies and Social Simulation, 14(3):7, 2011.
[49] Juan Francisco Robles, Enrique Bermejo, Manuel Chica, and Óscar Cordón. Multimodal evolutionary algorithms for easing the complexity of agent-based model calibration. Journal of Artificial Societies and Social Simulation, 24(3), 2021.
[50] Carolyn M Rutter, Jonathan Ozik, Maria DeYoreo, and Nicholson Collier. Microsimulation model calibration using incremental mixture approximate bayesian computation. The Annals of Applied Statistics, 13(4):2189, 2019.
[51] Raihanah Adawiyah Shaharuddin and Md Yushalify Misro. Controlling traffic congestion in urbanised city: A framework using agent-based modelling and simulation approach. ISPRS International Journal of Geo-Information, 12(6):226, 2023.
[52] Ilya M Sobol. Global sensitivity indices for nonlinear mathematical models and their monte carlo estimates. Mathematics and computers in simulation, 55(1-3):271–280, 2001.
[53] Elske van der Vaart, Mark A Beaumont, Alice SA Johnston, and Richard M Sibly. Calibration and evaluation of individual-based models using approximate bayesian computation. Ecological Modelling, 312:182–190, 2015.
[54] Ben Vermeulen and Andreas Pyka. Agent-based modeling for decision making in economics under uncertainty. Economics, 10(1):20160006, 2016.
[55] Minhong Wang, Athanasios Tsanas, Guillaume Blin, and Dave Robertson. Predicting pattern formation in embryonic stem cells using a minimalist, agent-based probabilistic model. Scientific Reports, 10(1):16209, 2020.
[56] Gregory R. Warnes and Robert Burrows. mcgibbsit: Warnes and Raftery’s ’MCGibbsit’ MCMC Run Length and Convergence Diagnostic, 2023. R package version 1.2.2.
[57] Gregory W. Warnes. Multi-Chain and Parallel Algorithms for Markov Chain Monte Carlo. PhD thesis, Department of Biostatistics, University of Washington, 2000. https://digital.lib.washington.edu/researchworks/handle/1773/9541.
[58] Tien-Tsin Wong, Wai-Shing Luk, and Pheng-Ann Heng. Sampling with hammersley and halton points. Journal of graphics tools, 2(2):9–24, 1997.

Code to reproduce results from this article can be found at https://github.com/sandialabs/Bayesian-calibration-of-stochastic-agent-based-model-via-random-forest.

Appendix A Surrogate performance

The random forest surrogate has no formal guarantees to match CityCOVID. As a result, its sensitivity and accuracy need to be independently verified. The sensitivity of the random forest to the most impactful ABM parameters is shown in Table 6 for various sensitivity measures:

Gini importance: A measure of the frequency with which a parameter is used for splits within the trees of the forest. More frequent splitting is indicative of the forest’s reliance on information from that parameter.
Permutation importance: A measure of accuracy of the forest when the input data of a parameter is shuffled. Severely reduced accuracy when a single parameter is shuffled indicates the forest’s reliance on information from that parameter.
Sobol (first): A measure of the variance of the output of the random forest across variations in a parameter. Significant changes in output from adjustments to a single parameter indicate the forest’s reliance on information from that parameter.
Sobol (total): A measure of the variance of the output of the random forest across variations in a parameter and that parameter in combination with others. Significant changes in output from adjustments to a single parameter indicate the forest’s reliance on information from that parameter. This total form also attempts to include nonlinear interactions with other parameters.

Feature	Gini Importance	Permutation Importance	Sobol (first)	Sobol (total)
Rate of exposure to infected	0.17	0.06	0.05	0.09
Time of initial exposure	0.17	0.24	0.26	0.37
Probability of stay at home	0.11	0.16	0.34	0.44
Probability of protective behaviors	0.11	0.04	0.04	0.07
Shielding by other susceptible	0.10	0.03	0.05	0.08
Number of initially infected	0.09	0.03	0.03	0.06
Seasonality multiplier	0.08	0.03	0.01	0.02
Proportion isolating in nursing home	0.07	0.01	0.01	0.01
Proportion isolating in home	0.07	0.01	0.00	0.01

Table 6: Random forest feature importance metrics using 9 identified important parameters and data from IMABC calibration [44]. Parameters selected by analysis of the surrogate sensitivity are bolded.
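
Of the measures above, permutation importance is the simplest to reproduce for any fitted model. The sketch below is model-agnostic and uses a hypothetical `toy_predict` function in place of the random forest; the mean squared error metric, the number of repeats, and the synthetic inputs are assumptions for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Mean increase in squared error when each input column is shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the output.
            X_perm[:, j] = rng.permutation(X[:, j])
            scores[j] += np.mean((predict(X_perm) - y) ** 2) - base
    return scores / n_repeats

def toy_predict(X):
    """Hypothetical model: strong use of column 0, weak use of 1, none of 2."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = toy_predict(X)
importance = permutation_importance(toy_predict, X, y, rng)
print(importance)
```

The unused third input receives an importance of zero and the strongly used first input receives the largest value, which is the ordering one would expect such a measure to recover.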

After reducing the input parameters of the random forest surrogate, accuracy of the surrogate can be determined via the median absolute relative error as shown in Figure 2. Some concrete examples demonstrating the absolute relative error for several output trajectories can be seen in Figure 9. These examples visually demonstrate the large relative errors for small values of hospitalizations and deaths which encouraged the use of the median as a measure of accuracy.
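
The effect of small values on the relative error can be seen in a short numerical example. The trajectory values below are invented for illustration and the function name `abs_rel_error_summary` is an assumption.

```python
import numpy as np

def abs_rel_error_summary(truth, prediction, eps=1e-12):
    """Median and mean of the pointwise absolute relative error."""
    truth = np.asarray(truth, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # eps guards against division by zero when the true value is zero.
    rel = np.abs(prediction - truth) / np.maximum(np.abs(truth), eps)
    return np.median(rel), np.mean(rel)

# One early, small count is missed by 2; later values are within 5%.
truth = [1.0, 100.0, 200.0, 400.0, 800.0]
prediction = [3.0, 105.0, 190.0, 420.0, 760.0]
median_err, mean_err = abs_rel_error_summary(truth, prediction)
print(median_err, mean_err)
```

A single small count inflates the mean relative error to 0.44 even though every other point is within 5%, whereas the median stays at 0.05, which is why the median is the more representative summary here.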