Bayesian calibration of stochastic agent based model via random forest

Connor Robertson
Sandia National Laboratories
Livermore, CA
Cosmin Safta
Sandia National Laboratories
Livermore, CA
Nicholson Collier
Argonne National Laboratory
Chicago, IL
Jonathan Ozik
Argonne National Laboratory
Chicago, IL
Jaideep Ray
Sandia National Laboratories
Livermore, CA
Abstract

Agent-based models (ABMs) provide an excellent framework for modeling outbreaks and interventions in epidemiology by explicitly accounting for diverse individual interactions and environments. However, these models are usually stochastic and highly parametrized, requiring precise calibration for predictive performance. When considering realistic numbers of agents and properly accounting for stochasticity, this high-dimensional calibration can be computationally prohibitive. This paper presents a random forest based surrogate modeling technique to accelerate the evaluation of ABMs and demonstrates its use to calibrate an epidemiological ABM named CityCOVID via Markov chain Monte Carlo (MCMC). The technique is first outlined in the context of CityCOVID’s quantities of interest, namely hospitalizations and deaths, by exploring dimensionality reduction via temporal decomposition with principal component analysis (PCA) and via sensitivity analysis. The calibration problem is then presented and samples are generated to best match COVID-19 hospitalization and death numbers in Chicago from March to June in 2020. These results are compared with previous approximate Bayesian calibration (IMABC) results, and their predictive performance is analyzed, showing improved performance with a reduction in computation.

Keywords agent-based modeling, epidemiology, machine learning surrogate, Bayesian calibration, MCMC

1 Introduction

Agent-based models (ABMs) are powerful tools for simulating complex systems that have found use across diverse domains, from traffic flow and ecology to economics and epidemiology. These "bottom-up" computational frameworks represent systems as a collection of autonomous agents that interact with each other and with their environment. This decentralized, microscopic perspective allows ABMs to capture small scale or emergent phenomena that traditional “top-down,” or population level, approaches often miss.

ABMs find applications in a plethora of fields. In traffic flow, they have been used to identify transportation bottlenecks and explore conditions that reduce the efficiency of infrastructure [28, 5, 36, 51]. In ecology, they can accurately model the spread of invasive species and predict ecosystem tipping points [31, 32, 48]. In economics, they simulate market dynamics and assess the impact of policy interventions [13, 8, 54]. Despite their versatility, ABMs face a fundamental challenge: parameter calibration. While ABMs have been shown to effectively replicate historical data and trends, they often include a wide range of possible individual and environmental characteristics, which makes calibrating them a high-dimensional problem. Further, ABM simulations often scale poorly due to agent interactions, making each run of the model computationally expensive. To compound these challenges, ABMs are often inherently stochastic. Unlike deterministic models, ABMs incorporate randomness in agent behaviors and in environmental factors. Though this stochasticity is crucial for capturing real-world complexity, it introduces significant uncertainty, making precise calibration elusive and increasing the computational expense of calibration.

Various calibration techniques for ABMs have been proposed to align simulations with empirical data. These include Approximate Bayesian Computation (ABC) [15, 53, 50], variational inference [16, 35], Markov chain Monte Carlo (MCMC) [24], and evolutionary algorithms [41, 49]. These parameter estimation approaches are commonly combined with an emulator or surrogate model of the ABM such as Gaussian processes [14, 1, 19, 44], decision trees (or forests) [33, 2, 34, 43], or ordinary differential equations (ODEs) [7] to reduce computational cost. However, each of these existing calibration approaches faces limitations, including in its ability to address stochasticity in the underlying ABM, guarantees on its convergence, and its computational efficiency.

This paper proposes a novel approach to calibrating ABMs by introducing a random forest based global surrogate model which can connect the nonlinear dependence of population level outputs to the ABM parameters over long temporal stretches. This approach includes decomposing quantities of interest via principal component analysis (PCA) and using the built-in sensitivity measures of the random forest to reduce the dimensionality. This surrogate is combined with Bayesian sampling, in the form of MCMC, to produce approximate posterior distributions for the ABM parameters of interest in a fraction of the time it would take using repeated ABM evaluations. Rigorous validation metrics are used to quantify the success of the calibration and the resulting posterior distributions are sampled to produce an ABM generated “pushforward” comparison. Though generally applicable to any ABM calibrated to population-level observations, we will present the calibration approach in the context of the epidemiological ABM CityCOVID [44, 29], which was used to model the spread of COVID-19 in the greater Chicago area during 2020 and supported city and state public health decision making.

The remainder of the paper is structured as follows. Section 2 outlines the CityCOVID ABM, its use, and respective calibration and surrogate training data. Section 3 outlines the surrogate construction procedure and details the formulation of the calibration problem for this ABM. Sections 4 and 5 discuss the results of the calibration procedure for CityCOVID and conclude the paper with key findings, future directions, and the broader implications of our work for advancing ABM calibration.

1.1 Literature review

A variety of approaches have been explored to calibrate epidemiological ABMs. Among these is the approach by Fadikar et al. [19], who used Gaussian process surrogates to model the mean evolution of an epidemiological ABM and its quantile evolution to capture stochasticity. Further, Anirudh et al. [4] presented a surrogate modeling approach which first decomposed data into temporal modes using PCA and then modeled the mapping from ABM parameters to PCA weights using a neural network. In each of these cases, the parameter estimation was performed with either rejection ABC or MCMC.

Calibration without the use of surrogate accelerators has been attempted by way of genetic algorithms, which can identify successful parameter values in search spaces of moderate dimension but do not provide uncertainty quantification [10]. This approach allows for global exploration of the parameter space of ABMs but requires specifying fitness functions, which are often problem specific, and does not have any convergence guarantees. Given the stochasticity of most ABMs and the often limited data used for calibration, having reliable approaches that include estimated uncertainty on the parameter calibration is a necessity.

In the last decade, work has focused on leveraging more advanced machine learning approaches to improve calibration or to accelerate calibration with improved surrogate models. For example, LightGBM gradient boosted forests of decision trees have been used to filter proposed parameters in ABC sampling [45]. Alternatively, causal structure has been prescribed to fit surrogates made up of systems of ordinary differential equations or recurrent neural networks and capture ABM outputs in a latent space [17]. As a summary of surrogate approaches, Angione et al. [3] provided a comparison of various common machine learning algorithms applied as surrogate models for an example ABM. In this, neural networks were shown to be the most effective at replicating ABM outputs, but random forests were identified as a close competitor and far more computationally efficient. Additionally, the nonlinearity and stochasticity of the mapping from ABM parameters to outputs was cited as a key challenge and PCA decomposition identified as a potential aid for surrogate accuracy over timeseries outputs. Across previous work, the stochasticity of ABMs was incorporated into surrogate modeling by including the random seed as an input parameter, reducing the individual seeded runs to deterministic models [45, 17, 3].

The most recent work for ABM calibration has begun exploring the possibility of differentiable ABMs, which can automatically provide information for calibration. New frameworks for ABM construction built on computational tools for deep learning can provide gradients for optimization automatically at evaluation of ABMs [11]. Fundamentally, this new approach can be extended even through discrete randomness, a characteristic feature of most ABMs [6]. These new approaches hold the potential of accelerating the calibration procedure using Hamiltonian-based MCMC approaches, which require gradients of the posterior distributions with respect to the parameters. This opens the possibility of calibrating high dimensional systems while maintaining the convergence guarantees of MCMC approaches. However, incorporating these automatic gradient computations requires fundamental reconstruction of the software for ABM modeling. The approach presented in this paper instead takes a black-box perspective which requires no intrusive modification of the model.

Properly assessing the accuracy of stochastic model calibration is often fraught with nuance. However, one effective approach is the use of strictly proper scoring metrics [22], which allow for unique maximizers while still providing the flexibility to match the problem at hand. Most common among these metrics is the continuous ranked probability score (CRPS) [23], which generalizes mean absolute error to predictive cumulative distribution functions: a CRPS of 0 indicates a perfect match between an ensemble of stochastic simulations and the observation, and positive values represent mean absolute error between each ensemble run and the observation. This scoring technique has previously been used successfully to evaluate the efficacy of ABMs [55]. Further validation can be found by considering the verification rank histogram (VRH) [26] of the ABM outputs, which can identify over- or under-dispersion or equivalently tendencies to over- or under-predict compared to observational data.
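As a rough illustration of how CRPS can be evaluated for an ensemble of stochastic runs, the sketch below uses one common empirical estimator, CRPS = E|X - y| - 0.5 E|X - X'|, with expectations replaced by ensemble averages. The function name `ensemble_crps` and the example numbers are illustrative assumptions and are not taken from the CityCOVID study.

```python
import numpy as np

def ensemble_crps(ensemble, observation):
    """Empirical CRPS of a 1-D ensemble against a scalar observation.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, where both expectations
    are replaced by averages over the ensemble members.
    """
    x = np.asarray(ensemble, dtype=float)
    term_obs = np.mean(np.abs(x - observation))
    # Mean absolute difference over all ordered pairs of members.
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - term_spread

# An ensemble that sits exactly on the observation scores 0.
perfect = ensemble_crps([10.0, 10.0, 10.0], 10.0)
# A single-member ensemble reduces to the absolute error.
single = ensemble_crps([7.0], 10.0)
# A symmetric two-member ensemble around the observation.
spread = ensemble_crps([8.0, 12.0], 10.0)
```

For a time series of observations, this score would typically be computed per day and then averaged; that aggregation choice is also an assumption of this sketch.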

2 CityCOVID

CityCOVID is an ABM developed during the COVID-19 pandemic to quantify the outcomes of behavioral and policy interventions in the Chicago, IL metropolitan area. Based on the epidemiological ABM framework ChiSIM [37] and the Repast HPC distributed ABM toolkit [12], CityCOVID includes an age-stratified population of 2.7 million agents which occupy 1.2 million distinct locations including households, workplaces, schools, nursing homes, hospitals, and gyms. The agents move between these locations according to a variety of schedules based on their demographics [42] and can be exposed to infection when present at a location with infected agents. Individual agents transition between epidemiological states: susceptible, exposed, presymptomatic, infected (asymptomatic), infected (symptomatic), hospitalized, hospitalized (ICU), recovered, and deceased. The transitions are governed by probabilities of exposure to infected, hospitalization, or death and the durations of agent infections and hospitalizations are Gamma distributed.

In order to capture the heterogeneity and complexity of epidemiological dynamics between individuals, agents and their interactions are governed by draws from parametrized probability distributions. This high-dimensional parametrization gives flexible control over the impacts of different policies and interventions but also yields a wide range of possible outcomes and introduces enormous stochasticity, making the model a challenge to calibrate.

Given this challenge and its dimensionality, full Bayesian calibration is computationally infeasible. However, previous work by Ozik et al. [44] used a sequential approximate Bayesian calibration (IMABC) approach to iteratively sample from a prior distribution which evolves over time [50]. At each iteration, this distribution is updated by comparing the averaged hospitalization and death trajectories of stochastic realizations of CityCOVID parameter combinations with the empirical data observed in Chicago during 2020. This approach requires a large number of runs but is more efficient than comparable rejection ABC methods. Its efficiency was further improved by performing a Morris global sensitivity analysis [40], which allowed for a reduction to only 9 CityCOVID parameters that strongly influence the output hospitalization and death trajectories of the model. These parameters are listed in Table 1.

Even with parameter reduction and IMABC optimization, the model calibration required just over 32,000 runs, which amounted to a total of 420,000 core hours on the Argonne Leadership Computing Facility Theta supercomputer. Relevant figures and details on CityCOVID and its calibration can be found in Ozik et al. [44] and specifics on the IMABC algorithm in Rutter et al. [50]. The mean outcomes of the calibrated model were a good match for the data and the predictive accuracy of these outcomes was sufficient to use as forecasting support for city and state public health response. This paper furthers this work by targeting a reduction in the computational burden of the calibration while also reducing the uncertainty in the posterior estimations.

Rates: * Exposure to infected
Probabilities: * Stay at home; Protective behaviors
Proportions: Isolating in home; Isolating in nursing home
Multipliers: Seasonality
Other: * Time of initial seeding of infections; Number of initially infected; Shielding by other susceptible
Table 1: The most influential parameters on the census outputs of hospitalizations and deaths in the CityCOVID ABM based on the Morris global sensitivity analysis. Parameters marked with stars represent those most influential to the surrogate model.

2.1 Data

In this section we describe the observational data used in model calibration as well as the process of generating data suitable for use in training a surrogate model to approximate CityCOVID.

Observational data: The daily census numbers of occupied hospitalization beds and cumulative deaths caused by COVID-19 were collected by the Illinois National Electronic Disease Surveillance System in Chicago from March to June of 2020. Due to the lack of COVID-19 testing capability early in 2020, case counts of COVID-19 were unreliable for use with calibration.

Surrogate training dataset: In order to accurately reproduce CityCOVID results with a surrogate model, a representative sample of the model outputs in a parameter range of interest was needed. The IMABC calibration effort (see §2) provided a 9-dimensional parameter space on which the model generates realistic rates of hospitalizations and deaths. By first training a surrogate model (described in §3.1) on these previous simulations, feature importance could be recalculated to narrow the parameter space down to the top 4 parameters using the sensitivity metrics of the surrogate (random forest [30]). The sensitivities of the parameters for the surrogate are listed in Table 6 and the 4 retained parameters ($\vec{\theta}$) are marked with stars in Table 1. The prior beliefs for these parameters were chosen to be non-informative within a region empirically chosen by comparison of the IMABC calibration outputs and the observational data and are shown in Table 2.

Parameter | Description | Prior Belief
$\theta_{1}$ | Rate of exposure to infected | $\mathcal{U}(0.046, 0.069)$
$\theta_{2}$ | Time of initial seeding of infections | $\mathcal{U}(31, 59)$
$\theta_{3}$ | Probability of stay at home | $\mathcal{U}(0.939, 0.981)$
$\theta_{4}$ | Probability of protective behaviors | $\mathcal{U}(0.407, 0.492)$
Table 2: Prior beliefs for 4 retained parameters for surrogate training and calibration. The range of the uniform distributions was taken from preliminary runs of CityCOVID [44].

To give the surrogate adequate training information in these prior domains, 700 quasi-random samples of $\vec{\theta}$ were taken using Halton sampling [58] in the 4-dimensional space. Each of the 700 parameter sets was simulated with 50 different random seeds, allowing for a robust characterization of stochasticity in the outcome. These seeds alter the selection of who is initially infected in the simulation, the random draws related to infectious state duration and transmission, and the location movements within agent schedules. The resulting hospitalization and death projections for each parameter set were then averaged across random seeds to produce a “mean-model” estimate of the parameter outputs.
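The sampling and seed-averaging steps described above can be sketched as follows. This is a minimal illustration that assumes SciPy's `scipy.stats.qmc.Halton` generator (unscrambled here for determinism) and uses a hypothetical `toy_abm` function as a stand-in for CityCOVID; only the prior bounds, the 700 samples, and the 50 seeds come from the text.

```python
import numpy as np
from scipy.stats import qmc

# Prior bounds from Table 2 (theta_1 ... theta_4).
lower = np.array([0.046, 31.0, 0.939, 0.407])
upper = np.array([0.069, 59.0, 0.981, 0.492])

# 700 Halton points in the unit hypercube, rescaled to the prior box.
sampler = qmc.Halton(d=4, scramble=False)
thetas = qmc.scale(sampler.random(n=700), lower, upper)

def mean_model(run_abm, theta, n_seeds=50):
    """Average a stochastic simulator over seeds 0..n_seeds-1."""
    runs = np.stack([run_abm(theta, seed) for seed in range(n_seeds)])
    return runs.mean(axis=0)

def toy_abm(theta, seed, n_days=100):
    """Hypothetical stand-in simulator: a noisy bump in time, NOT CityCOVID."""
    rng = np.random.default_rng(seed)
    days = np.arange(n_days)
    curve = 1000.0 * theta[0] * np.exp(-((days - theta[1]) / 30.0) ** 2)
    return curve + rng.normal(scale=1.0, size=n_days)

# Seed-averaged trajectory for the first sampled parameter set.
example = mean_model(toy_abm, thetas[0])
```

In practice each call to the simulator would be an expensive ABM run dispatched to a cluster, so the loop over seeds is only schematic.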

This approach does not fully account for the stochasticity of the ABM but reflects the most common approach used for ABM-based forecasting. Comparisons of the complete dataset with the observed hospitalizations ($\hat{h}$) and deaths ($\hat{d}$) are shown in Figure 1. Note that although the hospitalization and death information in CityCOVID is tied to specific agents and thus spatially distributed, here we only consider comparisons of the Chicago city-wide census quantities.

Figure 1: Range of hospitalization and death trajectories for the observed data in Chicago from March-June of 2020 (black) and the CityCOVID simulations using parameter values from the 4 dimensional quasi-random hypercube used to train and test the surrogate method (blue). CityCOVID outputs are averaged across random seeds.

3 Methods

3.1 Surrogate model

Prior work with ABMs has demonstrated the highly nonlinear nature of their dynamics [38] and as such, reductions which characterize the temporal behavior into smooth modes have been effective [19, 4]. Following these approaches, this surrogate approach begins by decomposing the temporally concatenated hospitalization and death series using PCA:

$$
\begin{bmatrix}
h_{1,1} & \ldots & h_{1,n} & d_{1,1} & \ldots & d_{1,n} \\
h_{2,1} & \ldots & h_{2,n} & d_{2,1} & \ldots & d_{2,n} \\
\vdots & \ldots & \vdots & \vdots & \ldots & \vdots \\
h_{m,1} & \ldots & h_{m,n} & d_{m,1} & \ldots & d_{m,n}
\end{bmatrix}
\rightarrow
\begin{bmatrix} \alpha_{1,1} \\ \alpha_{2,1} \\ \vdots \\ \alpha_{m,1} \end{bmatrix} \odot \vec{c}_{1}
+ \ldots +
\begin{bmatrix} \alpha_{1,2n} \\ \alpha_{2,2n} \\ \vdots \\ \alpha_{m,2n} \end{bmatrix} \odot \vec{c}_{2n}
\tag{1}
$$

where $h_{i,j}, d_{i,j}$ are hospitalizations and deaths for parameter set $i$ at time step $j$, $\vec{c}_{j} \in \mathbb{R}^{2n}$ are the PCA components, and $\alpha_{i,j}$ are the coefficients multiplying components $\vec{c}_{j}$ for parameter set $i$. To reduce the dimensionality of the surrogate mapping from ABM parameters to component coefficients, the PCA decomposition is truncated to only the most dominant components $\vec{c}_{j}$, retaining enough of them to capture 95% of the temporal variance.
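The truncated decomposition can be sketched with a singular value decomposition. The example below is an illustration on synthetic trajectories, not the CityCOVID data; it assumes the rows are mean-centered before the decomposition (standard for PCA, though Eq. (1) does not state it), and the names `truncated_pca` and `reconstruct` are hypothetical.

```python
import numpy as np

def truncated_pca(Y, variance_target=0.95):
    """PCA of the rows of Y, keeping the leading components that
    together explain at least `variance_target` of the variance."""
    mean = Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(explained, variance_target)) + 1
    components = Vt[:k]                    # shape (k, 2n)
    coeffs = (Y - mean) @ components.T     # shape (m, k)
    return mean, components, coeffs

def reconstruct(mean, components, coeffs):
    """Map coefficients back to concatenated trajectories."""
    return mean + coeffs @ components

# Synthetic stand-in for m parameter sets with n time steps each.
rng = np.random.default_rng(0)
m, n = 50, 30
t = np.linspace(0.0, 1.0, n)
a = rng.uniform(1.0, 2.0, size=(m, 1))
b = rng.uniform(0.5, 1.5, size=(m, 1))
hosp = a * np.sin(np.pi * t)               # (m, n) "hospitalizations"
deaths = b * t**2                          # (m, n) "deaths"
Y = np.hstack([hosp, deaths]) + 0.001 * rng.normal(size=(m, 2 * n))

mean, components, coeffs = truncated_pca(Y)
Y_hat = reconstruct(mean, components, coeffs)
```

A surrogate then only needs to predict the few columns of `coeffs` rather than all $2n$ time-step values.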

Given this decomposition, a regressor $R^{s}_{\vec{\gamma}}$ is trained to map the Morris sensitivity-reduced CityCOVID ABM parameter set (9 parameters) to the output coefficients $\vec{\alpha}_{i} = \{\alpha_{i,j}\}$ for all time steps $j = 1, \ldots, n$ with regressor hyperparameters $\vec{\gamma}$. Although a wide range of classical or machine learning surrogate approaches could be used for regressor $R^{s}_{\vec{\gamma}}$ [4], random forests were selected due to their computational efficiency and because their structure naturally captures both nonlinear and discontinuous behaviors, both of which are characteristic of ABMs [38].

To improve the computational efficiency of the surrogate-based calibration, the dimensionality of the ABM parameter set was reduced by analyzing the sensitivity of the random forest with Gini impurity [27], permutation importance [27], and Sobol indices [52]. These sensitivity metrics measure how important each input parameter is to the structure of the trees (Gini), the scale of the outputs (permutation), and the variance of the output (Sobol). They are discussed in detail in Appendix A and the exact sensitivity values are listed in Table 3. As can be observed, the rate of exposure to infected, i.e., the hourly probability of being exposed by an infected individual, and the time of initial seeding of infections play an outsized role in the hospitalization and death trajectories, according to the random forest.

Feature | Gini Importance | Permutation Importance | Sobol (first) | Sobol (total)
Rate of exposure to infected | 0.41 | 0.59 | 0.52 | 0.56
Time of initial seeding of infections | 0.32 | 0.61 | 0.29 | 0.33
Probability of stay at home | 0.18 | 0.29 | 0.08 | 0.09
Probability of protective behaviors | 0.08 | 0.05 | 0.01 | 0.01
Table 3: Random forest feature importance metrics for CityCOVID parameters
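Two of the three sensitivity measures can be sketched with scikit-learn, assuming that library is used (the text does not name the implementation). The example relies on `RandomForestRegressor.feature_importances_`, which is scikit-learn's impurity-based importance (variance reduction for regression trees), and on `sklearn.inspection.permutation_importance`. The data are synthetic, and Sobol indices are omitted because they would need a separate sampling scheme.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic inputs: 4 features, with a strong dependence on column 0,
# a weaker one on column 1, and none on columns 2 and 3.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + 0.05 * rng.normal(size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance: normalized so the entries sum to 1.
impurity = forest.feature_importances_

# Permutation importance: drop in score when one column is shuffled.
perm = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(perm.importances_mean)[::-1]   # most important first
```

Features that rank low under every measure are candidates for being fixed at nominal values, which is the kind of reduction from 9 to 4 parameters described above.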

After reducing the input dimensionality, a final random forest $R_{\vec{\gamma}}$ was trained to map only the surrogate sensitivity-reduced CityCOVID ABM parameter set $\vec{\theta}_{i} = \{\theta_{1}, \theta_{2}, \theta_{3}, \theta_{4}\}_{i}$ to the output coefficients $\vec{\alpha}_{i} = \{\alpha_{i,j}\}$. This random forest $R_{\vec{\gamma}}$ has several hyperparameters $\vec{\gamma}$ which can dramatically affect its performance, including the number of trees, the criterion used to split trees, and constraints on the number of samples needed for splits and leaves. In order to maximize accuracy, the hyperparameters were tuned via a brute-force search with 5-fold cross validation. Given the optimally selected hyperparameters, the random forest is trained with the full hypercube of surrogate sensitivity-reduced data.
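A brute-force search with 5-fold cross validation can be sketched with scikit-learn's `GridSearchCV`, assuming that library. The grid values, the scoring choice, and the synthetic `theta` and `alpha` arrays are illustrative assumptions; the split criterion is left at its default here because its accepted names differ between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins: 4 input parameters and 2 PCA coefficients.
rng = np.random.default_rng(1)
theta = rng.uniform(size=(200, 4))
alpha = np.column_stack([
    np.sin(3.0 * theta[:, 0]) + theta[:, 1],
    theta[:, 0] * theta[:, 2],
])

# Hypothetical grid over tree count and split/leaf sample constraints.
param_grid = {
    "n_estimators": [25, 50],
    "min_samples_split": [2, 8],
    "min_samples_leaf": [1, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                                  # 5-fold cross validation
    scoring="neg_mean_squared_error",
)
search.fit(theta, alpha)

# With the default refit=True, the best setting is retrained on all data.
surrogate = search.best_estimator_
predicted = surrogate.predict(theta[:3])
```

The final refit on the complete dataset mirrors the last step in the paragraph above, where the tuned forest is trained on the full hypercube.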

3.2 Formulation of the estimation problem

Bayesian methods are a desirable approach for calibration due to their control over posterior form and convergence guarantees but are only practical for low-dimensional calibrations. Our problem easily fits this requirement after reduction of the parameter space using surrogate sensitivity.

Though the surrogate model was created to estimate the census trajectories of hospitalizations and deaths, the non-stationarity of these features is prohibitive for convergence of Bayesian sampling via MCMC. Because the census values of hospitalizations and deaths are of different scales and fluctuate over large ranges, the likelihood calculation can be biased toward approximating death curves for later dates. As a result, calibration is performed using rolling averages over 1-week windows for the daily counts of hospitalizations ($\vec{h}^{\circ}$) and deaths ($\vec{d}^{\circ}$), which are computed via finite differences. The surrogate outputs were also translated into daily counts by forward finite differences.
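The preprocessing described above can be sketched in a few lines. The sketch assumes a forward difference of the form $c_{j+1} - c_{j}$ and an unpadded 7-day window; the function names and the small cumulative series are hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np

def daily_from_cumulative(cumulative):
    """Forward finite difference: the count for day j is c[j+1] - c[j]."""
    return np.diff(np.asarray(cumulative, dtype=float))

def rolling_mean(series, window=7):
    """Average over sliding windows of `window` days, without padding."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

# Hypothetical cumulative series whose daily increments are 1, 2, ..., 9.
cumulative_deaths = np.array([0, 1, 3, 6, 10, 15, 21, 28, 36, 45])
daily = daily_from_cumulative(cumulative_deaths)
smoothed = rolling_mean(daily, window=7)   # windows 1..7, 2..8, 3..9
```

Differencing shortens the series by one entry and the unpadded window by a further six, so model outputs and observations need to be trimmed to the same dates before they are compared.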

Let $(\vec{h}, \vec{d}) = \mathcal{M}(\vec{\theta})$ be the daily predictions of the CityCOVID model, conditional on input parameters $\vec{\theta} = \{\theta_{1}, \theta_{2}, \theta_{3}, \theta_{4}\}$, as defined in Table 2. Here $\vec{h} = \{h_{j}\}$ and $\vec{d} = \{d_{j}\}$, $j = 1 \ldots n$, are the daily counts of hospitalizations and deaths produced by the model. Let $\vec{h}^{\circ}$ and $\vec{d}^{\circ}$ be their observed counterparts, linked to the model predictions by a zero-mean Gaussian error, i.e.,

$$
h_{j}^{\circ} = h_{j}(\vec{\theta}) + \epsilon_{h}, \quad \epsilon_{h} \sim \mathcal{N}(0, \sigma_{h}^{2}) \quad \text{and} \quad d_{j}^{\circ} = d_{j}(\vec{\theta}) + \epsilon_{d}, \quad \epsilon_{d} \sim \mathcal{N}(0, \sigma_{d}^{2}).
$$

The likelihood of observing $(\vec{h}^{\circ}, \vec{d}^{\circ})$, conditional on $\vec{\theta}$, is

$$
\begin{aligned}
\mathcal{L}(\vec{h}^{\circ}, \vec{d}^{\circ} \mid \vec{\theta})
&= \frac{1}{(2\pi)^{n/2} \sigma_{h}^{n}} \prod_{j=1}^{n} \exp\left[ -\frac{1}{2} \left( \frac{h_{j}^{\circ} - h_{j}(\vec{\theta})}{\sigma_{h}} \right)^{2} \right] \times \frac{1}{(2\pi)^{n/2} \sigma_{d}^{n}} \prod_{j=1}^{n} \exp\left[ -\frac{1}{2} \left( \frac{d_{j}^{\circ} - d_{j}(\vec{\theta})}{\sigma_{d}} \right)^{2} \right] \\
&= \frac{1}{(2\pi \sigma_{d} \sigma_{h})^{n}} \exp\left[ -\frac{S_{h}}{2\sigma_{h}^{2}} - \frac{S_{d}}{2\sigma_{d}^{2}} \right],
\end{aligned}
$$

where $S_h=\sum_{j=1}^{n}\left(h_j^{\circ}-h_j(\vec{\theta})\right)^2$, $S_d=\sum_{j=1}^{n}\left(d_j^{\circ}-d_j(\vec{\theta})\right)^2$, and $\left(h_j(\vec{\theta}),d_j(\vec{\theta})\right)$ are the model predictions corresponding to parameters $\vec{\theta}$.
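As a minimal sketch of how this likelihood can be evaluated, the Python function below computes its logarithm from the two sums of squares. The function and argument names are illustrative assumptions and are not taken from the CityCOVID or surrogate code. Working in log space is a choice made here to avoid numerical underflow when $n$ is large.

```python
import math


def sum_sq(obs, pred):
    """Sum of squared residuals, S = sum_j (obs_j - pred_j)^2."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def log_likelihood(h_obs, d_obs, h_pred, d_pred, sigma_h, sigma_d):
    """Log of the joint Gaussian likelihood for hospitalizations and deaths.

    Assumes both series have the same number of observations n.
    """
    n = len(h_obs)
    s_h = sum_sq(h_obs, h_pred)
    s_d = sum_sq(d_obs, d_pred)
    # log of 1 / (2 * pi * sigma_d * sigma_h)^n
    log_norm = -n * math.log(2.0 * math.pi * sigma_h * sigma_d)
    return log_norm - s_h / (2.0 * sigma_h**2) - s_d / (2.0 * sigma_d**2)
```

In a sampler, `h_pred` and `d_pred` would come from the surrogate evaluated at a proposed parameter vector.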

Let $\pi(\vec{\theta})$ be the prior belief about $\vec{\theta}$ as listed in Table 2, i.e., $\vec{\theta}\sim\mathcal{U}\left(\vec{\theta}^{l},\vec{\theta}^{u}\right)$, where $\left(\vec{\theta}^{l},\vec{\theta}^{u}\right)$ are the lower and upper bounds in Table 2. The error variances $(\sigma_h^2,\sigma_d^2)$ are modeled with conjugate priors, i.e., with inverse Gamma priors or, equivalently, Gamma-distributed precisions,

$$
\sigma_{h}^{-2}\sim\mathcal{G}\left(\frac{n_{s}+n}{2},\frac{n_{s}\zeta_{h}^{2}+S_{h}}{2}\right)\quad\text{and}\quad\sigma_{d}^{-2}\sim\mathcal{G}\left(\frac{n_{s}+n}{2},\frac{n_{s}\zeta_{d}^{2}+S_{d}}{2}\right),
$$

where $n_s\zeta_h^2+S_h$ and $n_s\zeta_d^2+S_d$ determine the rate parameters of the Gamma ($\mathcal{G}$) distributions and $n$ is the number of observations. This distribution leads to $\left(\zeta_h^2,\zeta_d^2\right)$ being the prior means of $\left(\sigma_h^2,\sigma_d^2\right)$, and $n_s$ is a user-defined value that, together with $\left(\zeta_h^2,\zeta_d^2\right)$, defines their prior variance. In this paper we use $n_s=1$, which implies a noninformative prior, and $\zeta_h^2=\frac{S_h^{\text{OLS}}}{n-p}$, $\zeta_d^2=\frac{S_d^{\text{OLS}}}{n-p}$, where $S_h^{\text{OLS}}$ and $S_d^{\text{OLS}}$ are determined using the ordinary least squares (OLS) optimal parameter set from the dataset used for surrogate training, and $p=4$ is the number of parameters in $\vec{\theta}$.
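The Gamma distribution on the precision can be sampled directly. The sketch below draws $\sigma^{-2}$ with shape $(n_s+n)/2$ and rate $(n_s\zeta^2+S)/2$ and inverts it to obtain a variance. `draw_variance` and its arguments are hypothetical names, and the numbers in the usage lines are made up for illustration.

```python
import random


def draw_variance(s, n, n_s, zeta2, rng):
    """Draw sigma^2 by sampling the precision sigma^{-2} from a Gamma law.

    s     : current sum of squared residuals (S_h or S_d)
    n     : number of observations
    n_s   : user-defined prior weight
    zeta2 : prior scale for the variance
    """
    shape = (n_s + n) / 2.0
    rate = (n_s * zeta2 + s) / 2.0
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    precision = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / precision


# Illustrative values only: n = 100 observations, S = 120, n_s = 1.
rng = random.Random(0)
draws = [draw_variance(120.0, 100, 1, 1.5, rng) for _ in range(20000)]
mean_variance = sum(draws) / len(draws)
```

Because the variance is then inverse-Gamma distributed, its mean is rate / (shape - 1), which gives a quick sanity check on the draws.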

By Bayes' rule, the likelihood $\mathcal{L}$ and the priors can be combined into an expression for the posterior distribution of $\vec{\theta}$,

$$
P\left(\vec{\theta}\mid\vec{h}^{\circ},\vec{d}^{\circ}\right)\propto\frac{1}{(2\pi\sigma_{d}\sigma_{h})^{n}}\exp\left[-\frac{S_{h}}{2\sigma_{h}^{2}}-\frac{S_{d}}{2\sigma_{d}^{2}}\right]\times\sigma_{h}^{n_{s}/2-1}\sigma_{d}^{n_{s}/2-1}\exp\left(-\frac{n_{s}\zeta_{h}^{2}}{2}-\frac{n_{s}\zeta_{d}^{2}}{2}\right)\times\pi\left(\vec{\theta}\right). \tag{2}
$$

Delayed rejection adaptive Metropolis-Hastings sampling (DRAM) [25] was used to draw samples from the posterior in Equation 2. Strictly speaking, each step of the algorithm consists of a DRAM update of $\vec{\theta}$ followed by a Gibbs update of $\left(\sigma_h^2,\sigma_d^2\right)$, if the proposal for $\vec{\theta}$ is accepted by DRAM. The implementation of DRAM is available in the pymcmcstat Python package [39]. Since each iteration of DRAM requires an evaluation of CityCOVID, the use of the surrogate model described in § 3.1 makes the algorithm feasible. DRAM yields a Markov chain of samples $\vec{\theta}_k,\ k=1,\ldots,K$, and in our study $K=50{,}000$ steps. This sequence is checked for stationarity as a stopping criterion, using the method by Raftery and Warnes [57], which builds on an older method by Raftery and Lewis [47]. This method is implemented in the R (version 4.3.3, 2024-02-29) [46] package mcgibbsit [56]. The package computes the minimum run length $N_{min}$, the required burn-in $M$, and the number of samples required to meet an estimation accuracy criterion for each component of $\vec{\theta}$.
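The alternation between a parameter update and a variance update can be illustrated with a deliberately simplified sampler. The sketch below is not DRAM and does not use pymcmcstat: a plain random-walk Metropolis step (no delayed rejection, no proposal adaptation) stands in for the DRAM update, the model is a hypothetical one-parameter linear trend with a single output, and the variance is refreshed at every iteration for simplicity. All names, data, and tuning values are assumptions made for illustration.

```python
import math
import random


def sum_sq(theta, t, y_obs):
    """Sum of squared residuals for the toy model y = theta * t."""
    return sum((y - theta * tj) ** 2 for tj, y in zip(t, y_obs))


def metropolis_within_gibbs(t, y_obs, bounds, n_steps, step, n_s, zeta2, seed=0):
    """Alternate a random-walk Metropolis update of theta with a Gibbs
    update of the error variance sigma^2."""
    rng = random.Random(seed)
    n = len(y_obs)
    lo, hi = bounds
    theta = 0.5 * (lo + hi)          # start at the middle of the uniform prior
    sigma2 = zeta2                   # start the variance at its prior scale
    s = sum_sq(theta, t, y_obs)
    chain = []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)
        # Uniform prior: proposals outside the bounds have zero density.
        if lo <= prop <= hi:
            s_prop = sum_sq(prop, t, y_obs)
            log_ratio = -(s_prop - s) / (2.0 * sigma2)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta, s = prop, s_prop
        # Gibbs step: precision ~ Gamma(shape, rate), passed as scale = 1/rate.
        shape = (n_s + n) / 2.0
        rate = (n_s * zeta2 + s) / 2.0
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        chain.append((theta, sigma2))
    return chain


# Synthetic data from theta = 2.0 with noise standard deviation 0.5 (made up).
data_rng = random.Random(1)
t = list(range(1, 21))
y_obs = [2.0 * tj + data_rng.gauss(0.0, 0.5) for tj in t]
chain = metropolis_within_gibbs(t, y_obs, bounds=(0.0, 5.0), n_steps=5000,
                                step=0.02, n_s=1, zeta2=1.0)
kept = chain[1000:]                  # discard an assumed burn-in
theta_mean = sum(th for th, _ in kept) / len(kept)
```

In the actual calibration, the burn-in and run length are chosen by the mcgibbsit diagnostic rather than fixed by hand as they are here.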

4 Results

4.1 Surrogate performance

The reconstruction of the data from 4 PCA components yields a median absolute relative error of 2%. Figure 2(a) shows a scree plot demonstrating the variance explained as the number of principal components is increased. As can be observed, the temporal dynamics present in the hospitalization and death trajectories from the “mean-model” of CityCOVID are smooth and fairly simple, allowing for efficient encoding in the principal components. An example comparison of a CityCOVID trajectory with its respective PCA compression is shown in Figure 3. Note that the relative error is significantly higher at early times, when the numbers of hospitalizations and deaths are low, due to division by small numbers in the relative error calculation. These small numbers do not affect the likelihood estimation in Equation 2 and are not discussed further.
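The compression step can be sketched with NumPy on synthetic data. The example below builds PCA from a singular value decomposition, reconstructs each trajectory from 4 components, and reports the median absolute relative error. The logistic curves are a made-up stand-in for CityCOVID mean trajectories, and the function names are assumptions.

```python
import numpy as np


def pca_compress(X, n_components):
    """Project rows of X onto the leading principal components and reconstruct."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # variance fraction per component
    scores = Xc @ Vt[:n_components].T        # low-dimensional encoding
    X_rec = scores @ Vt[:n_components] + mean
    return X_rec, explained


def median_abs_rel_error(X, X_rec):
    """Median of |X_rec - X| / |X| over all entries (X must be nonzero)."""
    return float(np.median(np.abs(X_rec - X) / np.abs(X)))


# Synthetic smooth, strictly positive trajectories (logistic curves).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
rate = rng.uniform(5.0, 15.0, size=(200, 1))
mid = rng.uniform(0.3, 0.7, size=(200, 1))
X = 1.0 + 1000.0 / (1.0 + np.exp(-rate * (t - mid)))

X_rec, explained = pca_compress(X, n_components=4)
err = median_abs_rel_error(X, X_rec)
```

The cumulative sum of `explained` is the quantity a scree plot displays. In a surrogate of this kind, a regressor would then be trained to predict `scores` from the model parameters.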

Figure 2: (a) Scree plot demonstrating approximation power of different numbers of PCA modes with a dotted line at the number of modes used for the surrogate. (b) Median absolute relative error for surrogate reconstructions of CityCOVID hospitalization and death trajectories.

Figure 3: Accuracy of the data reconstructed using 4 principal components. The components capture over 95% of the variance of the data.

The random forest trained to reconstruct the original trajectories achieved a median absolute relative error of less than 5% over five-fold cross-validation. The distribution of the median absolute relative errors for a random testing set of holdout trajectories is shown in Figure 2(b). Several random examples comparing the surrogate-predicted hospitalization and death curves with CityCOVID trajectories are shown in Appendix A.

| Hyperparameter | Description | Value |
|---|---|---|
| $\gamma_1$ | number of trees | 500 |
| $\gamma_2$ | split quality criterion | absolute error |
| $\gamma_3$ | minimum number of samples per leaf | 3 |
| $\gamma_4$ | max number of features per split | 5 |

Table 4: Descriptions and values for random forest hyperparameters after brute force search. Values were selected using 5-fold cross-validation.

4.2 Parameter estimation

Using the surrogate described in Section 3.1, approximate hospitalization and death trajectories were sampled for use with the MCMC sampling of $P(\vec{\theta}\mid h,d)$ as described in Section 3.2. Samples from the posterior distribution after 50,000 sampling steps are shown in Figure 4, split into pairwise and marginal representations.

Figure 4: Marginal and pairwise posterior samples from DRAM using the random forest surrogate.

The posterior samples illustrate a more pronounced peak for two of the variables: $\theta_1$ (rate of exposure to infected) and $\theta_2$ (probability of stay at home). It is also notable that these two parameters show a strong positive correlation. Namely, if one of the probabilities is increased in CityCOVID, the other must also be increased in order to reasonably match the data. This result aligns with our understanding of CityCOVID as well as with epidemiological systems. It can also be observed that the approximated posterior distribution for $\theta_4$ (probability of protective behaviors) is almost uniform in shape. Though this may be true for CityCOVID as well, it aligns closely with the sensitivities of the random forest shown in Table 3. Specifically, the lack of importance of this parameter to the random forest allows for almost uniform sampling of its value without significant impact on the model outputs.

The marginal posterior distributions of this surrogate-based calibration alongside its prior distribution and the IMABC posterior distribution previously computed for CityCOVID [44] are shown in Figure 5. We see that the posterior distributions are peaked and very different from the corresponding priors, implying a significant gain of information regarding parameter values, post calibration, vis-à-vis the prior distribution. For $\theta_4$ (probability of protective behavior) we see that the probability density functions (PDFs) from MCMC and IMABC calibration somewhat agree, but for the rest, the PDFs computed by MCMC are sharper than those obtained from IMABC.

Figure 5: Marginal posterior samples computed with sequential, rejection based IMABC in a previous calibration [44], from the prior, and from DRAM samples using the random forest surrogate.

The posterior distributions are checked via “pushforwards” and posterior predictive distributions. In the former, $N_p$ parameter sets are sampled from the posterior distribution and evaluated (in this case, using the surrogate model) to yield $(\vec{h},\vec{d})$. In the latter, $\epsilon_h$ and $\epsilon_d$, sampled from their posterior distribution, are added to the pushforward results. For $N_p=500$, the trajectories are shown in Figure 6. The pushforward plots demonstrate that the surrogate is well converged to a narrow band of possible outcomes, which generally follow the trends of the daily observed data computed via finite difference. The posterior predictive, which incorporates the calibrated uncertainties $\sigma_h$ and $\sigma_d$ from Equation 2, demonstrates almost complete coverage of the observations. This indicates that the uncertainty in the parameter estimates alone explains very little of the variability of the observations, which is instead captured by the noise estimates $\sigma_h$ and $\sigma_d$.
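The difference between the two checks can be made concrete with a small sketch. Here a hypothetical linear `toy_model` stands in for the surrogate, and the "posterior" samples of the parameter and of the noise standard deviation are made-up draws, chosen so that the parameter is tightly constrained while the noise is comparatively large.

```python
import random
import statistics


def toy_model(theta, n_days):
    """Hypothetical stand-in for the surrogate: a linear trend in time."""
    return [theta * day for day in range(1, n_days + 1)]


def pushforward(theta_samples, n_days):
    """Evaluate the model at every posterior parameter sample."""
    return [toy_model(theta, n_days) for theta in theta_samples]


def posterior_predictive(theta_samples, sigma_samples, n_days, rng):
    """Pushforward trajectories plus Gaussian noise drawn with a sampled sigma."""
    return [
        [y + rng.gauss(0.0, sigma) for y in toy_model(theta, n_days)]
        for theta, sigma in zip(theta_samples, sigma_samples)
    ]


rng = random.Random(0)
n_p, n_days = 500, 10
# Made-up "posterior" samples: a tight theta and a comparatively large sigma.
theta_samples = [rng.gauss(2.0, 0.01) for _ in range(n_p)]
sigma_samples = [abs(rng.gauss(0.5, 0.05)) for _ in range(n_p)]

push = pushforward(theta_samples, n_days)
pred = posterior_predictive(theta_samples, sigma_samples, n_days, rng)

# Spread across the ensemble on the final day.
push_sd = statistics.pstdev([traj[-1] for traj in push])
pred_sd = statistics.pstdev([traj[-1] for traj in pred])
```

With these inputs the pushforward band is narrow and the posterior predictive band is much wider, which mirrors the qualitative behavior described for Figure 6.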

Figure 6: Daily counts of hospitalizations and deaths as observed in Chicago and from the (a) posterior pushforward samples using the surrogate and (b) posterior predictive samples using the surrogate and the calibrated uncertainties (c) $\sigma_h$ and $\sigma_d$ from Equation 2. These are obtained using the surrogate model.

Figure 7 plots the verification rank histograms (VRHs) from the surrogate-based calibration (left) and the CityCOVID pushforwards (right, described in more detail in § 4.3). Ideally, a VRH would show a uniform distribution, demonstrating that the uncertainty bounds give full and balanced coverage of the true values. The left subfigure shows that the surrogate-based calibration is generally balanced between over- and underprediction. The VRHs on the right, from the CityCOVID pushforward runs, are far more skewed, showing that while using an approximate surrogate may make the calibration feasible, it incurs an error.
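A rank histogram of this kind can be computed as follows. The sketch assumes the usual construction, in which each observation is ranked among the ensemble members at the same time point and the ranks are tallied. Ties are ignored, and any grouping of ranks into coarser bins, as may have been done for Figure 7, is omitted. The function name is an assumption.

```python
def rank_histogram(ensemble, observations):
    """Tally the rank of each observation within the ensemble.

    ensemble     : list of trajectories, each a list over time
    observations : list over time, same length as each trajectory
    Returns counts for ranks 0..len(ensemble), where rank r means that
    exactly r members fall below the observation.
    """
    n_members = len(ensemble)
    counts = [0] * (n_members + 1)
    for j, obs in enumerate(observations):
        rank = sum(1 for member in ensemble if member[j] < obs)
        counts[rank] += 1
    return counts


# Three constant "trajectories" and four observations, one per rank.
ensemble = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
balanced = rank_histogram(ensemble, [0.5, 1.5, 2.5, 3.5])
# Observations always above the ensemble: every member underpredicts.
underpredicted = rank_histogram(ensemble, [9.0, 9.0, 9.0, 9.0])
```

A flat histogram such as `balanced` corresponds to the ideal case, while mass piled into the last bin, as in `underpredicted`, signals systematic underprediction.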

Figure 7: Verification rank histograms (VRHs) of (a) surrogate and (b) CityCOVID posterior pushforward samples compared with observed data in Chicago. The green color is caused by the overlap of blue and yellow histograms.

4.3 Parameter assessment

To fully evaluate the quality of the surrogate-based calibration, 100 parameter values were sampled from the approximated posterior distribution and were subsequently run through CityCOVID (using 50 random seeds for each parameter set as was done for the surrogate training set). The resulting posterior pushforward distribution is shown in Figure 8(a) alongside the pushforward produced in the IMABC calibration [44] in Figure 8(b).

Figure 8: Posterior pushforward runs for samples from posteriors calibrated with (a) DRAM and the random forest surrogate and (b) IMABC sampling as done previously [44]. These results are produced using CityCOVID natively, rather than the surrogate model.

These distributions demonstrate the accuracy of the DRAM and surrogate method when compared to the native IMABC calibration. Though significantly more efficient in computation, the surrogate approach produced similar results. It can be seen that the surrogate-based calibration somewhat overpredicted during early times and did not capture the full uncertainty. This can be observed more precisely by considering a proper scoring rule such as the continuous ranked probability score (CRPS), which is shown in brown in Figure 8 and was calculated with the scoringutils R library [9]. In hospitalizations, the surrogate-based calibration is seen to be less accurate for early times, but is roughly equivalent for the remainder. Conversely, the deaths from the surrogate-based calibration are slightly more accurate for late times.
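For readers unfamiliar with the CRPS, the sketch below evaluates a common ensemble estimator, the mean absolute error of the members minus half their mean pairwise absolute difference. It is written in Python for illustration and is not the scoringutils implementation, whose estimator may differ in detail. The function name is an assumption.

```python
def crps_ensemble(members, obs):
    """Ensemble CRPS estimate for one scalar observation.

    CRPS = mean_i |x_i - obs| - 0.5 * mean_{i,k} |x_i - x_k|
    Lower values indicate a better forecast.
    """
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    term2 = sum(abs(xi - xk) for xi in members for xk in members) / (2.0 * m * m)
    return term1 - term2


def mean_crps(ensemble, observations):
    """Average the score over time, given a list of trajectories."""
    scores = [
        crps_ensemble([traj[j] for traj in ensemble], obs)
        for j, obs in enumerate(observations)
    ]
    return sum(scores) / len(scores)
```

For a single-member ensemble the score reduces to the absolute error, which is why the CRPS is often read as a generalization of that quantity to probabilistic forecasts.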

Another consideration in this comparison is the disparate number of parameters used for each calibration. Specifically, the IMABC calibration made use of the 9 parameters in Table 1 while the surrogate-based calibration presented here used only the 4 marked with stars. To compare these disparate measures, the deviance information criterion (DIC [20]) was used to measure the distance between each posterior pushforward and the observed data while taking into account the number of parameters. This metric is a generalization of the Akaike Information Criterion (AIC) for Bayesian model comparisons and can be written as:

$$
\begin{aligned}
\text{DIC} &= -2\log p(y\mid\hat{\theta})+2p_{\text{DIC}},\\
p_{\text{DIC}} &= 2\left(\log p(y\mid\hat{\theta})-\mathbb{E}_{\text{post}}\log p(y\mid\theta)\right).
\end{aligned}
\tag{3}
$$

DIC balances the accuracy of the Bayes estimate $\hat{\theta}$ and the effective number of parameters $p_{\text{DIC}}$. Given the empirical distribution of posterior samples used for our pushforward distribution, the DIC was estimated with sample means:

$$
\begin{aligned}
\hat{\theta} &\approx \frac{1}{N_{p}}\sum_{i=1}^{N_{p}}\theta_{i},\\
\mathbb{E}_{\text{post}}\log p(y\mid\theta) &\approx \frac{1}{N_{p}}\sum_{i=1}^{N_{p}}\log p(y\mid\theta_{i}),
\end{aligned}
\tag{4}
$$

where $\theta_i$ is the $i^{\text{th}}$ sample from the calibrated posterior distribution used to compute the pushforward distribution. Accordingly, for the MCMC calibrated posterior $N_p=100$ and for the IMABC calibrated posterior $N_p=1158$.
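The sample-mean estimate of the DIC can be sketched in a few lines. The code assumes the standard form of the effective number of parameters, $p_{\text{DIC}}=2\left(\log p(y\mid\hat{\theta})-\mathbb{E}_{\text{post}}\log p(y\mid\theta)\right)$, and takes the log-likelihood as a user-supplied function. The names `dic` and `log_lik` are illustrative assumptions.

```python
def dic(theta_samples, log_lik):
    """Estimate DIC and p_DIC from posterior samples.

    theta_samples : list of parameter vectors (lists or tuples of floats)
    log_lik       : function mapping a parameter vector to log p(y | theta)
    """
    n_p = len(theta_samples)
    dim = len(theta_samples[0])
    # Bayes estimate: component-wise posterior mean of the samples.
    theta_hat = [sum(th[k] for th in theta_samples) / n_p for k in range(dim)]
    # Posterior expectation of the log-likelihood, by sample mean.
    mean_ll = sum(log_lik(th) for th in theta_samples) / n_p
    ll_hat = log_lik(theta_hat)
    p_dic = 2.0 * (ll_hat - mean_ll)
    return -2.0 * ll_hat + 2.0 * p_dic, p_dic


# Toy check with a quadratic log-likelihood centered at the origin.
samples = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
dic_value, p_dic = dic(samples, lambda th: -0.5 * sum(x * x for x in th))
```

In the toy check the samples are symmetric about the origin, so the Bayes estimate sits at the maximum of the log-likelihood and the penalty is positive.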

Comparing the DIC calculated for the MCMC and IMABC calibrated pushforward ensembles shows that, although the CRPS of the MCMC pushforward is higher than that of the IMABC pushforward for hospitalizations, the predictive accuracy of the two approaches is close after accounting for the effective number of parameters. The numerical comparisons of these quantities averaged over time can be seen in Table 5. The DIC, which penalizes overly parametrized models, favors the surrogate-based calibration.

An additional layer of comparison can be achieved by comparing the VRHs of the IMABC and MCMC calibrated pushforwards. A uniform distribution is generally regarded as ideal for an ensemble forecast [21]. We thus constructed density normalized empirical distributions for the VRHs of the IMABC and MCMC pushforward trajectories and compared them with the corresponding discrete uniform distribution using KL divergence, Chi-squared distance, and Wasserstein distance. These measurements each provide a unique comparison between the empirical VRH ($V$) and the corresponding discrete uniform distribution ($U$). Specifically, the measurements can be written as:

$$
D_{\text{KL}}(V,U)=\sum_{x\in\mathcal{X}}V(x)\log\left(\frac{V(x)}{U(x)}\right) \tag{5}
$$

$$
D_{\chi^{2}}(V,U)=\sum_{x\in\mathcal{X}}\frac{\left(V(x)-U(x)\right)^{2}}{U(x)} \tag{6}
$$

$$
D_{\text{wass}}(V,U)=\frac{\sum_{x_{1}\in\mathcal{X}_{1}}\sum_{x_{2}\in\mathcal{X}_{2}}\left(V(x_{1})-U(x_{2})\right)\|x_{1}-x_{2}\|}{\sum_{x_{1}\in\mathcal{X}_{1}}\sum_{x_{2}\in\mathcal{X}_{2}}\left(V(x_{1})-U(x_{2})\right)} \tag{7}
$$

where $\mathcal{X}$ is the space of bins in our histogram and $\mathcal{X}_1$ and $\mathcal{X}_2$ encode the optimal paths to move mass from $V$ to $U$ (computed via linear optimization). These can be simply interpreted as follows:

KL Divergence (Eq. 5)

Measurement of the information lost when $U$ is used to approximate $V$.

Chi-Sq Distance (Eq. 6)

Measurement of the difference in frequencies in the histograms.

Wasserstein Distance (Eq. 7)

Measurement of the histogram density that must be moved to align the VRH with the discrete uniform.
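The three measures can be sketched for a histogram given as raw bin counts. Two simplifications are assumed here: empty bins contribute zero to the KL divergence, and the Wasserstein distance uses the closed-form one-dimensional expression (sum of absolute differences of cumulative sums, with unit bin spacing) rather than the linear program mentioned above. Because the normalization and bin spacing used for Table 5 are not specified, the values produced by this sketch are not expected to match the table.

```python
import math


def normalize(counts):
    """Convert bin counts to probabilities that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(v, u):
    """D_KL(V, U); empty bins of V contribute zero."""
    return sum(vi * math.log(vi / ui) for vi, ui in zip(v, u) if vi > 0)


def chi_sq_distance(v, u):
    """Sum over bins of (V - U)^2 / U."""
    return sum((vi - ui) ** 2 / ui for vi, ui in zip(v, u))


def wasserstein_1d(v, u):
    """1-D Wasserstein distance on unit-spaced bins via cumulative sums."""
    dist, cum = 0.0, 0.0
    for vi, ui in zip(v, u):
        cum += vi - ui
        dist += abs(cum)
    return dist


# A maximally skewed four-bin histogram compared with the discrete uniform.
v = normalize([8, 0, 0, 0])
u = normalize([1, 1, 1, 1])
distances = (kl_divergence(v, u), chi_sq_distance(v, u), wasserstein_1d(v, u))
```

All three functions return zero when the histogram is exactly uniform, so smaller values indicate a better calibrated ensemble.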

All three measures show that the surrogate-based calibration provides a VRH that is closer to a uniform distribution than the one arising from IMABC. The tabulated results illustrate the competing effects of approximations in ABM calibration. While IMABC resulted in overly wide marginal PDFs (see Figure 5), the surrogate-based calibration is not without its flaws. Although the difference is not evident visually in Figure 8, the tabulated summary in Table 5 shows that the MCMC calibration is slightly superior, despite the use of surrogate models.

However, for both calibrations, the verification rank histogram of these pushforward results seen in Figure 7 illustrates a general overprediction for hospitalizations and underprediction for deaths. In fact, the number of deaths is always underpredicted showing an imbalanced calibration. This holds true for both the calibrations, which were conducted using independent formulations of the estimation problem as well as using different algorithms. This error indicates a model-form error in CityCOVID that causes hospitalized people to die at a rate lower than what is observed.

| Metric (lower values are better) | Hospitalizations (IMABC) | Hospitalizations (MCMC) | Deaths (IMABC) | Deaths (MCMC) |
|---|---|---|---|---|
| CRPS | 39.74 | 47.85 | 42.53 | 34.96 |
| DIC | 685.79 | 596.50 | 16.40 | 9.95 |
| KL Divergence (VRH) | 0.33 | 0.21 | 0.88 | 0.79 |
| Chi-Sq Distance (VRH) | 41.00 | 24.13 | 114.75 | 95.06 |
| Wasserstein Distance (VRH) | 0.07 | 0.05 | 0.11 | 0.10 |
Table 5: Comparison of pushforward results of samples from surrogate-based MCMC calibration with previous approximate Bayesian calibration [44]. Rows show the continuous ranked probability score (CRPS), the deviance information criterion (DIC), and the KL divergence, Chi-Squared distance, and Wasserstein distance of the verification rank histogram (VRH) from a uniform distribution.
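For readers unfamiliar with the CRPS, a minimal sketch of its standard ensemble estimator, $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, is given below for a single observation $y$. How the scores in Table 5 were aggregated over time is not restated here, so the function name and the toy values are assumptions for illustration only.

```python
import numpy as np

def crps_ensemble(samples, y):
    """Ensemble estimate of the CRPS for one scalar observation y."""
    samples = np.asarray(samples, dtype=float)
    # Mean absolute error between ensemble members and the observation
    accuracy = np.mean(np.abs(samples - y))
    # Half the mean absolute difference between all pairs of members
    spread = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(accuracy - spread)

exact = crps_ensemble([1.0, 1.0, 1.0], 1.0)    # perfect, sharp forecast
two_point = crps_ensemble([0.0, 2.0], 1.0)     # symmetric two-member forecast
```

The score is zero only for a forecast that collapses onto the observation, and it penalizes both bias and excess spread, which is why lower values are better.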

5 Conclusion

We have described an accelerated approach to calibrate agent-based models for epidemiology in which the quantities of interest are population-level metrics such as hospitalizations and deaths. To overcome the inherent stochasticity of these models, we considered a mean model which was averaged over random seeds. The temporal dynamics of the quantities of interest were then decomposed via PCA and a random forest was trained to reconstruct the data using these principal components and the input parameters for the ABM. This combination yielded a surrogate which could be used in place of the model for accelerated sampling.
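The construction summarized above can be sketched as follows using scikit-learn. This is an illustration under stated assumptions rather than the implementation used in this work: `toy_abm` is a hypothetical stand-in for CityCOVID, and the number of runs, seeds, principal components, and trees are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_runs, n_seeds, n_days, n_params = 200, 5, 60, 4
theta = rng.uniform(0.0, 1.0, size=(n_runs, n_params))  # sampled ABM inputs
days = np.arange(n_days)

def toy_abm(theta_row, rng):
    """Hypothetical stochastic model: noisy logistic curve of daily counts."""
    peak = 50.0 + 200.0 * theta_row[0]
    midpoint = 20.0 + 20.0 * theta_row[1]
    curve = peak / (1.0 + np.exp(-(days - midpoint) / 5.0))
    return curve + rng.normal(0.0, 5.0, size=n_days)

# Mean model: average trajectories over random seeds for each parameter set
mean_traj = np.array([
    np.mean([toy_abm(row, rng) for _ in range(n_seeds)], axis=0)
    for row in theta
])

# Temporal decomposition: keep a few principal components of the trajectories
pca = PCA(n_components=3)
scores = pca.fit_transform(mean_traj)

# Random forest maps input parameters to the principal component scores
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(theta, scores)

def surrogate(theta_new):
    """Predict full trajectories for new parameter values."""
    theta_new = np.atleast_2d(theta_new)
    return pca.inverse_transform(forest.predict(theta_new))

predicted = surrogate(theta[:3])
```

Evaluating `surrogate` costs only a forest prediction and a matrix product, which is what makes repeated evaluation inside a sampler affordable.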

In order to effectively use this surrogate on the model of interest, the Gini impurity of the random forest trained on some preliminary model outputs was used to reduce the parameter space to only 4 dimensions. A hypercube was then sampled in this lower dimensional space to provide training information for the surrogate. An empirical prior was then constructed which combined samples of the hypercube yielding hospitalization and death trajectories near those observed.

Equipped with the surrogate model and an adequate empirical prior, Markov chain Monte Carlo sampling was used with a Gaussian error model to sample from the posterior distribution. The samples, along with their respective pushforward and posterior predictive trajectories, were analyzed in comparison with a previous IMABC calibration of the model [44]. Ultimately, this surrogate-accelerated approach yielded similar results to the original approach at a fraction of the computational cost. True posterior pushforward samples in combination with verification rank histograms and proper scoring rules such as CRPS demonstrated that the loss in accuracy of the accelerated surrogate-based calibration was almost negligible.
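The sampling step can be illustrated with a plain random-walk Metropolis sampler. This is a deliberately simpler stand-in for the adaptive sampler used in the calibration, and everything in it is assumed for illustration: a uniform prior on a box replaces the empirical prior, `toy_surrogate` is a hypothetical two-parameter linear model, and the noise level `sigma` is fixed rather than inferred.

```python
import numpy as np

def log_posterior(theta, surrogate, observed, sigma, lower, upper):
    """Uniform prior on [lower, upper] plus an i.i.d. Gaussian error model."""
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    residual = observed - surrogate(theta)
    return -0.5 * np.sum((residual / sigma) ** 2)

def metropolis(surrogate, observed, sigma, lower, upper,
               n_steps=5000, step=0.05, seed=0):
    """Random-walk Metropolis with Gaussian proposals scaled to the box size."""
    rng = np.random.default_rng(seed)
    theta = 0.5 * (lower + upper)               # start at the box center
    logp = log_posterior(theta, surrogate, observed, sigma, lower, upper)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * (upper - lower) * rng.normal(size=theta.size)
        logp_new = log_posterior(proposal, surrogate, observed, sigma, lower, upper)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[i] = theta
    return chain

# Hypothetical surrogate: a line with slope theta[0] and intercept theta[1]
t = np.linspace(0.0, 1.0, 20)
toy_surrogate = lambda theta: theta[0] * t + theta[1]
true_theta = np.array([0.3, 0.7])
observed = toy_surrogate(true_theta)

chain = metropolis(toy_surrogate, observed, sigma=0.1,
                   lower=np.zeros(2), upper=np.ones(2))
posterior_mean = chain[1000:].mean(axis=0)      # discard burn-in
```

Each step requires one surrogate evaluation, so thousands of steps remain cheap even though the same number of ABM runs would not be.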

However, the final calibration using either IMABC or the full Bayesian inference approach tends to over- or underpredict the values of interest. Future work will aim to correct this inaccuracy through a more complete incorporation of the stochasticity of the model. Several approaches have been proposed for this more complete analysis, including fitting surrogates to approximate both the mean/median output across random seeds as well as the variance/quantiles across the random seeds [19] and fitting surrogates using the random seeds themselves [18].

Additionally, the surrogate used for this calibration was constructed in a global fashion on few dimensions and its limits are not yet well understood. We plan future work to compare this construction with alternative calibration approaches which can scale to larger dimensional problems or for which local surrogate construction can be combined with native model outputs to reduce the impact of surrogate model form error.

Author contributions

Connor Robertson formulated the problem, wrote the software to solve it, generated the figures and wrote the paper. Cosmin Safta assisted with software development, interpretation of results, and contributed to writing the paper. Nicholson Collier produced the CityCOVID training data. Jonathan Ozik provided guidance on CityCOVID and IMABC calibration and contributed to writing the paper. Jaideep Ray posed the problem, assisted with the epidemiological interpretation, and suggested the calibration approach and metrics.

Acknowledgments

We thank Arindam Fadikar and Chick Macal at Argonne National Laboratory for various useful discussions on surrogates and their application to epidemiological modeling. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. This article has been authored by an employee of National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE). The employee owns all right, title and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan https://www.energy.gov/downloads/doe-public-access-plan. This material is based upon work supported by the National Science Foundation under Grant 2200234, the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357 and the Bio-preparedness Research Virtual Environment (BRaVE) initiative. This research was completed with resources provided by the Laboratory Computing Resource Center at Argonne National Laboratory.

Financial disclosure

None reported.

Conflict of interest

The authors declare no potential conflict of interests.

References

  • [1] Abdulrahman A Ahmed, M Amin Rahimian, and Mark S Roberts. Inferring epidemic dynamics using gaussian process emulation of agent-based simulations. In 2023 Winter Simulation Conference (WSC), pages 770–780. IEEE, 2023.
  • [2] Claudio Angione, Eric Silverman, and Elisabeth Yaneske. Using machine learning as a surrogate model for agent-based simulations. Plos one, 17(2):e0263150, 2022.
  • [3] Claudio Angione, Eric Silverman, and Elisabeth Yaneske. Using machine learning as a surrogate model for agent-based simulations. Plos one, 17(2):e0263150, 2022.
  • [4] Rushil Anirudh, Jayaraman J Thiagarajan, Peer-Timo Bremer, Timothy Germann, Sara Del Valle, and Frederick Streitz. Accurate calibration of agent-based epidemiological models with neural network surrogates. In Workshop on Healthcare AI and COVID-19, pages 54–62. PMLR, 2022.
  • [5] Georges M Arnaout, Mahmoud T Khasawneh, Jun Zhang, and Shannon R Bowling. An intellidrive application for reducing traffic congestions using agent-based approach. In 2010 IEEE Systems and Information Engineering Design Symposium, pages 221–224. IEEE, 2010.
  • [6] Gaurav Arya, Moritz Schauer, Frank Schäfer, and Christopher Rackauckas. Automatic differentiation of programs with discrete randomness. Advances in Neural Information Processing Systems, 35:10435–10447, 2022.
  • [7] Priscilla Avegliano and Jaime Simão Sichman. Equation-based versus agent-based models: Why not embrace both for an efficient parameter calibration? Journal of Artificial Societies and Social Simulation, 26(4), 2023.
  • [8] Robert L Axtell and J Doyne Farmer. Agent-based modeling in economics and finance: Past, present, and future. Journal of Economic Literature, pages 1–101, 2022.
  • [9] Nikos I Bosse, Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. Evaluating forecasts with scoringutils in r. arXiv preprint arXiv:2205.07090, 2022.
  • [10] Benoît Calvez and Guillaume Hutzler. Automatic tuning of agent-based models using genetic algorithms. In International workshop on multi-agent systems and agent-based simulation, pages 41–57. Springer, 2005.
  • [11] Ayush Chopra, Alexander Rodríguez, Jayakumar Subramanian, Arnau Quera-Bofarull, Balaji Krishnamurthy, B Aditya Prakash, and Ramesh Raskar. Differentiable agent-based epidemiology. arXiv preprint arXiv:2207.09714, 2022.
  • [12] Nicholson Collier and Michael North. Parallel agent-based simulation with Repast for High Performance Computing. SIMULATION, 89(10):1215–1235, October 2013.
  • [13] Herbert Dawid, Giorgio Fagiolo, et al. Agent-based models for economic policy design: Introduction to the special issue. Journal of Economic Behavior & Organization, 67(2):351–354, 2008.
  • [14] Wim De Mulder, Bernhard Rengs, Geert Molenberghs, Thomas Fent, and Geert Verbeke. Statistical emulation applied to a very large data set generated by an agent-based model. In Proceedings of the Seventh International Conference on Advances in System Simulation, pages 43–48, 2015.
  • [15] Lander De Visscher, Bernard De Baets, and Jan M Baetens. A critical review of common pitfalls and guidelines to effectively infer parameters of agent-based models using approximate bayesian computation. Environmental Modelling & Software, page 105905, 2023.
  • [16] Wen Dong. Variational inference with agent-based models. arXiv preprint arXiv:1605.04360, 2016.
  • [17] Joel Dyer, Nicholas Bishop, Yorgos Felekis, Fabio Massimo Zennaro, Anisoara Calinescu, Theodoros Damoulas, and Michael Wooldridge. Interventionally consistent surrogates for agent-based simulators. arXiv preprint arXiv:2312.11158, 2023.
  • [18] Arindam Fadikar, Nicholson Collier, Abby Stevens, Jonathan Ozik, Mickaël Binois, and Kok Ben Toh. Trajectory-Oriented Optimization of Stochastic Epidemiological Models. In 2023 Winter Simulation Conference (WSC), pages 1244–1255, San Antonio, TX, USA, December 2023. IEEE.
  • [19] Arindam Fadikar, Dave Higdon, Jiangzhuo Chen, Bryan Lewis, Srinivasan Venkatramanan, and Madhav Marathe. Calibrating a stochastic, agent-based model using quantile-based emulation. SIAM/ASA Journal on Uncertainty Quantification, 6(4):1685–1706, 2018.
  • [20] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Texts in Statistical Science. Chapman & Hall/CRC Press, 2nd edition, 2003.
  • [21] Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E Raftery. Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69(2):243–268, 2007.
  • [22] Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007.
  • [23] Tilmann Gneiting, Adrian E. Raftery, Anton H. Westveld, and Tom Goldman. Calibrated probabilistic forecasting using ensemble model output statistics and minimum crps estimation. Monthly Weather Review, 133(5):1098 – 1118, 2005.
  • [24] Jakob Grazzini, Matteo G Richiardi, and Mike Tsionas. Bayesian estimation of agent-based models. Journal of Economic Dynamics and Control, 77:26–47, 2017.
  • [25] Heikki Haario, Marko Laine, Antonietta Mira, and Eero Saksman. Dram: efficient adaptive mcmc. Statistics and Computing, 16:339–354, 2006.
  • [26] Thomas M Hamill. Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3):550–560, 2001.
  • [27] Trevor Hastie, Robert Tibshirani, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
  • [28] Christian Hofer, Georg Jäger, and Manfred Füllsack. Including traffic jam avoidance in an agent-based network model. Computational social networks, 5:1–12, 2018.
  • [29] Anna L. Hotton, Jonathan Ozik, Chaitanya Kaligotla, Nick Collier, Abby Stevens, Aditya S. Khanna, Margaret M. MacDonell, Cheng Wang, David J. LePoire, Young-Soo Chang, Ignacio J. Martinez-Moyano, Bogdan Mucenic, Harold A. Pollack, John A. Schneider, and Charles Macal. Impact of changes in protective behaviors and out-of-household activities by age on COVID-19 transmission and hospitalization in chicago, illinois. Annals of Epidemiology, page S1047279722001053, 2022.
  • [30] Barbara FF Huang and Paul C Boutros. The parameter sensitivity of random forests. BMC bioinformatics, 17:1–13, 2016.
  • [31] Junjie Jiang, Zi-Gang Huang, Thomas P Seager, Wei Lin, Celso Grebogi, Alan Hastings, and Ying-Cheng Lai. Predicting tipping points in mutualistic networks through dimension reduction. Proceedings of the National Academy of Sciences, 115(4):E639–E647, 2018.
  • [32] Jonathan M Keith and Daniel Spring. Agent-based bayesian approach to monitoring the progress of invasive species eradication programs. Proceedings of the National Academy of Sciences, 110(33):13428–13433, 2013.
  • [33] Minh Kieu, Hoang Nguyen, Jonathan A Ward, and Nick Malleson. Towards real-time predictions using emulators of agent-based models. Journal of Simulation, 18(1):29–46, 2024.
  • [34] Francesco Lamperti, Andrea Roventini, and Amir Sani. Agent-based model calibration using machine learning surrogates. Journal of Economic Dynamics and Control, 90:366–389, 2018.
  • [35] Jacopo Lenti, Fabrizio Silvestri, and Gianmarco De Francisci Morales. Variational inference of parameters in opinion dynamics models. arXiv preprint arXiv:2403.05358, 2024.
  • [36] Vedran Ljubović. Traffic simulation using agent-based models. In 2009 XXII International Symposium on Information, Communication and Automation Technologies, pages 1–6. IEEE, 2009.
  • [37] Charles M Macal, Nicholson T Collier, Jonathan Ozik, Eric R Tatara, and John T Murphy. Chisim: An agent-based simulation model of social interactions in a large urban area. In 2018 winter simulation conference (WSC), pages 810–820. IEEE, 2018.
  • [38] Steven M Manson, Shipeng Sun, and Dudley Bonsal. Agent-based modeling and complexity. Agent-based models of geographical systems, pages 125–139, 2012.
  • [39] Paul R. Miles. pymcmcstat: A python package for bayesian inference using delayed rejection adaptive metropolis. Journal of Open Source Software, 4(38):1417, 2019. https://github.com/prmiles/pymcmcstat/wiki#citing-pymcmcstat.
  • [40] Max D Morris. Factorial sampling plans for preliminary computational experiments. Technometrics, 33(2):161–174, 1991.
  • [41] Ignacio Moya, Manuel Chica, and Oscar Cordon. Evolutionary multiobjective optimization for automatic agent-based model calibration: A comparative study. Ieee Access, 9:55284–55299, 2021.
  • [42] United States. Bureau of Labor Statistics. American time use survey (atus): Arts activities, [united states], 2003-2021, Jul 2023.
  • [43] Jonathan Ozik, Nicholson T. Collier, Justin M. Wozniak, Charles M. Macal, and Gary An. Extreme-Scale Dynamic Exploration of a Distributed Agent-Based Model With the EMEWS Framework. IEEE Transactions on Computational Social Systems, 5(3):884–895, September 2018.
  • [44] Jonathan Ozik, Justin M Wozniak, Nicholson Collier, Charles M Macal, and Mickaël Binois. A population data-driven workflow for covid-19 modeling and learning. The International Journal of High Performance Computing Applications, 35(5):483–499, 2021.
  • [45] Jasmina Panovska-Griffiths, Thomas Bayley, Tony Ward, Akashaditya Das, Luca Imeneo, Cliff Kerr, and Simon Maskell. Machine learning assisted calibration of stochastic agent-based models for pandemic outbreak analysis. https://www.researchsquare.com/article/rs-2773605/v1, 2023.
  • [46] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2024.
  • [47] Adrian E. Raftery and Steven M. Lewis. Comment: One long run with diagnostics: Implementation strategies for markov chain. Statistical Science, 7(4):493–7, 1992.
  • [48] François Rebaudo, Verónica Crespo-Pérez, Jean-François Silvain, and Olivier Dangles. Agent-based modeling of human-induced spread of invasive species in agricultural landscapes: insights from the potato moth in ecuador. Journal of Artificial Societies and Social Simulation, 14(3):7, 2011.
  • [49] Juan Francisco Robles, Enrique Bermejo, Manuel Chica, and Óscar Cordón. Multimodal evolutionary algorithms for easing the complexity of agent-based model calibration. Journal of Artificial Societies and Social Simulation, 24(3), 2021.
  • [50] Carolyn M Rutter, Jonathan Ozik, Maria DeYoreo, and Nicholson Collier. Microsimulation model calibration using incremental mixture approximate bayesian computation. The Annals of Applied Statistics, 13(4):2189, 2019.
  • [51] Raihanah Adawiyah Shaharuddin and Md Yushalify Misro. Controlling traffic congestion in urbanised city: A framework using agent-based modelling and simulation approach. ISPRS International Journal of Geo-Information, 12(6):226, 2023.
  • [52] Ilya M Sobol. Global sensitivity indices for nonlinear mathematical models and their monte carlo estimates. Mathematics and computers in simulation, 55(1-3):271–280, 2001.
  • [53] Elske van der Vaart, Mark A Beaumont, Alice SA Johnston, and Richard M Sibly. Calibration and evaluation of individual-based models using approximate bayesian computation. Ecological Modelling, 312:182–190, 2015.
  • [54] Ben Vermeulen and Andreas Pyka. Agent-based modeling for decision making in economics under uncertainty. Economics, 10(1):20160006, 2016.
  • [55] Minhong Wang, Athanasios Tsanas, Guillaume Blin, and Dave Robertson. Predicting pattern formation in embryonic stem cells using a minimalist, agent-based probabilistic model. Scientific Reports, 10(1):16209, 2020.
  • [56] Gregory R. Warnes and Robert Burrows. mcgibbsit: Warnes and Raftery’s ’MCGibbsit’ MCMC Run Length and Convergence Diagnostic, 2023. R package version 1.2.2.
  • [57] Gregory W. Warnes. Multi-Chain and Parallel Algorithms for Markov Chain Monte Carlo. PhD thesis, Department of Biostatistics, University of Washington, 2000. https://digital.lib.washington.edu/researchworks/handle/1773/9541.
  • [58] Tien-Tsin Wong, Wai-Shing Luk, and Pheng-Ann Heng. Sampling with hammersley and halton points. Journal of graphics tools, 2(2):9–24, 1997.

Appendix A Surrogate performance

The random forest surrogate has no formal guarantees to match CityCOVID. As a result, its sensitivity and accuracy need to be independently verified. The sensitivity of the random forest to the most impactful ABM parameters is shown in Table 6 for various sensitivity measures:

Gini importance

A measure of the frequency with which a parameter is used for splits within the trees of the forest. More frequent splitting is indicative of the forest’s reliance on information from that parameter.

Permutation importance

A measure of accuracy of the forest when the input data of a parameter is shuffled. Severely reduced accuracy when a single parameter is shuffled indicates the forest’s reliance on information from that parameter.

Sobol (first)

A measure of the variance of the output of the random forest across variations in a parameter. Significant changes in output from adjustments to a single parameter indicate the forest’s reliance on information from that parameter.

Sobol (total)

A measure of the variance of the output of the random forest across variations in a parameter and that parameter in combination with others. Significant changes in output from adjustments to a single parameter indicate the forest’s reliance on information from that parameter. This total form also attempts to include nonlinear interactions with other parameters.
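The first two measures above can be computed directly from a fitted forest, as in the following minimal sketch with scikit-learn. The synthetic data are an assumption made for illustration, with the first input constructed to dominate the output. Note that for regression forests scikit-learn's `feature_importances_` is an impurity-based importance weighted by variance reduction, which plays the role of the Gini importance described here. Sobol indices are omitted from the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 4))     # four hypothetical parameters
# Synthetic response: parameter 0 dominates, parameter 1 is minor, 2-3 are inert
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance (normalized to sum to one)
impurity_importance = forest.feature_importances_

# Permutation importance: drop in score when one input column is shuffled
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
permutation_scores = perm.importances_mean

# Parameters ordered from most to least important
ranking = np.argsort(impurity_importance)[::-1]
```

Comparing the orderings produced by different measures, as done in Table 6, guards against selecting parameters on the basis of a single, possibly biased, importance score.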

| Feature | Gini Importance | Permutation Importance | Sobol (first) | Sobol (total) |
|---|---|---|---|---|
| Rate of exposure to infected | 0.17 | 0.06 | 0.05 | 0.09 |
| Time of initial exposure | 0.17 | 0.24 | 0.26 | 0.37 |
| Probability of stay at home | 0.11 | 0.16 | 0.34 | 0.44 |
| Probability of protective behaviors | 0.11 | 0.04 | 0.04 | 0.07 |
| Shielding by other susceptible | 0.10 | 0.03 | 0.05 | 0.08 |
| Number of initially infected | 0.09 | 0.03 | 0.03 | 0.06 |
| Seasonality multiplier | 0.08 | 0.03 | 0.01 | 0.02 |
| Proportion isolating in nursing home | 0.07 | 0.01 | 0.01 | 0.01 |
| Proportion isolating in home | 0.07 | 0.01 | 0.00 | 0.01 |
Table 6: Random forest feature importance metrics using 9 identified important parameters and data from IMABC calibration [44]. Parameters selected by analysis of the surrogate sensitivity are bolded.

After reducing the input parameters of the random forest surrogate, accuracy of the surrogate can be determined via the median absolute relative error as shown in Figure 2. Some concrete examples demonstrating the absolute relative error for several output trajectories can be seen in Figure 9. These examples visually demonstrate the large relative errors for small values of hospitalizations and deaths which encouraged the use of the median as a measure of accuracy.
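The preference for the median over the mean can be seen in a small numerical sketch. The values below are hypothetical and chosen so that one small true count produces a large relative error, and the function name is an assumption.

```python
import numpy as np

def abs_relative_error_summary(truth, prediction, eps=1e-12):
    """Return (median, mean) of the absolute relative error."""
    truth = np.asarray(truth, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # Guard against division by zero for days with no recorded counts
    rel = np.abs(prediction - truth) / np.maximum(np.abs(truth), eps)
    return float(np.median(rel)), float(np.mean(rel))

# Hypothetical daily counts: the first day has a very small true value
truth = [1, 100, 200, 300, 400]
prediction = [3, 105, 190, 310, 400]
median_err, mean_err = abs_relative_error_summary(truth, prediction)
```

Here an absolute miss of only two counts on the first day yields a 200% relative error that inflates the mean, while the median stays at 5% and reflects the typical accuracy.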

[Figure 9: three panels, (a) through (c)]
Figure 9: Reconstruction of the data using the surrogate for several different runs of CityCOVID. The relative error can be observed to be significantly larger for small values of hospitalizations and deaths. Here “RF Reconstruction” denotes predictions with our surrogate model and “Rel. Err” denotes relative error.