-
Assembling ensembling: An adventure in approaches across disciplines
Authors:
Amanda Bleichrodt,
Lydia Bourouiba,
Gerardo Chowell,
Eric T. Lofgren,
J. Michael Reed,
Sadie J. Ryan,
Nina H. Fefferman
Abstract:
When we think of model ensembling or ensemble modeling, there are many possibilities that come to mind in different disciplines. For example, one might think of a set of descriptions of a phenomenon in the world, perhaps a time series or a snapshot of multivariate space, and perhaps that set is comprised of data-independent descriptions, or perhaps it is quite intentionally fit *to* data, or even a suite of data sets with a common theme or intention. The very meaning of 'ensemble' - a collection together - conjures different ideas across and even within disciplines approaching phenomena. In this paper, we present a typology of the scope of these potential perspectives. It is not our goal to present a review of terms and concepts, nor is it to convince all disciplines to adopt a common suite of terms, which we view as futile. Rather, our goal is to disambiguate terms, concepts, and processes associated with 'ensembles' and 'ensembling' in order to facilitate communication, awareness, and possible adoption of tools across disciplines.
Submitted 4 May, 2024;
originally announced May 2024.
-
The Case for Controls: Identifying outbreak risk factors through case-control comparisons
Authors:
Nina H. Fefferman,
Michael J. Blum,
Lydia Bourouiba,
Nathaniel L. Gibson,
Qiang He,
Debra L. Miller,
Monica Papes,
Dana K. Pasquale,
Connor Verheyen,
Sadie J. Ryan
Abstract:
Investigations of infectious disease outbreaks often focus on identifying place- and context-dependent factors responsible for emergence and spread, resulting in phenomenological narratives ill-suited to developing generalizable predictive and preventive measures. We contend that case-control hypothesis testing is a more powerful framework for epidemiological investigation. The approach, widely used in medical research, involves identifying counterfactuals, with case-control comparisons drawn to test hypotheses about the conditions that manifest outbreaks. Here we outline the merits of applying a case-control framework as an epidemiological study design. We first describe a framework for iterative multidisciplinary interrogation to discover minimally sufficient sets of factors that can lead to disease outbreaks. We then lay out how case-control comparisons can respectively center on pathogen(s), factor(s), or landscape(s) with vignettes focusing on pathogen transmission. Finally, we consider how adopting case-control approaches can promote evidence-based decision making for responding to and preventing outbreaks.
Submitted 3 November, 2023;
originally announced November 2023.
-
Changing measurements or changing movements? Sampling scale and movement model identifiability across generations of biologging technology
Authors:
Leah R. Johnson,
Philipp H. Boersch-Supan,
Richard A. Phillips,
Sadie J. Ryan
Abstract:
1. Animal movement patterns contribute to our understanding of variation in breeding success and survival of individuals, and the implications for population dynamics. 2. Over time, sensor technology for measuring movement patterns has improved. Although older technologies may be rendered obsolete, the existing data are still valuable, especially if new and old data can be compared to test whether a behaviour has changed over time. 3. We used simulated data to assess the ability to quantify and correctly identify patterns of seabird flight lengths under observational regimes used in successive generations of tracking technology. 4. Care must be taken when comparing data collected at differing time-scales, even when using inference procedures that incorporate the observational process, as model selection and parameter estimation may be biased. In practice, comparisons may only be valid when degrading all data to match the lowest resolution in a set. 5. Changes in tracking technology that lead to aggregation of measurements at different temporal scales make comparisons challenging. We therefore urge ecologists to use synthetic data to assess whether accurate parameter estimation is possible for models comparing disparate data sets before conducting analyses such as responses to environmental changes or the assessment of management actions.
Submitted 1 August, 2017; v1 submitted 17 March, 2017;
originally announced March 2017.
-
Phenomenological forecasting of disease incidence using heteroskedastic Gaussian processes: a dengue case study
Authors:
Leah R. Johnson,
Robert B. Gramacy,
Jeremy Cohen,
Erin Mordecai,
Courtney Murdock,
Jason Rohr,
Sadie J. Ryan,
Anna M. Stewart-Ibarra,
Daniel Weikel
Abstract:
In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts including the targets of peak week, peak incidence during that week and total season incidence across each of several seasons. Our team was one of the top performers of that competition, outperforming all other teams in multiple targets/locales. In this paper we report on our methodology, a large component of which, surprisingly, ignores the known biology of epidemics at large---in particular relationships between dengue transmission and environmental factors---and instead relies on flexible nonparametric nonlinear Gaussian process (GP) regression fits that "memorize" the trajectories of past seasons, and then "match" the dynamics of the unfolding season to past ones in real-time. Our phenomenological approach has advantages in situations where disease dynamics are less well understood, e.g., at sites with shorter histories of disease (such as Iquitos), or where measurements and forecasts of ancillary covariates like precipitation are unavailable and/or where the strength of association with cases is as yet unknown. In particular, we show that the GP approach generally outperforms a more classical generalized linear (autoregressive) model (GLM) that we developed to utilize abundant covariate information. We illustrate variations of our method(s) on the two benchmark locales alongside a full summary of results submitted by other contest competitors.
Submitted 1 August, 2017; v1 submitted 1 February, 2017;
originally announced February 2017.
-
deBInfer: Bayesian inference for dynamical models of biological systems in R
Authors:
Philipp H Boersch-Supan,
Sadie J Ryan,
Leah R Johnson
Abstract:
1. Understanding the mechanisms underlying biological systems, and ultimately, predicting their behaviours in a changing environment requires overcoming the gap between mathematical models and experimental or observational data. Differential equations (DEs) are commonly used to model the temporal evolution of biological systems, but statistical methods for comparing DE models to data and for parameter inference are relatively poorly developed. This is especially problematic in the context of biological systems where observations are often noisy and only a small number of time points may be available. 2. The Bayesian approach offers a coherent framework for parameter inference that can account for multiple sources of uncertainty, while making use of prior information. It offers a rigorous methodology for parameter inference, as well as modelling the link between unobservable model states and parameters, and observable quantities. 3. We present deBInfer, a package for the statistical computing environment R, implementing a Bayesian framework for parameter inference in DEs. deBInfer provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics, the visualisation of the posterior distributions of model parameters and trajectories, and the use of compiled DE models for improved computational performance. 4. The templating approach makes deBInfer applicable to a wide range of DE models. We demonstrate its application to ordinary and delay DE models for population ecology.
Submitted 15 October, 2016; v1 submitted 29 April, 2016;
originally announced May 2016.
-
Hunting, food subsidies, and mesopredator release: the dynamics of crop-raiding baboons in a managed landscape
Authors:
Rachel A. Taylor,
Sadie J. Ryan,
Justin S. Brashares,
Leah R. Johnson
Abstract:
The establishment of protected areas or parks has become an important tool for wildlife conservation. However, frequent occurrences of human-wildlife conflict at the edges of these parks can undermine their conservation goals. Many African protected areas have experienced concurrent declines of apex predators alongside increases in both baboon abundance and the density of humans living near the park boundary. Baboons then take excursions outside of the park to raid crops for food, conflicting with the human population. We model the interactions of mesopredators (baboons), apex predators and shared prey in the park to analyze how four components affect the proportion of time that mesopredators choose to crop-raid: 1) the presence of apex predators; 2) nutritional quality of the crops; 3) mesopredator "shyness" about leaving the park; and 4) human hunting of mesopredators. We predict that the presence of apex predators in the park is the most effective method for controlling mesopredator abundance, and hence significantly reduces their impact on crops. Human hunting of mesopredators is less effective as it only occurs during crop-raiding excursions. Furthermore, making crops less attractive, for instance by planting crops further from the park boundary or farming less nutritional crops, can reduce the amount of time mesopredators crop-raid.
Submitted 25 August, 2015;
originally announced August 2015.
-
A Global Map of Suitability for Coastal Vibrio cholerae Under Current and Future Climate Conditions
Authors:
Luis E. Escobar,
Sadie J. Ryan,
Anna M. Stewart-Ibarra,
Julia L. Finkelstein,
Christine A. King,
Huijie Qiao,
Mark E. Polhemus
Abstract:
Vibrio cholerae is a globally distributed water-borne pathogen that causes severe diarrheal disease and mortality, with current outbreaks as part of the seventh pandemic. Further understanding of the role of environmental factors in potential pathogen distribution and corresponding V. cholerae disease transmission over time and space is urgently needed to target surveillance of cholera and other climate and water-sensitive diseases. We used an ecological niche model (ENM) to identify environmental variables associated with V. cholerae presence in marine environments, to project a global model of V. cholerae distribution in ocean waters under current and future climate scenarios. We generated an ENM using published reports of V. cholerae in seawater and freely available remotely sensed imagery. Models indicated that factors associated with V. cholerae presence included chlorophyll-a, pH, and sea surface temperature (SST), with chlorophyll-a demonstrating the greatest explanatory power from variables selected for model calibration. We identified specific geographic areas for potential V. cholerae distribution. Coastal Bangladesh, where cholera is endemic, was found to be environmentally similar to coastal areas in Latin America. In a conservative climate change scenario, we observed a predicted increase in areas with environmental conditions suitable for V. cholerae. Findings highlight the potential for vulnerability maps to inform cholera surveillance, early warning systems, and disease prevention and control.
Submitted 4 June, 2015;
originally announced June 2015.
-
Malaria control and senescence: the importance of accounting for the pace and shape of aging in wild mosquitoes
Authors:
Sadie J. Ryan,
Tal Ben-Horin,
Leah R. Johnson
Abstract:
The assumption that vector mortality remains constant with age is used widely to assess malaria transmission risk and predict the public health consequences of vector control strategies. However, laboratory studies commonly demonstrate clear evidence of senescence, or a decrease in physiological function and increase in vector mortality rate with age. We developed methods to integrate available field data to understand mortality in wild Anopheles gambiae, the most important vector of malaria in sub-Saharan Africa. We found evidence for an increase in rates of mortality with age, a component of senescence. As expected, we also found that overall mortality is far greater in wild cohorts than commonly observed under protected laboratory conditions. The magnitude of senescence increases with An. gambiae lifespan, implying that wild mosquitoes die long before cohorts can exhibit strong senescence. We reviewed available published mortality studies of Anopheles spp. to confirm this fundamental prediction of aging in wild populations. Senescence becomes most apparent in long-living mosquito cohorts, and cohorts with low extrinsic mortality, such as those raised under protected laboratory conditions, suffer a relatively high proportion of senescent deaths. Imprecision in estimates of vector mortality and changes in mortality with age will severely bias models of vector-borne disease transmission risk, such as malaria, and the sensitivity of transmission to bias increases as the extrinsic incubation period of the parasite decreases. While we focus here on malaria, we caution that future models for anti-vectorial interventions must therefore incorporate both realistic mortality rates and age-dependent changes in vector mortality.
Submitted 29 September, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
-
Population pressure and global markets drive a decade of forest cover change in Africa's Albertine Rift
Authors:
Sadie J. Ryan,
Michael Palace,
Joel Hartter,
Jeremy E. Diem,
Colin A. Chapman,
Jane Southworth
Abstract:
The Albertine Rift region faces rapid human population growth, while being a biodiversity hotspot. Using satellite-derived continuous forest cover change data, we examined national socioeconomic, demographic, and agricultural production data, and local demographic and geographic variables to assess multilevel forces driving significant local forest cover loss and gain outside protected areas during the first decade of this century. Because the processes that drive forest cover loss and gain are expected to be different, we constructed models of change in each direction. Although forest cover change varied by country, national level population change was the strongest driver of forest loss rate for all countries, with a population doubling predicted to cause 2.06 percent annual cover loss, while doubling tea production was predicted to cause 1.90 percent. The rate of forest cover gain was associated positively with increased production of the local staple crop cassava, but negatively with local population density and meat production, suggesting production drivers at multiple levels mitigate reforestation. We found a small, but significant, decrease of forest cover loss rate with increasing distance from protected areas, supporting studies suggesting higher rates of landscape change near protected areas. While local population density mitigated the rate of forest cover gain, cover loss also correlated to lower local population density, an apparent paradox, but consistent with findings that larger scale forces outweigh local drivers of deforestation. This implicates demographic and market forces at national and international scales as critical drivers of change, calling into question the necessary scale of forest protection policy in this biodiversity hotspot.
Submitted 26 November, 2015; v1 submitted 25 September, 2014;
originally announced September 2014.
-
Spatiotemporal clustering, climate periodicity, and social-ecological risk factors for dengue during an outbreak in Machala, Ecuador, in 2010
Authors:
Anna M. Stewart Ibarra,
Angel G. Munoz,
Sadie J. Ryan,
Mercy J. Borbor,
Efrain Beltran Ayala,
Julia L. Finkelstein,
Raul Mejia,
Tania Ordonez,
G. Cristina Recalde Coronel,
Keytia Rivero
Abstract:
The objective of this study was to characterize the spatiotemporal dynamics and climatic and social-ecological risk factors associated with the largest dengue epidemic to date in Machala, Ecuador, to inform the development of a dengue early warning system (EWS). The following data were included in analyses: neighborhood-level georeferenced dengue cases, national census data, and entomological surveillance data from 2010; time series of weekly dengue cases (aggregated to the city-level) and meteorological data from 2003 to 2012. We applied LISA and Moran's I to analyze the spatial distribution of the 2010 dengue cases, and developed multivariate logistic regression models through a multi-model selection process to identify census variables and entomological covariates associated with the presence of dengue at the neighborhood level. Using data aggregated at the city-level, we conducted a time-series (wavelet) analysis of weekly climate and dengue incidence (2003-2012) to identify significant time periods (e.g., annual, biannual) when climate co-varied with dengue, and to describe the climate conditions associated with the 2010 outbreak. We found significant hotspots of dengue transmission near the center of Machala. The best-fit model to predict the presence of dengue included older age and female gender of the head of the household, greater access to piped water in the home, poor housing condition, and less distance to the central hospital. Wavelet analyses revealed that dengue transmission co-varied with rainfall and minimum temperature at annual and biannual cycles, and we found that anomalously high rainfall and temperatures were associated with the 2010 outbreak. Our findings highlight the importance of geospatial information in dengue surveillance and the potential to develop climate-driven spatiotemporal prediction models to inform disease prevention and control interventions.
Submitted 13 January, 2015; v1 submitted 29 July, 2014;
originally announced July 2014.
-
Mapping physiological suitability limits of malaria in Africa under climate change
Authors:
Sadie J. Ryan,
Amy McNally,
Leah R. Johnson,
Erin Mordecai,
Tal Ben-Horin,
Krijn Paaijmans,
Kevin D. Lafferty
Abstract:
We mapped current and future temperature suitability for malaria transmission in Africa using a published model that incorporates nonlinear physiological responses to temperature of the mosquito vector Anopheles gambiae and the malaria parasite Plasmodium falciparum. We found that a larger area of Africa currently experiences the ideal temperature for transmission than previously supposed. Under future climate projections, we predicted a modest increase in the overall area suitable for malaria transmission, but a net decrease in the most suitable area. Combined with population density projections, our maps suggest that areas with temperatures suitable for year-round, highest risk transmission will shift from coastal West Africa to the Albertine Rift between Democratic Republic of Congo and Uganda, while areas with seasonal transmission suitability will shift toward sub-Saharan coastal areas. Mapping temperature suitability places important bounds on malaria transmissibility and, along with local level demographic, socioeconomic, and ecological factors, can indicate where resources may be best spent on malaria control.
Submitted 26 November, 2015; v1 submitted 28 July, 2014;
originally announced July 2014.
-
Understanding uncertainty in temperature effects on vector-borne disease: A Bayesian approach
Authors:
Leah R. Johnson,
Tal Ben-Horin,
Kevin D. Lafferty,
Amy McNally,
Erin Mordecai,
Krijn P. Paaijmans,
Samraat Pawar,
Sadie J. Ryan
Abstract:
Extrinsic environmental factors influence the distribution and population dynamics of many organisms, including insects that are of concern for human health and agriculture. This is particularly true for vector-borne infectious diseases, like malaria, which is a major source of morbidity and mortality in humans. Understanding the mechanistic links between environment and population processes for these diseases is key to predicting the consequences of climate change on transmission and for developing effective interventions. An important measure of the intensity of disease transmission is the reproductive number $R_0$. However, understanding the mechanisms linking $R_0$ and temperature, an environmental factor driving disease risk, can be challenging because the data available for parameterization are often poor. To address this we show how a Bayesian approach can help identify critical uncertainties in components of $R_0$ and how this uncertainty is propagated into the estimate of $R_0$. Most notably, we find that different parameters dominate the uncertainty at different temperature regimes: bite rate from 15-25$^\circ$C; fecundity across all temperatures, but especially $\sim$25-32$^\circ$C; mortality from 20-30$^\circ$C; parasite development rate at $\sim$15-16$^\circ$C and again at $\sim$33-35$^\circ$C. Focusing empirical studies on these parameters and corresponding temperature ranges would be the most efficient way to improve estimates of $R_0$. While we focus on malaria, our methods apply to improving process-based models more generally, including epidemiological, physiological niche, and species distribution models.
Submitted 22 May, 2014; v1 submitted 18 October, 2013;
originally announced October 2013.