-
Evaluating distributional regression strategies for modelling self-reported sexual age-mixing
Authors:
Timothy M Wolock,
Seth R Flaxman,
Kathryn A Risher,
Tawanda Dadirai,
Simon Gregson,
Jeffrey W Eaton
Abstract:
The age dynamics of sexual partnership formation determine patterns of sexually transmitted disease transmission and have long been a focus of researchers studying human immunodeficiency virus. Data on self-reported sexual partner age distributions are available from a variety of sources. We sought to explore statistical models that accurately predict the distribution of sexual partner ages over age and sex. We identified which probability distributions and outcome specifications best captured variation in partner age and quantified the benefits of modelling these data using distributional regression. We found that distributional regression with a sinh-arcsinh distribution replicated observed partner age distributions most accurately across three geographically diverse data sets. This framework can be extended with well-known hierarchical modelling tools and can help improve estimates of sexual age-mixing dynamics.
Submitted 15 March, 2021;
originally announced March 2021.
-
Scalable Bayesian inference for self-excitatory stochastic processes applied to big American gunfire data
Authors:
Andrew J. Holbrook,
Charles E. Loeffler,
Seth R. Flaxman,
Marc A. Suchard
Abstract:
The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes' computational burden: complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis-Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington D.C. between the years of 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage wide-spread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting.
Submitted 13 May, 2020;
originally announced May 2020.
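The quadratic cost mentioned in the abstract above can be illustrated with a minimal sketch of a temporal Hawkes log-likelihood. This is not the hpHawkes implementation: the exponential triggering kernel `alpha * exp(-beta * dt)`, the function name `hawkes_loglik_naive`, and all parameter names are assumptions chosen for illustration only.

```python
import math

def hawkes_loglik_naive(times, T, mu, alpha, beta):
    """Log-likelihood of a temporal Hawkes process on [0, T].

    Assumed (illustrative) intensity:
        lambda(t) = mu + sum_{t_j < t} alpha * exp(-beta * (t - t_j))
    `times` must be sorted in increasing order and lie in [0, T].
    """
    # Compensator contribution of the constant background rate.
    loglik = -mu * T
    for i, t_i in enumerate(times):
        # Inner loop over all earlier events: this is the O(n^2) part.
        lam = mu
        for t_j in times[:i]:
            lam += alpha * math.exp(-beta * (t_i - t_j))
        loglik += math.log(lam)
        # Integral of the kernel triggered by event i over (t_i, T].
        loglik -= (alpha / beta) * (1.0 - math.exp(-beta * (T - t_i)))
    return loglik

# Hypothetical event times, used only to show the call.
print(hawkes_loglik_naive([1.0, 2.0, 3.0], T=4.0, mu=0.5, alpha=0.3, beta=1.0))
```

Each event's intensity sums over every earlier event, so the work grows with the square of the number of observations. Because the outer-loop iterations are independent of one another, they are the natural unit to distribute across CPU cores or GPU threads.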
-
Inferring HIV incidence trends and transmission dynamics with a spatio-temporal HIV epidemic model
Authors:
Timothy M Wolock,
Seth R Flaxman,
Jeffrey W Eaton
Abstract:
Reliable estimation of spatio-temporal trends in population-level HIV incidence is becoming an increasingly critical component of HIV prevention policy-making. However, direct measurement is nearly impossible. Current, widely used models infer incidence from survey and surveillance seroprevalence data, but they require unrealistic assumptions about spatial independence across spatial units. In this study, we present an epidemic model of HIV that explicitly simulates the spatial dynamics of HIV over many small, interacting areal units. By integrating all available population-level data, we are able to infer not only spatio-temporally varying incidence, but also ART initiation rates and patient counts. Our study illustrates the feasibility of applying compartmental models to larger inferential problems than those to which they are typically applied, as well as the value of data fusion approaches to infectious disease modeling.
Submitted 3 December, 2019;
originally announced December 2019.
-
Variable Prioritization in Nonlinear Black Box Methods: A Genetic Association Case Study
Authors:
Lorin Crawford,
Seth R. Flaxman,
Daniel E. Runcie,
Mike West
Abstract:
The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other "black box" methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.
Submitted 26 August, 2018; v1 submitted 22 January, 2018;
originally announced January 2018.
-
Improved prediction accuracy for disease risk mapping using Gaussian Process stacked generalisation
Authors:
Samir Bhatt,
Ewan Cameron,
Seth R Flaxman,
Daniel J Weiss,
David L Smith,
Peter W Gething
Abstract:
Maps of infectious disease---charting spatial variations in the force of infection, degree of endemicity, and the burden on human health---provide an essential evidence base to support planning towards global health targets. Contemporary disease mapping efforts have embraced statistical modelling approaches to properly acknowledge uncertainties in both the available measurements and their spatial interpolation. The most common such approach is that of Gaussian process regression, a mathematical framework comprised of two components: a mean function harnessing the predictive power of multiple independent variables, and a covariance function yielding spatio-temporal shrinkage against residual variation from the mean. Though many techniques have been developed to improve the flexibility and fitting of the covariance function, models for the mean function have typically been restricted to simple linear terms. For infectious diseases, known to be driven by complex interactions between environmental and socio-economic factors, improved modelling of the mean function can greatly boost predictive power. Here we present an ensemble approach based on stacked generalisation that allows for multiple, non-linear algorithmic mean functions to be jointly embedded within the Gaussian process framework. We apply this method to mapping Plasmodium falciparum prevalence data in Sub-Saharan Africa and show that the generalised ensemble approach markedly out-performs any individual method.
Submitted 10 December, 2016;
originally announced December 2016.