
Modernising the Design and Analysis of Prevalence Surveys for Neglected Tropical Diseases

Authors: Peter J. Diggle, Claudio Fronterre, Katherine Gass, Lee Hundley, Reza Niles-Robin, Annastacia Sampson, Ana Morice, Ronaldo Carvalho Scholte

Abstract: Current WHO guidelines set prevalence thresholds below which a Neglected Tropical Disease can be considered to have been eliminated as a public health problem, and specify how surveys to assess whether elimination has been achieved should be designed and analysed, based on classical survey sampling methods. In this paper we describe an alternative approach based on geospatial statistical modelling. We first show the gains in efficiency that can be obtained by exploiting any spatial correlation in the underlying prevalence surface. We then suggest that the current guidelines' implicit use of a significance testing argument is not appropriate; instead, we argue for a predictive inferential framework, leading to design criteria based on controlling the rates at which areas whose true prevalence lies above and below the elimination threshold are incorrectly classified. We describe how this approach naturally accommodates context-specific information in the form of georeferenced covariates that have been shown to be predictive of disease prevalence. Finally, we give a progress report of an ongoing collaboration with the Guyana Ministry of Health Neglected Tropical Disease program on the design of an IDA (Ivermectin, Diethylcarbamazine and Albendazole) Impact Survey (IIS) of lymphatic filariasis to be conducted in Guyana in early 2023.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 11 pages, 3 figures

arXiv:2211.02364 [pdf, other]

Visualising spatio-temporal health data: the importance of capturing the 4th dimension

Authors: Alison C. Hale, Charlotte Appleton, P. -J. M. Noble, Gina L. Pinchbeck, Barry Rowlingson, Peter J. Diggle, Alan D. Radford, Christopher P. Jewell

Abstract: Confronted by a rapidly evolving health threat, such as an infectious disease outbreak, it is essential that decision-makers are able to comprehend the complex dynamics not just in space but also in the 4th dimension, time. In this paper this is addressed by a novel visualisation tool, referred to as the Dynamic Health Atlas web app, which is designed specifically for displaying the spatial evolution of data over time while simultaneously acknowledging its uncertainty. It is an interactive and open-source web app, coded predominantly in JavaScript, in which the geospatial and temporal data are displayed side-by-side. The first of two case studies of this visualisation tool relates to an outbreak of canine gastroenteric disease in the United Kingdom, where many veterinary practices experienced an unusually high case incidence. The second study concerns the predicted COVID-19 reproduction number along with incidence and prevalence forecasts in each local authority district in the United Kingdom. These studies demonstrate the effectiveness of the Dynamic Health Atlas web app at conveying geospatial and temporal dynamics along with their corresponding uncertainties.

Submitted 4 November, 2022; originally announced November 2022.

Comments: 4 Figures, 27 pages

arXiv:2109.13730 [pdf, other]

Interoperability of statistical models in pandemic preparedness: principles and reality

Authors: George Nicholson, Marta Blangiardo, Mark Briers, Peter J. Diggle, Tor Erlend Fjelde, Hong Ge, Robert J. B. Goudie, Radka Jersakova, Ruairidh E. King, Brieuc C. L. Lehmann, Ann-Marie Mallon, Tullia Padellini, Yee Whye Teh, Chris Holmes, Sylvia Richardson

Abstract: We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of statistical models for disease surveillance using probabilistic reasoning. We illustrate this through case studies for inferring spatial-temporal coronavirus disease 2019 (COVID-19) prevalence and reproduction numbers in England.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 26 pages, 10 figures, for associated mpeg file Movie 1 please see https://www.dropbox.com/s/kn9y1v6zvivfla1/Interoperability_of_models_Movie_1.mp4?dl=0

MSC Class: 62P10

arXiv:1804.02592 [pdf, other]

Linear Mixed-Effects Models for Non-Gaussian Repeated Measurement Data

Authors: Özgür Asar, David Bolin, Peter J. Diggle, Jonas Wallin

Abstract: We consider the analysis of continuous repeated measurement outcomes that are collected through time, also known as longitudinal data. A standard framework for analysing data of this kind is a linear Gaussian mixed-effects model within which the outcome variable can be decomposed into fixed-effects, time-invariant and time-varying random-effects, and measurement noise. We develop methodology that, for the first time, allows any combination of these stochastic components to be non-Gaussian, using multivariate Normal variance-mean mixtures. We estimate parameters by maximum likelihood, implemented with a novel, computationally efficient stochastic gradient algorithm. We obtain standard error estimates by inverting the observed Fisher-information matrix, and obtain the predictive distributions for the random-effects in both filtering (conditioning on past and current data) and smoothing (conditioning on all data) contexts. To implement these procedures, we introduce an R package, ngme. We re-analyse two data-sets, from cystic fibrosis and nephrology research, that were previously analysed using Gaussian linear mixed effects models.

Submitted 7 April, 2018; originally announced April 2018.

arXiv:1802.06359 [pdf, other]

Geostatistical methods for disease mapping and visualization using data from spatio-temporally referenced prevalence surveys

Authors: Emanuele Giorgi, Peter J. Diggle, Robert W. Snow, Abdisalan M. Noor

Abstract: In this paper we set out general principles and develop geostatistical methods for the analysis of data from spatio-temporally referenced prevalence surveys. Our objective is to provide a tutorial guide that can be used in order to identify parsimonious geostatistical models for prevalence mapping. A general variogram-based Monte Carlo procedure is proposed to check the validity of the modelling assumptions. We describe and contrast likelihood-based and Bayesian methods of inference, showing how to account for parameter uncertainty under each of the two paradigms. We also describe extensions of the standard model for disease prevalence that can be used when stationarity of the spatio-temporal covariance function is not supported by the data. We discuss how to define predictive targets and argue that exceedance probabilities provide one of the most effective ways to convey uncertainty in prevalence estimates. We describe statistical software for the visualization of spatio-temporal predictive summaries of prevalence through interactive animations. Finally, we illustrate an application to historical malaria prevalence data from 1334 surveys conducted in Senegal between 1905 and 2014.

Submitted 18 February, 2018; originally announced February 2018.

Comments: Extended version of the paper in press on International Statistical Review
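
The abstract above argues that exceedance probabilities are an effective way to convey uncertainty in prevalence estimates. As a minimal sketch of the idea, assuming one already has Monte Carlo draws from the predictive distribution of prevalence at each location, the exceedance probability can be estimated as the fraction of draws above a threshold. The helper name `exceedance_probability`, the draws, and the 0.2 threshold are all hypothetical illustrations, not part of the authors' software or data.

```python
def exceedance_probability(draws, threshold):
    """Monte Carlo estimate of P(prevalence > threshold) from predictive draws."""
    draws = list(draws)
    if not draws:
        raise ValueError("need at least one predictive draw")
    # Fraction of predictive draws that lie strictly above the threshold.
    return sum(d > threshold for d in draws) / len(draws)


# Hypothetical predictive draws of prevalence at two locations (illustrative only).
predictive_draws = {
    "location_A": [0.12, 0.18, 0.25, 0.31, 0.22],
    "location_B": [0.05, 0.09, 0.11, 0.21, 0.07],
}

# Assumed policy threshold of 20% prevalence.
exceedance = {
    name: exceedance_probability(draws, 0.2)
    for name, draws in predictive_draws.items()
}
```

A value close to 1 or 0 indicates that a location is confidently above or below the threshold; values near 0.5 flag locations where the data cannot yet decide.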

arXiv:1711.10262 [pdf, other]

doi 10.1098/rsbm.2017.0039

Julian Ernst Besag, 26 March 1945 -- 6 August 2010, a biographical memoir

Authors: Peter J. Diggle, Peter J. Green, Bernard W. Silverman

Abstract: Julian Besag was an outstanding statistical scientist, distinguished for his pioneering work on the statistical theory and analysis of spatial processes, especially conditional lattice systems. His work has been seminal in statistical developments over the last several decades ranging from image analysis to Markov chain Monte Carlo methods. He clarified the role of auto-logistic and auto-normal models as instances of Markov random fields and paved the way for their use in diverse applications. Later work included investigations into the efficacy of nearest neighbour models to accommodate spatial dependence in the analysis of data from agricultural field trials, image restoration from noisy data, and texture generation using lattice models.

Submitted 2 January, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: 26 pages, 14 figures; minor revisions, omission of full bibliography

Journal ref: Biogr. Mems Fell. R. Soc. 64, 27-50, 2018

arXiv:1711.00437 [pdf, other]

Geostatistical inference in the presence of geomasking: a composite-likelihood approach

Authors: Claudio Fronterrè, Emanuele Giorgi, Peter J. Diggle

Abstract: In almost any geostatistical analysis, one of the underlying, often implicit, modelling assumptions is that the spatial locations, where measurements are taken, are recorded without error. In this study we develop geostatistical inference when this assumption is not valid. This is often the case when, for example, individual address information is randomly altered to provide privacy protection or imprecisions are induced by geocoding processes and measurement devices. Our objective is to develop a method of inference based on the composite likelihood that overcomes the inherent computational limits of the full likelihood method as set out in Fanshawe and Diggle (2011). Through a simulation study, we then compare the performance of our proposed approach with an N-weighted least squares estimation procedure, based on a corrected version of the empirical variogram. Our results indicate that the composite-likelihood approach outperforms the latter, leading to smaller root-mean-square-errors in the parameter estimates. Finally, we illustrate an application of our method to analyse data on malnutrition from a Demographic and Health Survey conducted in Senegal in 2011, where locations were randomly perturbed to protect the privacy of respondents.

Submitted 1 November, 2017; originally announced November 2017.

arXiv:1605.00104 [pdf, other]

Inhibitory geostatistical designs for spatial prediction taking account of uncertain covariance structure

Authors: Michael G. Chipeta, Dianne J. Terlouw, Kamija S. Phiri, Peter J. Diggle

Abstract: The problem of choosing spatial sampling designs for investigating an unobserved spatial phenomenon $S$ arises in many contexts, for example in identifying households to select for a prevalence survey to study disease burden and heterogeneity in a study region $D$. We studied randomised inhibitory spatial sampling designs to address the problem of spatial prediction whilst taking account of the need to estimate covariance structure. Two specific classes of design are inhibitory designs and inhibitory designs plus close pairs. In an inhibitory design, any pair of sample locations must be separated by at least an inhibition distance $δ$. In an inhibitory plus close pairs design, $n - k$ sample locations in an inhibitory design with inhibition distance $δ$ are augmented by $k$ locations each positioned close to one of the randomly selected $n - k$ locations in the inhibitory design, uniformly distributed within a disc of radius $ζ$. We present simulation results for the Matérn class of covariance structures. When the nugget variance is non-negligible, inhibitory plus close pairs designs demonstrate improved predictive efficiency over designs without close pairs. We illustrate how these findings can be applied to the design of a rolling Malaria Indicator Survey that forms part of an ongoing large-scale, five-year malaria transmission reduction project in Malawi.

Submitted 30 April, 2016; originally announced May 2016.

Comments: Submitted
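
The inhibitory plus close pairs design described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the study region is taken to be the unit square, the inhibitory points are generated by simple rejection sampling, close-pair points are not clipped to the region, and the function name and arguments are hypothetical.

```python
import math
import random


def inhibitory_close_pairs(n, k, delta, zeta, seed=0, max_tries=100_000):
    """Return (base, pairs): n - k inhibitory points and k close-pair points.

    Assumptions (not from the source): unit-square region, rejection
    sampling for the inhibitory points, no clipping of close pairs.
    """
    if k > n - k:
        raise ValueError("k must not exceed the number of inhibitory points")
    rng = random.Random(seed)

    # Inhibitory step: accept a uniform candidate only if it lies at
    # least delta away from every point accepted so far.
    base = []
    tries = 0
    while len(base) < n - k:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("delta is too large for this n and region")
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) >= delta for p in base):
            base.append(candidate)

    # Close-pair step: pick k distinct inhibitory points at random and
    # place one new point uniformly in the disc of radius zeta around each.
    pairs = []
    for x, y in rng.sample(base, k):
        r = zeta * math.sqrt(rng.random())  # sqrt gives uniformity over the disc area
        theta = 2.0 * math.pi * rng.random()
        pairs.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return base, pairs


base, pairs = inhibitory_close_pairs(n=30, k=10, delta=0.05, zeta=0.01)
```

The close pairs give information about the covariance at short distances (and hence the nugget), while the inhibitory points spread the remaining effort evenly over the region.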

arXiv:1509.04817 [pdf, ps, other]

doi 10.1214/15-BA944A

Comment on Article by Ferreira and Gamerman

Authors: Michael Chipeta, Peter J. Diggle

Abstract: Comment on Article by Ferreira and Gamerman [arXiv:1509.03410].

Submitted 16 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/15-BA944A in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)

Report number: VTeX-BA-BA944A

Journal ref: Bayesian Analysis 2015, Vol. 10, No. 3, 737-739

arXiv:1509.04448 [pdf, other]

Adaptive Geostatistical Design and Analysis for Sequential Prevalence Surveys

Authors: Michael G. Chipeta, Dianne J. Terlouw, Kamija Phiri, Peter J. Diggle

Abstract: Non-adaptive geostatistical designs (NAGD) offer standard ways of collecting and analysing geostatistical data in which sampling locations are fixed in advance of any data collection. In contrast, adaptive geostatistical designs (AGD) allow collection of exposure and outcome data over time to depend on information obtained from earlier data collection, so as to optimise data collection towards the analysis objective. AGDs are becoming more important in spatial mapping, particularly in resource-poor settings where uniformly precise mapping may be unrealistically costly and the priority is often to identify critical areas where interventions can have the most health impact. Two constructions are singleton and batch adaptive sampling. In singleton sampling, locations $x_i$ are chosen sequentially and at each stage, $x_{k+1}$ depends on data obtained at locations $x_1,\ldots , x_k$. In batch sampling, locations are chosen in batches of size $b > 1$, allowing a new batch, $\{x_{(k+1)},\ldots ,x_{(k+b)}\}$, to depend on data obtained at locations $x_1,\ldots, x_{kb}$. In most settings, batch sampling is more realistic than singleton sampling. We propose specific batch AGDs and assess their efficiency relative to their singleton adaptive and non-adaptive counterparts by using simulations. We show how we apply these findings to inform an AGD of a rolling Malaria Indicator Survey, part of a large-scale, five-year malaria transmission reduction project in Malawi.

Submitted 15 September, 2015; originally announced September 2015.

Comments: 18 pages, 4 figures

arXiv:1505.06891 [pdf, other]

Model-Based Geostatistics for Prevalence Mapping in Low-Resource Settings

Authors: Peter J. Diggle, Emanuele Giorgi

Abstract: In low-resource settings, prevalence mapping relies on empirical prevalence data from a finite, often spatially sparse, set of surveys of communities within the region of interest, possibly supplemented by remotely sensed images that can act as proxies for environmental risk factors. A standard geostatistical model for data of this kind is a generalized linear mixed model with binomial error distribution, logistic link and a combination of explanatory variables and a Gaussian spatial stochastic process in the linear predictor. In this paper, we first review statistical methods and software associated with this standard model, then consider several methodological extensions whose development has been motivated by the requirements of specific applications. These include: methods for combining randomised survey data with data from non-randomised, and therefore potentially biased, surveys; spatio-temporal extensions; spatially structured zero-inflation. Throughout, we illustrate the methods with disease mapping applications that have arisen through our involvement with a range of African public health programmes.

Submitted 26 May, 2015; originally announced May 2015.

Comments: Submitted
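
The standard model named in the abstract above (binomial error distribution, logistic link, explanatory variables plus a Gaussian spatial process in the linear predictor) can be written out as follows. The notation is an assumption chosen for illustration and may differ from the paper's: $Y_i$ is the number of positive results among $m_i$ individuals sampled at location $x_i$, $p(x_i)$ is the prevalence there, $d(x_i)$ is a vector of explanatory variables with coefficients $\beta$, and $S(\cdot)$ is a zero-mean Gaussian spatial process.

```latex
\begin{align*}
Y_i \mid S(x_i) &\sim \mathrm{Binomial}\bigl(m_i,\, p(x_i)\bigr),
  \qquad i = 1, \ldots, n, \\
\log\frac{p(x_i)}{1 - p(x_i)} &= d(x_i)^{\top}\beta + S(x_i).
\end{align*}
```

Conditional on $S$, the $Y_i$ are taken to be independent, so all spatial dependence in the data enters through the covariance function of $S$.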

arXiv:1409.3408 [pdf, other]

On The Inverse Geostatistical Problem of Inference on Missing Locations

Authors: Emanuele Giorgi, Peter J. Diggle

Abstract: The standard geostatistical problem is to predict the values of a spatially continuous phenomenon, $S(x)$ say, at locations $x$ using data $(y_i,x_i):i=1,\ldots,n$ where $y_i$ is the realization at location $x_i$ of $S(x_i)$, or of a random variable $Y_i$ that is stochastically related to $S(x_i)$. In this paper we address the inverse problem of predicting the locations of observed measurements $y$. We discuss how knowledge of the sampling mechanism can and should inform a prior specification, $π(x)$ say, for the joint distribution of the measurement locations $X = \{x_i: i=1,\ldots,n\}$, and propose an efficient Metropolis-Hastings algorithm for drawing samples from the resulting predictive distribution of the missing elements of $X$. An important feature in many applied settings is that this predictive distribution is multi-modal, which severely limits the usefulness of simple summary measures such as the mean or median. We present two simulated examples to demonstrate the importance of the specification for $π(x)$, and analyze rainfall data from Paraná State, Brazil to show how, under additional assumptions, an empirical estimate of $π(x)$ can be used when no prior information on the sampling design is available.

Submitted 11 September, 2014; originally announced September 2014.

Comments: Under review

arXiv:1312.6536 [pdf, ps, other]

doi 10.1214/13-STS441

Spatial and Spatio-Temporal Log-Gaussian Cox Processes: Extending the Geostatistical Paradigm

Authors: Peter J. Diggle, Paula Moraga, Barry Rowlingson, Benjamin M. Taylor

Abstract: In this paper we first describe the class of log-Gaussian Cox processes (LGCPs) as models for spatial and spatio-temporal point process data. We discuss inference, with a particular focus on the computational challenges of likelihood-based inference. We then demonstrate the usefulness of the LGCP by describing four applications: estimating the intensity surface of a spatial point process; investigating spatial segregation in a multi-type process; constructing spatially continuous maps of disease risk from spatially discrete data; and real-time health surveillance. We argue that problems of this kind fit naturally into the realm of geostatistics, which traditionally is defined as the study of spatially continuous processes using spatially discrete observations at a finite number of locations. We suggest that a more useful definition of geostatistics is by the class of scientific problems that it addresses, rather than by particular models or data formats.

Submitted 23 December, 2013; originally announced December 2013.

Comments: Published at http://dx.doi.org/10.1214/13-STS441 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS441

Journal ref: Statistical Science 2013, Vol. 28, No. 4, 542-563

arXiv:1308.2790 [pdf, other]

Combining data from multiple spatially referenced prevalence surveys using generalized linear geostatistical models

Authors: Emanuele Giorgi, Sanie S. S. Sesay, Dianne J. Terlouw, Peter J. Diggle

Abstract: Data from multiple prevalence surveys can provide information on common parameters of interest, which can therefore be estimated more precisely in a joint analysis than by separate analyses of the data from each survey. However, fitting a single model to the combined data from multiple surveys is inadvisable without testing the implicit assumption that all of the surveys are directed at the same inferential target. In this paper we propose a multivariate generalized linear geostatistical model that accommodates two sources of heterogeneity across surveys so as to correct for spatially structured bias in non-randomised surveys and to allow for temporal variation in the underlying prevalence surface between consecutive survey-periods. We describe a Monte Carlo maximum likelihood procedure for parameter estimation, and show through simulation experiments how accounting for the different sources of heterogeneity among surveys in a joint model leads to more precise inferences. We describe an application to multiple surveys of malaria prevalence conducted in Chikhwawa District, Southern Malawi, and discuss how this approach could inform hybrid sampling strategies that combine data from randomised and non-randomised surveys so as to make the most efficient use of all available data.

Submitted 20 December, 2013; v1 submitted 13 August, 2013; originally announced August 2013.

Comments: Edited version after a first revision

arXiv:1202.1738 [pdf, other]

INLA or MCMC? A Tutorial and Comparative Evaluation for Spatial Prediction in log-Gaussian Cox Processes

Authors: Benjamin M. Taylor, Peter J. Diggle

Abstract: We investigate two options for performing Bayesian inference on spatial log-Gaussian Cox processes assuming a spatially continuous latent field: Markov chain Monte Carlo (MCMC) and the integrated nested Laplace approximation (INLA). We first describe the device of approximating a spatially continuous Gaussian field by a Gaussian Markov random field on a discrete lattice, and present a simulation study showing that, with careful choice of parameter values, small neighbourhood sizes can give excellent approximations. We then introduce the spatial log-Gaussian Cox process and describe MCMC and INLA methods for spatial prediction within this model class. We report the results of a simulation study in which we compare MALA and the technique of approximating the continuous latent field by a discrete one, followed by approximate Bayesian inference via INLA over a selection of 18 simulated scenarios. The results question the notion that the latter technique is both significantly faster and more robust than MCMC in this setting; 100,000 iterations of the MALA algorithm running in 20 minutes on a desktop PC delivered greater predictive accuracy than the default INLA strategy, which ran in 4 minutes and gave comparable performance to the full Laplace approximation which ran in 39 minutes.

Submitted 19 March, 2012; v1 submitted 8 February, 2012; originally announced February 2012.

Comments: This replaces the previous version of the report. The new version includes results from an additional simulation study, and corrects an error in the implementation of the INLA-based methods

arXiv:1110.6054 [pdf, other]

lgcp: An R Package for Inference with Spatio-Temporal Log-Gaussian Cox Processes

Authors: Benjamin M. Taylor, Tilman M. Davies, Barry S. Rowlingson, Peter J. Diggle

Abstract: This paper introduces an R package for spatio-temporal prediction and forecasting for log-Gaussian Cox processes. The main computational tool for these models is Markov chain Monte Carlo and the new package, lgcp, therefore also provides an extensible suite of functions for implementing MCMC algorithms for processes of this type. The modelling framework and details of inferential procedures are first presented before a tour of lgcp functionality is given via a walk-through data-analysis. Topics covered include reading in and converting data, estimation of the key components and parameters of the model, specifying output and simulation quantities, computation of Monte Carlo expectations, post-processing and simulation of data sets.

Submitted 27 October, 2011; originally announced October 2011.

Showing 1–16 of 16 results for author: Diggle, P J