-
Predicting Malaria Incidence Using Artificial Neural Networks and Disaggregation Regression
Authors:
Jack A. Hall,
Tim C. D. Lucas
Abstract:
Disaggregation modelling is a method of predicting disease risk at high resolution using aggregated response data. High resolution disease mapping is an important public health tool to aid the optimisation of resources, and is commonly used in assisting responses to diseases such as malaria. Current disaggregation regression methods are slow, inflexible, and do not easily allow non-linear terms.
Neural networks may offer a solution to the limitations of current disaggregation methods. This project aimed to design a neural network which mimics the behaviour of disaggregation, then benchmark it against current methods for accuracy, flexibility and speed.
Cross-validation and nested cross-validation were used to compare the accuracy of neural networks against traditional disaggregation, and execution speed was measured.
Neural networks did not improve on the accuracy of current disaggregation methods, although they did improve execution time. The neural network models are more flexible and offer potential for further improvements on all metrics. The R package 'Kedis' (Keras-Disaggregation) is introduced as a user-friendly method of implementing neural network disaggregation models.
Submitted 17 April, 2023;
originally announced April 2023.
-
A simulation study of disaggregation regression for spatial disease mapping
Authors:
Rohan Arambepola,
Tim C D Lucas,
Anita K Nandi,
Peter W Gething,
Ewan Cameron
Abstract:
Disaggregation regression has become an important tool in spatial disease mapping for making fine-scale predictions of disease risk from aggregated response data. By including high resolution covariate information and modelling the data generating process on a fine scale, it is hoped that these models can accurately learn the relationships between covariates and response at a fine spatial scale. However, validating these high resolution predictions can be a challenge, as often there is no data observed at this spatial scale. In this study, disaggregation regression was performed on simulated data in various settings and the resulting fine-scale predictions were compared to the simulated ground truth. Performance was investigated with varying numbers of data points, sizes of aggregated areas and levels of model misspecification. The effectiveness of cross-validation on the aggregate level as a measure of fine-scale predictive performance was also investigated. Predictive performance improved as the number of observations increased and as the size of the aggregated areas decreased. When the model was well-specified, fine-scale predictions were accurate even with small numbers of observations and large aggregated areas. Under model misspecification, predictive performance was significantly worse for large aggregated areas but remained high when response data was aggregated over smaller regions. Cross-validation correlation on the aggregate level was a moderately good predictor of fine-scale predictive performance. While the simulations are unlikely to capture the nuances of real-life response data, this study gives insight into the effectiveness of disaggregation regression in different contexts.
Submitted 7 May, 2020;
originally announced May 2020.
-
Graphical outputs and Spatial Cross-validation for the R-INLA package using INLAutils
Authors:
Tim Lucas,
Andre Python,
David Redding
Abstract:
Statistical analyses proceed by an iterative process of model fitting and checking. The R-INLA package facilitates this iteration by fitting many Bayesian models much faster than alternative MCMC approaches. As the interpretation of results and model objects from Bayesian analyses can be complex, the R package INLAutils provides users with easily accessible, clear and customisable graphical summaries of model outputs from R-INLA. Furthermore, it offers a function for performing and visualizing the results of a spatial leave-one-out cross-validation (SLOOCV) approach that can be applied to compare the predictive performance of multiple spatial models. In this paper, we describe and illustrate the use of (1) graphical summary plotting functions and (2) the SLOOCV approach. We conclude the paper by identifying the limits of our approach and discussing potential future improvements.
Submitted 5 April, 2020;
originally announced April 2020.
-
disaggregation: An R Package for Bayesian Spatial Disaggregation Modelling
Authors:
Anita K. Nandi,
Tim C. D. Lucas,
Rohan Arambepola,
Peter Gething,
Daniel J. Weiss
Abstract:
Disaggregation modelling, or downscaling, has become an important discipline in epidemiology. Surveillance data, aggregated over large regions, is becoming more common, leading to an increasing demand for modelling frameworks that can deal with this data to understand spatial patterns. Disaggregation regression models use response data aggregated over large heterogeneous regions to make predictions at fine-scale over the region by using fine-scale covariates to inform the heterogeneity. This paper presents the R package disaggregation, which provides functionality to streamline the process of running a disaggregation model for fine-scale predictions.
Submitted 9 January, 2020;
originally announced January 2020.
-
Mapping malaria seasonality: a case study from Madagascar
Authors:
Michele Nguyen,
Rosalind E. Howes,
Tim C. D. Lucas,
Katherine E. Battle,
Ewan Cameron,
Harry S. Gibson,
Jennifer Rozier,
Suzanne Keddie,
Emma Collins,
Rohan Arambepola,
Su Yun Kang,
Chantal Hendriks,
Anita Nandi,
Susan F. Rumisha,
Samir Bhatt,
Sedera A. Mioramalala,
Mauricette Andriamananjara Nambinisoa,
Fanjasoa Rakotomanana,
Peter W. Gething,
Daniel J. Weiss
Abstract:
Many malaria-endemic areas experience seasonal fluctuations in case incidence as Anopheles mosquito and Plasmodium parasite life cycles respond to changing environmental conditions. While most existing maps of malaria seasonality use fixed thresholds of rainfall, temperature, and/or vegetation indices to identify suitable transmission months, we develop a statistical modelling framework for characterising the seasonal patterns derived directly from case data.
The procedure involves a spatiotemporal regression model for estimating the monthly proportions of total annual cases and an algorithm to identify operationally relevant characteristics such as the transmission start and peak months. A seasonality index combines the monthly proportion estimates and existing estimates of annual case incidence to provide a summary of "how seasonal" locations are relative to their surroundings. An advancement upon past seasonality mapping endeavours is the presentation of the uncertainty associated with each map, which will enable policymakers to make more statistically sound decisions. The methodology is illustrated using health facility data from Madagascar.
Submitted 17 May, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.
-
Mixed batches and symmetric discriminators for GAN training
Authors:
Thomas Lucas,
Corentin Tallec,
Jakob Verbeek,
Yann Ollivier
Abstract:
Generative adversarial networks (GANs) are powerful generative models based on providing feedback to a generative network via a discriminator network. However, the discriminator usually assesses individual samples. This prevents the discriminator from accessing global distributional statistics of generated samples, and often leads to mode dropping: the generator models only part of the target distribution. We propose to feed the discriminator with mixed batches of true and fake samples, and train it to predict the ratio of true samples in the batch. The latter score does not depend on the order of samples in a batch. Rather than learning this invariance, we introduce a generic permutation-invariant discriminator architecture. This architecture is provably a universal approximator of all symmetric functions. Experimentally, our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets, both qualitatively and quantitatively.
Submitted 19 June, 2018;
originally announced June 2018.
-
Variational Learning on Aggregate Outputs with Gaussian Processes
Authors:
Ho Chung Leon Law,
Dino Sejdinovic,
Ewan Cameron,
Tim CD Lucas,
Seth Flaxman,
Katherine Battle,
Kenji Fukumizu
Abstract:
While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
Submitted 22 May, 2018;
originally announced May 2018.