Predicting Malaria Incidence Using Artificial Neural Networks and Disaggregation Regression

Abstract: Disaggregation modelling is a method of predicting disease risk at high resolution using aggregated response data. High resolution disease mapping is an important public health tool to aid the optimisation of resources, and is commonly used in assisting responses to diseases such as malaria. Current disaggregation regression methods are slow, inflexible, and do not easily allow non-linear terms. Neural networks may offer a solution to the limitations of current disaggregation methods. This project aimed to design a neural network which mimics the behaviour of disaggregation, then benchmark it against current methods for accuracy, flexibility and speed. Cross-validation and nested cross-validation were used to test the accuracy of neural networks against traditional disaggregation, and execution speed was measured. Neural networks did not improve on the accuracy of current disaggregation methods, although they did improve execution time. The neural network models are more flexible and offer potential for further improvements on all metrics. The R package 'Kedis' (Keras-Disaggregation) is introduced as a user-friendly method of implementing neural network disaggregation models.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2005.03604

A simulation study of disaggregation regression for spatial disease mapping

Authors: Rohan Arambepola, Tim C D Lucas, Anita K Nandi, Peter W Gething, Ewan Cameron

Abstract: Disaggregation regression has become an important tool in spatial disease mapping for making fine-scale predictions of disease risk from aggregated response data. By including high resolution covariate information and modelling the data generating process on a fine scale, it is hoped that these models can accurately learn the relationships between covariates and response at a fine spatial scale. However, validating these high resolution predictions can be a challenge, as often there is no data observed at this spatial scale. In this study, disaggregation regression was performed on simulated data in various settings and the resulting fine-scale predictions are compared to the simulated ground truth. Performance was investigated with varying numbers of data points, sizes of aggregated areas and levels of model misspecification. The effectiveness of cross validation on the aggregate level as a measure of fine-scale predictive performance was also investigated. Predictive performance improved as the number of observations increased and as the size of the aggregated areas decreased. When the model was well-specified, fine-scale predictions were accurate even with small numbers of observations and large aggregated areas. Under model misspecification predictive performance was significantly worse for large aggregated areas but remained high when response data was aggregated over smaller regions. Cross-validation correlation on the aggregate level was a moderately good predictor of fine-scale predictive performance. While the simulations are unlikely to capture the nuances of real-life response data, this study gives insight into the effectiveness of disaggregation regression in different contexts.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2004.02324

Graphical outputs and Spatial Cross-validation for the R-INLA package using INLAutils

Authors: Tim Lucas, Andre Python, David Redding

Abstract: Statistical analyses proceed by an iterative process of model fitting and checking. The R-INLA package facilitates this iteration by fitting many Bayesian models much faster than alternative MCMC approaches. As the interpretation of results and model objects from Bayesian analyses can be complex, the R package INLAutils provides users with easily accessible, clear and customisable graphical summaries of model outputs from R-INLA. Furthermore, it offers a function for performing and visualizing the results of a spatial leave-one-out cross-validation (SLOOCV) approach that can be applied to compare the predictive performance of multiple spatial models. In this paper, we describe and illustrate the use of (1) graphical summary plotting functions and (2) the SLOOCV approach. We conclude the paper by identifying the limits of our approach and discussing potential future improvements.

Submitted 5 April, 2020; originally announced April 2020.

Comments: 13 pages

arXiv:2001.04847

disaggregation: An R Package for Bayesian Spatial Disaggregation Modelling

Authors: Anita K. Nandi, Tim C. D. Lucas, Rohan Arambepola, Peter Gething, Daniel J. Weiss

Abstract: Disaggregation modelling, or downscaling, has become an important discipline in epidemiology. Surveillance data, aggregated over large regions, is becoming more common, leading to an increasing demand for modelling frameworks that can deal with this data to understand spatial patterns. Disaggregation regression models use response data aggregated over large heterogeneous regions to make fine-scale predictions over the region, using fine-scale covariates to inform the heterogeneity. This paper presents the R package disaggregation, which provides functionality to streamline the process of running a disaggregation model for fine-scale predictions.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 16 pages, 5 figures, submitted to Journal of Statistical Software

arXiv:1901.10782

doi 10.1186/s12916-019-1486-3

Mapping malaria seasonality: a case study from Madagascar

Authors: Michele Nguyen, Rosalind E. Howes, Tim C. D. Lucas, Katherine E. Battle, Ewan Cameron, Harry S. Gibson, Jennifer Rozier, Suzanne Keddie, Emma Collins, Rohan Arambepola, Su Yun Kang, Chantal Hendriks, Anita Nandi, Susan F. Rumisha, Samir Bhatt, Sedera A. Mioramalala, Mauricette Andriamananjara Nambinisoa, Fanjasoa Rakotomanana, Peter W. Gething, Daniel J. Weiss

Abstract: Many malaria-endemic areas experience seasonal fluctuations in case incidence as Anopheles mosquito and Plasmodium parasite life cycles respond to changing environmental conditions. While most existing maps of malaria seasonality use fixed thresholds of rainfall, temperature, and/or vegetation indices to identify suitable transmission months, we develop a statistical modelling framework for characterising the seasonal patterns derived directly from case data. The procedure involves a spatiotemporal regression model for estimating the monthly proportions of total annual cases and an algorithm to identify operationally relevant characteristics such as the transmission start and peak months. A seasonality index combines the monthly proportion estimates and existing estimates of annual case incidence to provide a summary of "how seasonal" locations are relative to their surroundings. An advancement upon past seasonality mapping endeavours is the presentation of the uncertainty associated with each map, which will enable policymakers to make more statistically sound decisions. The methodology is illustrated using health facility data from Madagascar.

Submitted 17 May, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

Journal ref: BMC Med 18, 26 (2020)

arXiv:1806.07185

Mixed batches and symmetric discriminators for GAN training

Authors: Thomas Lucas, Corentin Tallec, Jakob Verbeek, Yann Ollivier

Abstract: Generative adversarial networks (GANs) are powerful generative models based on providing feedback to a generative network via a discriminator network. However, the discriminator usually assesses individual samples. This prevents the discriminator from accessing global distributional statistics of generated samples, and often leads to mode dropping: the generator models only part of the target distribution. We propose to feed the discriminator with mixed batches of true and fake samples, and train it to predict the ratio of true samples in the batch. The latter score does not depend on the order of samples in a batch. Rather than learning this invariance, we introduce a generic permutation-invariant discriminator architecture. This architecture is provably a universal approximator of all symmetric functions. Experimentally, our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets, both qualitatively and quantitatively.

Submitted 19 June, 2018; originally announced June 2018.

Comments: Accepted at ICML 2018 (long oral)

arXiv:1805.08463

Variational Learning on Aggregate Outputs with Gaussian Processes

Authors: Ho Chung Leon Law, Dino Sejdinovic, Ewan Cameron, Tim CD Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu

Abstract: While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.

Submitted 22 May, 2018; originally announced May 2018.

Showing 1–7 of 7 results for author: Lucas, T