-
Fusing Low-Latency Data Feeds with Death Data to Accurately Nowcast COVID-19 Related Deaths
Authors:
Conor Rosato,
Robert E. Moore,
Matthew Carter,
John Heap,
Jose Storopoli,
Simon Maskell
Abstract:
The emergence of the novel coronavirus (COVID-19) has generated a need to quickly and accurately assemble up-to-date information related to its spread. While it is possible to use deaths to provide a reliable information feed, the latency of data derived from deaths is significant. Confirmed cases derived from positive test results potentially provide a lower latency data feed. However, the sampling of those tested varies with time and the reason for testing is often not recorded. Hospital admissions typically occur around 1-2 weeks after infection and can be considered out of date in relation to the time of initial infection. The extent to which these issues are problematic is likely to vary over time and between countries.
We use a machine learning algorithm for natural language processing, trained in multiple languages, to identify symptomatic individuals from social media, in particular Twitter, in real time. We then use an extended SEIRD epidemiological model to fuse combinations of low-latency feeds, including the symptomatic counts from Twitter, with death data to estimate the parameters of the model and nowcast the number of people in each compartment. The model is implemented in the probabilistic programming language Stan and uses a bespoke numerical integrator. We present results showing that using specific low-latency data feeds along with death data provides more consistent and accurate forecasts of COVID-19 related deaths than using death data alone.
Submitted 15 December, 2021;
originally announced December 2021.
-
Refining Epidemiological Forecasts with Simple Scoring Rules
Authors:
R. E. Moore,
C. Rosato,
S. Maskell
Abstract:
Estimates from infectious disease models have constituted a significant part of the scientific evidence used to inform the response to the COVID-19 pandemic in the UK. These estimates can vary strikingly in their bias and variability. Epidemiological forecasts should be consistent with the observations that eventually materialise. We use simple scoring rules to refine the forecasts of a novel statistical model for multisource COVID-19 surveillance data by tuning its smoothness hyperparameter.
Submitted 14 March, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Increasing the efficiency of Sequential Monte Carlo samplers through the use of approximately optimal L-kernels
Authors:
Peter L Green,
Robert E Moore,
Ryan J Jackson,
Jinglai Li,
Simon Maskell
Abstract:
By facilitating the generation of samples from arbitrary probability distributions, Markov Chain Monte Carlo (MCMC) is, arguably, the tool for the evaluation of Bayesian inference problems that yield non-standard posterior distributions. In recent years, however, it has become apparent that Sequential Monte Carlo (SMC) samplers have the potential to outperform MCMC in a number of ways. SMC samplers are better suited to highly parallel computing architectures and also feature various tuning parameters that are not available to MCMC. One such parameter, the 'L-kernel', is a user-defined probability distribution that can be used to influence the efficiency of the sampler. In the current paper, the authors explain how to derive an expression for the L-kernel that minimises the variance of the estimates realised by an SMC sampler. Various approximation methods are then proposed to aid implementation of the proposed L-kernel. The improved performance of the resulting algorithm is demonstrated in multiple scenarios. For the examples shown in the current paper, the use of an approximately optimum L-kernel has reduced the variance of the SMC estimates by up to 99% while also reducing the number of times that resampling was required by between 65% and 70%. Python code and code tests accompanying this manuscript are available through the GitHub repository https://github.com/plgreenLIRU/SMC_approx_optL.
Submitted 24 April, 2020;
originally announced April 2020.