-
Mapping Incidence and Prevalence Peak Data for SIR Forecasting Applications
Authors:
Alexander C. Murph,
G. Casey Gibson,
Lauren J. Beesley,
Nishant Panda,
Lauren A. Castro,
Sara Y. Del Valle,
Dave Osthus
Abstract:
Infectious disease modeling and forecasting have played a key role in helping assess and respond to epidemics and pandemics. Recent work has leveraged data on disease peak infection and peak hospital incidence to fit compartmental models for the purpose of forecasting and describing the dynamics of a disease outbreak. Incorporating these data can greatly stabilize a compartmental model fit on early observations, where slight perturbations in the data may lead to model fits that project wildly unrealistic peak infection. We introduce a new method for incorporating historic data on the value and time of peak incidence of hospitalization into the fit for a Susceptible-Infectious-Recovered (SIR) model by formulating the relationship between an SIR model's starting parameters and peak incidence as a system of two equations that can be solved computationally. This approach is assessed for practicality in terms of accuracy and speed of computation via simulation. To exhibit the modeling potential, we update the Dirichlet-Beta State Space modeling framework to use hospital incidence data, as this framework was previously formulated to incorporate only data on total infections.
Submitted 23 April, 2024;
originally announced April 2024.
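As a loose illustration of how an SIR model's starting parameters determine its peak, the sketch below uses the textbook closed-form expression for peak prevalence in the normalized SIR model and checks it against a numerical integration. It is not the method of the paper above, which concerns the value and time of peak hospitalization incidence and solves a system of two equations. The function names, parameter values, and RK4 step size are assumptions chosen for illustration.

```python
import math


def sir_peak_prevalence(beta, gamma, s0, i0):
    """Closed-form peak of I(t) in the normalized SIR model (S + I + R = 1).

    Along a trajectory, I + S - rho * ln(S) is conserved, with rho = gamma / beta,
    and I peaks when S = rho.
    """
    rho = gamma / beta
    if s0 <= rho:
        return i0  # infections never grow, so the peak is the initial value
    return i0 + s0 - rho + rho * math.log(rho / s0)


def _deriv(s, i, beta, gamma):
    # Right-hand side of the SIR equations for S and I.
    new_infections = beta * s * i
    return -new_infections, new_infections - gamma * i


def simulate_peak(beta, gamma, s0, i0, dt=0.01, t_max=200.0):
    """Integrate with classical RK4; return (max I on the time grid, its time)."""
    s, i = s0, i0
    peak_i, peak_t = i0, 0.0
    for step in range(1, round(t_max / dt) + 1):
        k1s, k1i = _deriv(s, i, beta, gamma)
        k2s, k2i = _deriv(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i, beta, gamma)
        k3s, k3i = _deriv(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i, beta, gamma)
        k4s, k4i = _deriv(s + dt * k3s, i + dt * k3i, beta, gamma)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
        i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        if i > peak_i:
            peak_i, peak_t = i, step * dt
    return peak_i, peak_t


if __name__ == "__main__":
    # Hypothetical parameters: beta = 0.5, gamma = 0.2, 1% initially infectious.
    closed = sir_peak_prevalence(0.5, 0.2, 0.99, 0.01)
    numeric, t_peak = simulate_peak(0.5, 0.2, 0.99, 0.01)
    print(f"closed form: {closed:.6f}, numeric: {numeric:.6f}, time: {t_peak:.2f}")
```

The closed form gives the peak value only; the peak time has no comparably simple expression, which is one reason a computational solve is attractive.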
-
Gallium nitride phononic integrated circuits for future RF front-ends
Authors:
Mahmut Bicer,
Stefano Valle,
Jacob Brown,
Martin Kuball,
Krishna C. Balram
Abstract:
Achieving monolithic integration of passive acoustic wave devices, in particular RF filters, with active devices such as RF amplifiers and switches, is the optimal solution to meet the challenging communication requirements of mobile devices, especially as we move towards the 6G era. This requires a significant ($\approx$100x) reduction in the size of the RF passives, from mm$^2$ footprints in current devices to tens of $μm^2$ in future systems. Applying ideas from integrated photonics, we demonstrate that high frequency (>3 GHz) sound can be efficiently guided in $μ$m-scale gallium nitride (GaN) waveguides by exploiting the strong velocity contrast available in the GaN on silicon carbide (SiC) platform. Given the established use of GaN devices in RF amplifiers, our work opens up the possibility of building monolithically integrated RF front-ends in GaN-on-SiC.
Submitted 16 December, 2021;
originally announced December 2021.
-
Addressing delayed case reporting in infectious disease forecast modeling
Authors:
Lauren J Beesley,
Dave Osthus,
Sara Y Del Valle
Abstract:
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden.
In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods required knowledge about the reporting error or high quality external data, which may not always be available. Provided alternatives include excluding recently-reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts.
Submitted 27 October, 2021;
originally announced October 2021.
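One simple way to picture a reporting-delay correction of the kind this abstract describes is to inflate recent counts by an estimate of how complete they are. The sketch below is an illustrative assumption, not the framework proposed in the paper: `adjust_for_reporting_delay` and `completeness_by_lag` are hypothetical names, and the completeness fractions would in practice have to be estimated from historical reporting data.

```python
def adjust_for_reporting_delay(observed, completeness_by_lag):
    """Scale up recent case counts by an assumed reporting completeness.

    observed: counts ordered oldest first, most recent last.
    completeness_by_lag: completeness_by_lag[k] is the assumed fraction of a
        period's final count already reported k periods after it ends
        (lag 0 is the most recent period). Lags beyond the list are
        treated as fully reported.
    """
    n = len(observed)
    adjusted = []
    for idx, count in enumerate(observed):
        lag = n - 1 - idx
        frac = completeness_by_lag[lag] if lag < len(completeness_by_lag) else 1.0
        if not 0.0 < frac <= 1.0:
            raise ValueError("completeness fractions must lie in (0, 1]")
        adjusted.append(count / frac)
    return adjusted


if __name__ == "__main__":
    # Hypothetical weekly counts; the latest week is assumed 50% reported
    # and the week before it 90% reported.
    print(adjust_for_reporting_delay([100, 90, 40], [0.5, 0.9]))
```

Dividing by a point estimate ignores uncertainty in the completeness fractions, so a real pipeline would likely propagate that uncertainty into the forecast.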
-
Impact of COVID-19 Policies and Misinformation on Social Unrest
Authors:
Martha Barnard,
Radhika Iyer,
Sara Y. Del Valle,
Ashlynn R. Daughton
Abstract:
The novel coronavirus disease (COVID-19) pandemic has impacted every corner of earth, disrupting governments and leading to socioeconomic instability. This crisis has prompted questions surrounding how different sectors of society interact and influence each other during times of change and stress. Given the unprecedented economic and societal impacts of this pandemic, many new data sources have become available, allowing us to quantitatively explore these associations. Understanding these relationships can help us better prepare for future disasters and mitigate the impacts. Here, we focus on the interplay between social unrest (protests), health outcomes, public health orders, and misinformation in eight countries of Western Europe and four regions of the United States. We created 1-3 week forecasts of both a binary protest metric for identifying times of high protest activity and the overall protest counts over time. We found that for all regions, except Belgium, at least one feature from our various data streams was predictive of protests. However, the accuracy of the protest forecasts varied by country, that is, for roughly half of the countries analyzed, our forecasts outperform a naïve model. These mixed results demonstrate the potential of diverse data streams to predict a topic as volatile as protests as well as the difficulties of predicting a situation that is as rapidly evolving as a pandemic.
Submitted 7 October, 2021;
originally announced October 2021.
-
Cryogenic operation of MEMS-based suspended high overtone bulk acoustic wave resonators for microwave to optical signal transduction
Authors:
Stefano Valle,
Krishna C. Balram
Abstract:
Suspended high-overtone bulk acoustic wave resonators (HBARs) can serve as a viable optomechanical platform for efficient transduction of signals from the microwave to the optical frequency domain. In contrast to 1D nanobeam optomechanical crystals, HBARs benefit from very high RF to phonon injection efficiency ($η_{PIE}\approx$1) and low optical pump induced heating at cryogenic temperatures. By building small mode volume optical cavities around these devices, one can in principle achieve optomechanical cooperativities comparable to 1D nanobeam optomechanical crystals. In this work, we demonstrate cryogenic operation ($\approx$10 K) of such suspended HBAR devices and show classical signal modulation up to 3.5 GHz and response times $\approx$ 524 ns (for the fundamental mode at 340 MHz). While the transduction efficiency is currently limited by the material and device fabrication processes used in this work, we show that with reasonable modifications, efficient quantum transduction is within reach using this approach.
Submitted 24 September, 2021;
originally announced September 2021.
-
The FragmentatiOn Of Target Experiment (FOOT) and its DAQ system
Authors:
Silvia Biondi,
Andrey Alexandrov,
Behcet Alpat,
Giovanni Ambrosi,
Stefano Argirò,
Rau Arteche Diaz,
Nazarm Bartosik,
Giuseppe Battistoni,
Nicola Belcari,
Elettra Bellinzona,
Maria Giuseppina Bisogni,
Graziano Bruni,
Pietro Carra,
Piergiorgio Cerello,
Esther Ciarrocchi,
Alberto Clozza,
Sofia Colombi,
Giovanni De Lellis,
Alberto Del Guerra,
Micol De Simoni,
Antonia Di Crescenzo,
Benedetto Di Ruzza,
Marco Donetti,
Yunsheng Dong,
Marco Durante
, et al. (70 additional authors not shown)
Abstract:
The FragmentatiOn Of Target (FOOT) experiment aims to provide precise nuclear cross-section measurements for two different fields: hadrontherapy and radio-protection in space. The main reason is the important role the nuclear fragmentation process plays in both fields, where the health risks caused by radiation are very similar and mainly attributable to the fragmentation process. The FOOT experiment has been developed in such a way that the experimental setup is easily movable and fits the space limitations of the experimental and treatment rooms available in hadrontherapy treatment centers, where most of the data taking is carried out. The Trigger and Data Acquisition system needs to follow the same criteria and it should work in different laboratories and in different conditions. It has been designed to acquire the largest sample size with high accuracy in a controlled and online-monitored environment. The data collected are processed in real-time for quality assessment and are available to the DAQ crew and detector experts during data taking.
Submitted 29 October, 2020;
originally announced October 2020.
-
Machine Learning-Powered Mitigation Policy Optimization in Epidemiological Models
Authors:
Jayaraman J. Thiagarajan,
Peer-Timo Bremer,
Rushil Anirudh,
Timothy C. Germann,
Sara Y. Del Valle,
Frederick H. Streitz
Abstract:
A crucial aspect of managing a public health crisis is to effectively balance prevention and mitigation strategies, while taking their socio-economic impact into account. In particular, determining the influence of different non-pharmaceutical interventions (NPIs) on the effective use of public resources is an important problem, given the uncertainties on when a vaccine will be made available. In this paper, we propose a new approach for obtaining optimal policy recommendations based on epidemiological models, which can characterize the disease progression under different interventions, and a look-ahead reward optimization strategy to choose the suitable NPI at different stages of an epidemic. Given the time delay inherent in any epidemiological model and the exponential nature especially of an unmanaged epidemic, we find that such a look-ahead strategy infers non-trivial policies that adhere well to the constraints specified. Using two different epidemiological models, namely SEIR and EpiCast, we evaluate the proposed algorithm to determine the optimal NPI policy, under a constraint on the number of daily new cases and the primary reward being the absence of restrictions.
Submitted 16 October, 2020;
originally announced October 2020.
-
Accurate Calibration of Agent-based Epidemiological Models with Neural Network Surrogates
Authors:
Rushil Anirudh,
Jayaraman J. Thiagarajan,
Peer-Timo Bremer,
Timothy C. Germann,
Sara Y. Del Valle,
Frederick H. Streitz
Abstract:
Calibrating complex epidemiological models to observed data is a crucial step to provide both insights into the current disease dynamics, i.e., by estimating a reproductive number, as well as to provide reliable forecasts and scenario explorations. Here we present a new approach to calibrate an agent-based model -- EpiCast -- using a large set of simulation ensembles for different major metropolitan areas of the United States. In particular, we propose: a new neural network based surrogate model able to simultaneously emulate all different locations; and a novel posterior estimation that provides not only more accurate posterior estimates of all parameters but enables the joint fitting of global parameters across regions.
Submitted 13 October, 2020;
originally announced October 2020.
-
Time Series Methods and Ensemble Models to Nowcast Dengue at the State Level in Brazil
Authors:
Katherine Kempfert,
Kaitlyn Martinez,
Amir Siraj,
Jessica Conrad,
Geoffrey Fairchild,
Amanda Ziemann,
Nidhi Parikh,
David Osthus,
Nicholas Generous,
Sara Del Valle,
Carrie Manore
Abstract:
Predicting an infectious disease can help reduce its impact by advising public health interventions and personal preventive measures. Novel data streams, such as Internet and social media data, have recently been reported to benefit infectious disease prediction. As a case study of dengue in Brazil, we have combined multiple traditional and non-traditional, heterogeneous data streams (satellite imagery, Internet, weather, and clinical surveillance data) across its 27 states on a weekly basis over seven years. For each state, we nowcast dengue based on several time series models, which vary in complexity and inclusion of exogenous data. The top-performing model varies by state, motivating our consideration of ensemble approaches to automatically combine these models for better outcomes at the state level. Model comparisons suggest that predictions often improve with the addition of exogenous data, although similar performance can be attained by including only one exogenous data stream (either weather data or the novel satellite data) rather than combining all of them. Our results demonstrate that Brazil can be nowcasted at the state level with high accuracy and confidence, inform the utility of each individual data stream, and reveal potential geographic contributors to predictive performance. Our work can be extended to other spatial levels of Brazil, vector-borne diseases, and countries, so that the spread of infectious disease can be more effectively curbed.
Submitted 3 June, 2020;
originally announced June 2020.
-
High frequency guided mode resonances in mass-loaded, thin film gallium nitride surface acoustic wave devices
Authors:
Stefano Valle,
Manikant Singh,
Martin Cryan,
Martin Kuball,
Krishna C. Balram
Abstract:
We demonstrate high-frequency (> 3 GHz), high quality factor radio frequency (RF) resonators in unreleased thin film gallium nitride (GaN) on sapphire and silicon carbide substrates by exploiting acoustic guided mode (Lamb wave) resonances. The associated energy trapping, due to mass loading from the gold electrodes, allows us to efficiently excite these resonances from a 50 $Ω$ input. The higher phase velocity, combined with lower electrode damping, enables high quality factors with moderate electrode pitch, and provides a viable route towards high-frequency piezoelectric devices. The GaN platform, with its ability to guide and localize high-frequency sound on the surface of a chip with access to high-performance active devices, will serve as a key building block for monolithically integrated RF front-ends.
Submitted 17 October, 2019;
originally announced October 2019.
-
Nowcasting Influenza Incidence with CDC Web Traffic Data: A Demonstration Using a Novel Data Set
Authors:
Wendy K. Caldwell,
Geoffrey Fairchild,
Sara Y. Del Valle
Abstract:
Influenza epidemics result in a public health and economic burden around the globe. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1-2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics. In this work, we present the first implementation of a novel data set by demonstrating its ability to supplement traditional disease surveillance at multiple spatial resolutions. We use Internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We test the traffic generated by ten influenza-related pages in eight states and nine census divisions within the United States and compare it against clinical surveillance data. Our results yield $r^2$ = 0.955 in the most successful case, promising results for some cases, and unsuccessful results for other cases. These results demonstrate that Internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that the CDC website traffic may inform national and division-level models but not models for each individual state. In addition, our results show better agreement when the data were broken up by seasons instead of aggregated over several years. In the interest of scientific transparency to further the understanding of when Internet data streams are an appropriate supplemental data source, we also include negative results (i.e., unsuccessful models). We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.
Submitted 9 April, 2019;
originally announced April 2019.
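The $r^2$ values quoted in this abstract measure how much of the variance in the surveillance series a fitted model explains. The sketch below shows that computation for a one-variable least-squares fit. It is an assumption-laden illustration rather than the authors' model: `r_squared` is a hypothetical helper, and the numbers are made up and are not CDC traffic or surveillance data.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y ~ a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares around the fitted line and the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical page-view index (x) and illness rate (y).
    page_views = [1, 2, 3, 4, 5]
    illness_rate = [2, 1, 4, 3, 5]
    print(round(r_squared(page_views, illness_rate), 3))
```

A high in-sample $r^2$ does not by itself establish nowcasting skill, which is consistent with the abstract's point that results differ by spatial resolution and season.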
-
High-frequency, resonant acousto-optic modulators fabricated in a MEMS foundry platform
Authors:
Stefano Valle,
Krishna C. Balram
Abstract:
We report the design and characterization of high frequency, resonant acousto-optic modulators (AOM) in a MEMS foundry process. The doubly-resonant cavity design, with short ($L{\sim}10.5\, μm$) acoustic and optical cavity lengths, allows us to measure acousto-optic modulation at GHz frequencies with high modulation efficiency. In contrast to traditional AOMs, these devices rely on the perturbation induced by the displacement of cavity boundaries, which can be significantly enhanced in a suspended geometry. This platform can serve as the building block for fast 2D spatial light modulators (SLM), low-cost integrated free space optical links and optically enhanced low-noise RF receivers.
Submitted 16 October, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.
-
Dynamic Bayesian Influenza Forecasting in the United States with Hierarchical Discrepancy
Authors:
Dave Osthus,
James Gattiker,
Reid Priedhorsky,
Sara Y. Del Valle
Abstract:
Timely and accurate forecasts of seasonal influenza would assist public health decision-makers in planning intervention strategies, efficiently allocating resources, and possibly saving lives. For these reasons, influenza forecasts are consequential. Producing timely and accurate influenza forecasts, however, has proven challenging due to noisy and limited data, an incomplete understanding of the disease transmission process, and the mismatch between the disease transmission process and the data-generating process. In this paper, we introduce a dynamic Bayesian (DB) flu forecasting model that exploits model discrepancy through a hierarchical model. The DB model allows forecasts of partially observed flu seasons to borrow discrepancy information from previously observed flu seasons. We compare the DB model to all models that competed in the CDC's 2015--2016 flu forecasting challenge. The DB model outperformed all models, indicating the DB model is a leading influenza forecasting model.
Submitted 30 August, 2017;
originally announced August 2017.
-
Estimating the reproductive number, total outbreak size, and reporting rates for Zika epidemics in South and Central America
Authors:
Deborah P. Shutt,
Carrie A. Manore,
Stephen Pankavich,
Aaron T. Porter,
Sara Y. Del Valle
Abstract:
As South and Central American countries prepare for increased birth defects from Zika virus outbreaks and plan for mitigation strategies to minimize ongoing and future outbreaks, understanding important characteristics of Zika outbreaks and how they vary across regions is a challenging and important problem. We developed a mathematical model for the 2015 Zika virus outbreak dynamics in Colombia, El Salvador, and Suriname. We fit the model to publicly available data provided by the Pan American Health Organization, using Approximate Bayesian Computation to estimate parameter distributions and provide uncertainty quantification. An important model input is the at-risk susceptible population, which can vary with a number of factors including climate, elevation, population density, and socio-economic status. We informed this initial condition using the highest historically reported dengue incidence modified by the probable dengue reporting rates in the chosen countries. The model indicated that a country-level analysis was not appropriate for Colombia. We then estimated the basic reproduction number, or the expected number of new human infections arising from a single infected human, to range between 4 and 6 for El Salvador and Suriname with a median of 4.3 and 5.3, respectively. We estimated the reporting rate to be around 16% in El Salvador and 18% in Suriname with estimated total outbreak sizes of 73,395 and 21,647 people, respectively. The uncertainty in parameter estimates highlights a need for research and data collection that will better constrain parameter ranges.
Submitted 27 February, 2017;
originally announced February 2017.
-
Eliciting Disease Data from Wikipedia Articles
Authors:
Geoffrey Fairchild,
Lalindra De Silva,
Sara Y. Del Valle,
Alberto M. Segre
Abstract:
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content.
We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative that achieves an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed time series data that are consistently updated that closely align with ground truth data.
We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system.
Submitted 24 August, 2015; v1 submitted 2 April, 2015;
originally announced April 2015.
-
Quantifying Uncertainty in Stochastic Models with Parametric Variability
Authors:
Kyle S. Hickmann,
James M. Hyman,
Sara Y. Del Valle
Abstract:
We present a method to quantify uncertainty in the predictions made by simulations of mathematical models that can be applied to a broad class of stochastic, discrete, and differential equation models. Quantifying uncertainty is crucial for determining how accurate the model predictions are and identifying which input parameters affect the outputs of interest. Most of the existing methods for uncertainty quantification require many samples to generate accurate results, are unable to differentiate where the uncertainty is coming from (e.g., parameters or model assumptions), or require a lot of computational resources. Our approach addresses these challenges and opportunities by allowing different types of uncertainty, that is, uncertainty in input parameters as well as uncertainty created through stochastic model components. This is done by combining the Karhunen-Loeve decomposition, polynomial chaos expansion, and Bayesian Gaussian process regression to create a statistical surrogate for the stochastic model. The surrogate separates the analysis of variation arising through stochastic simulation and variation arising through uncertainty in the model parameterization. We illustrate our approach by quantifying the uncertainty in a stochastic ordinary differential equation epidemic model. Specifically, we estimate four quantities of interest for the epidemic model and show agreement between the surrogate and the actual model results.
Submitted 4 March, 2015;
originally announced March 2015.
-
Forecasting the 2013--2014 Influenza Season using Wikipedia
Authors:
Kyle S. Hickmann,
Geoffrey Fairchild,
Reid Priedhorsky,
Nicholas Generous,
James M. Hyman,
Alina Deshpande,
Sara Y. Del Valle
Abstract:
Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects between 5% and 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies.
We combine modern data assimilation methods with Wikipedia access logs and CDC influenza like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013--2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy.
Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that prior to the peak of the flu season, our forecasting method projected the actual outcome with a high probability. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.
Submitted 3 November, 2014; v1 submitted 22 October, 2014;
originally announced October 2014.
-
Global disease monitoring and forecasting with Wikipedia
Authors:
Nicholas Generous,
Geoffrey Fairchild,
Alina Deshpande,
Sara Y. Del Valle,
Reid Priedhorsky
Abstract:
Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible.
Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
Submitted 15 July, 2014; v1 submitted 14 May, 2014;
originally announced May 2014.
-
Inferring the Origin Locations of Tweets with Quantitative Confidence
Authors:
Reid Priedhorsky,
Aron Culotta,
Sara Y. Del Valle
Abstract:
Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of Gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with a small geographic footprint provide the most useful location signals.
Submitted 15 November, 2013; v1 submitted 16 May, 2013;
originally announced May 2013.