Search | arXiv e-print repository

arXiv:2012.07729 [pdf]

doi 10.2196/26527

"Thought I'd Share First" and Other Conspiracy Theory Tweets from the COVID-19 Infodemic: Exploratory Study

Authors: Dax Gerts, Courtney D. Shelley, Nidhi Parikh, Travis Pitts, Chrysm Watson Ross, Geoffrey Fairchild, Nidia Yadria Vaquera Chavez, Ashlynn R. Daughton

Abstract: Background: The COVID-19 outbreak has left many people isolated within their homes; these people are turning to social media for news and social connection, which leaves them vulnerable to believing and sharing misinformation. Health-related misinformation threatens adherence to public health messaging, and monitoring its spread on social media is critical to understanding the evolution of ideas that have potentially negative public health impacts. Results: Analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. Random forest classifier metrics varied across the four conspiracy theories considered (F1 scores between 0.347 and 0.857); this performance increased as the given conspiracy theory was more narrowly defined. We showed that misinformation tweets demonstrate more negative sentiment when compared to nonmisinformation tweets and that theories evolve over time, incorporating details from unrelated conspiracy theories as well as real-world events. Conclusions: Although we focus here on health-related misinformation, this combination of approaches is not specific to public health and is valuable for characterizing misinformation in general, which is an important first step in creating targeted messaging to counteract its spread. Initial messaging should aim to preempt generalized misinformation before it becomes widespread, while later messaging will need to target evolving conspiracy theories and the new facets of each as they become incorporated.

Submitted 15 April, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

Report number: LA-UR-20-28305

Journal ref: JMIR Pub Hlth Surv 2021 7(4)

arXiv:1904.04931 [pdf, ps, other]

Nowcasting Influenza Incidence with CDC Web Traffic Data: A Demonstration Using a Novel Data Set

Authors: Wendy K. Caldwell, Geoffrey Fairchild, Sara Y. Del Valle

Abstract: Influenza epidemics result in a public health and economic burden around the globe. Traditional surveillance techniques, which rely on doctor visits, provide data with a delay of 1-2 weeks. A means of obtaining real-time data and forecasting future outbreaks is desirable to provide more timely responses to influenza epidemics. In this work, we present the first implementation of a novel data set by demonstrating its ability to supplement traditional disease surveillance at multiple spatial resolutions. We use Internet traffic data from the Centers for Disease Control and Prevention (CDC) website to determine the potential usability of this data source. We test the traffic generated by ten influenza-related pages in eight states and nine census divisions within the United States and compare it against clinical surveillance data. Our results yield $r^2 = 0.955$ in the most successful case, promising results for some cases, and unsuccessful results for other cases. These results demonstrate that Internet data may be able to complement traditional influenza surveillance in some cases but not in others. Specifically, our results show that the CDC website traffic may inform national and division-level models but not models for each individual state. In addition, our results show better agreement when the data were broken up by seasons instead of aggregated over several years.
In the interest of scientific transparency to further the understanding of when Internet data streams are an appropriate supplemental data source, we also include negative results (i.e., unsuccessful models). We anticipate that this work will lead to more complex nowcasting and forecasting models using this data stream.

Submitted 9 April, 2019; originally announced April 2019.

Comments: 21 pages (including references and appendices; 12 pages prior to references), 5 figures (some include subfigures), ILI data available in csv form

Report number: LA-UR-18-31259

arXiv:1805.00445 [pdf, other]

doi 10.3389/fpubh.2018.00336

Epidemiological data challenges: planning for a more robust future through data standards

Authors: Geoffrey Fairchild, Byron Tasseff, Hari Khalsa, Nicholas Generous, Ashlynn R. Daughton, Nileena Velappan, Reid Priedhorsky, Alina Deshpande

Abstract: Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities.

Submitted 24 November, 2018; v1 submitted 20 April, 2018; originally announced May 2018.

Comments: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments

arXiv:1609.05774 [pdf]

A globally-applicable disease ontology for biosurveillance; Anthology of Biosurveillance Diseases (ABD)

Authors: A. R. Daughton, R. Priedhorsky, G. Fairchild, N. Generous, A. Hengartner, E. Abeyta, N. Velappan, A. Lillo, K. Stark, A. Deshpande

Abstract: Biosurveillance, a relatively young field, has recently increased in importance because of its relevance to national security and global health. Databases and tools describing particular subsets of disease are becoming increasingly common in the field. However, a common method to describe those diseases is lacking. Here, we present the Anthology of Biosurveillance Diseases (ABD), an ontology of infectious diseases of biosurveillance relevance.

Submitted 25 August, 2016; originally announced September 2016.

arXiv:1504.00657 [pdf, other]

Eliciting Disease Data from Wikipedia Articles

Authors: Geoffrey Fairchild, Lalindra De Silva, Sara Y. Del Valle, Alberto M. Segre

Abstract: Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative, achieving an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed time series data that are consistently updated that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system.

Submitted 24 August, 2015; v1 submitted 2 April, 2015; originally announced April 2015.

Comments: 9 pages, 3 figures, 4 tables, accepted to 2015 ICWSM Wikipedia workshop; v2 includes author formatting fixes and a few sentences removed to make it 8 pages (although arXiv renders it as 9); v3 uses embedded type 1 fonts in the figures and title-cases the title (required by AAAI); v4 fixes typo in abstract

Report number: LA-UR-15-22528

arXiv:1405.3612 [pdf, other]

doi 10.1371/journal.pcbi.1003892

Global disease monitoring and forecasting with Wikipedia

Authors: Nicholas Generous, Geoffrey Fairchild, Alina Deshpande, Sara Y. Del Valle, Reid Priedhorsky

Abstract: Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data such as social media and search queries are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.

Submitted 15 July, 2014; v1 submitted 14 May, 2014; originally announced May 2014.

Comments: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein and adjust novelty claims accordingly; revise title; various revisions for clarity

Report number: LA-UR 14-22535

ACM Class: H.5.3; H.3.5; J.3

Journal ref: PLOS Comput. Biol., vol. 10, no. 11, p. e1003892, Nov. 2014

Showing 1–6 of 6 results for author: Fairchild, G