-
Automatically applying a credibility appraisal tool to track vaccination-related communications shared on social media
Authors:
Zubair Shah,
Didi Surian,
Amalie Dyda,
Enrico Coiera,
Kenneth D. Mandl,
Adam G. Dunn
Abstract:
Background:
Tools used to appraise the credibility of health information are time-consuming to apply and require context-specific expertise, limiting their use for quickly identifying and mitigating the spread of misinformation as it emerges. Our aim was to estimate the proportion of vaccination-related posts on Twitter that are likely to be misinformation, and how unevenly exposure to misinformation was distributed among Twitter users.
Methods:
Sampling from 144,878 vaccination-related web pages shared on Twitter between January 2017 and March 2018, we used a seven-point checklist adapted from two validated tools to appraise the credibility of a subset of 474 pages. These appraisals were used to train several classifiers (random forest, support vector machines, and a recurrent neural network with transfer learning), each using the text of a web page to predict whether it satisfies each of the seven criteria.
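The Methods describe learning one predictor per checklist criterion from web-page text. The sketch below illustrates that per-criterion setup under stated assumptions: it swaps in a multinomial Naive Bayes classifier (not one of the random forest, SVM, or RNN models listed above) so that it runs with the Python standard library alone, and the names `NaiveBayesCriterion`, `train_checklist`, and `appraise` are hypothetical. The abstract does not say how the seven predictions are combined into an overall credibility label, so the sketch stops at per-criterion outputs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple stand-in for real preprocessing.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCriterion:
    """Multinomial Naive Bayes for ONE binary checklist criterion (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.doc_totals = Counter(bool(y) for y in labels)
        for doc, y in zip(docs, labels):
            self.counts[bool(y)].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, y):
        n_docs = sum(self.doc_totals.values())
        # Smoothed class prior, so a class absent from training never gives log(0).
        score = math.log((self.doc_totals[y] + self.alpha) / (n_docs + 2 * self.alpha))
        denom = sum(self.counts[y].values()) + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # unseen words are ignored
                score += math.log((self.counts[y][t] + self.alpha) / denom)
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_checklist(docs, label_rows):
    # label_rows[i][c] says whether document i satisfies criterion c;
    # one independent classifier is trained per criterion (column).
    n_criteria = len(label_rows[0])
    return [
        NaiveBayesCriterion().fit(docs, [row[c] for row in label_rows])
        for c in range(n_criteria)
    ]


def appraise(models, doc):
    # Returns one True/False prediction per criterion for a new page.
    return [m.predict(doc) for m in models]
```

Treating the criteria as separate binary problems keeps each prediction interpretable against the checklist item it came from; the real study may have shared representations across criteria.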
Results:
Applying the best-performing classifier to the 144,878 web pages, we found that 14.4% of relevant posts linking to text-based communications linked to webpages of low credibility, and that these posts made up 9.2% of all potential vaccination-related exposures. However, the 100 most popular links to misinformation were potentially seen by between 2 million and 80 million Twitter users, and for a substantial subpopulation of Twitter users engaging with vaccination-related information, links to misinformation appear to dominate the vaccination-related information to which they were exposed.
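The two headline figures differ because one counts posts while the other weights each post by its potential audience. The sketch below shows that distinction, assuming that a post's potential exposures can be approximated by a single count per post (for example, the poster's follower count); the abstract does not define exposures in that detail, and the `Post` type and `shares` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    low_credibility: bool     # classifier verdict for the linked page
    potential_exposures: int  # assumption: e.g. follower count of the poster


def shares(posts):
    """Return (share of posts, share of potential exposures) that are low credibility."""
    n_low = sum(p.low_credibility for p in posts)
    total_exposures = sum(p.potential_exposures for p in posts)
    low_exposures = sum(p.potential_exposures for p in posts if p.low_credibility)
    return n_low / len(posts), low_exposures / total_exposures
```

With toy inputs where one post in four is low credibility but reaches only 100 of 1,000 potential exposures, the post share is 25% while the exposure share is 10%; the same weighting can also be applied per user to see whose feeds are dominated by low-credibility links.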
Conclusions:
We proposed a new method for automatically appraising the credibility of webpages based on a combination of validated checklist tools. The results suggest that an automatic credibility appraisal tool can be used to find populations at higher risk of exposure to misinformation or applied proactively to add friction to the sharing of low credibility vaccination information.
Submitted 18 February, 2021; v1 submitted 17 March, 2019;
originally announced March 2019.
-
FADL: Federated-Autonomous Deep Learning for Distributed Electronic Health Record
Authors:
Dianbo Liu,
Timothy Miller,
Raheel Sayeed,
Kenneth D. Mandl
Abstract:
Electronic health record (EHR) data is collected by individual institutions and often stored across locations in silos. Getting access to these data is difficult and slow due to security, privacy, regulatory, and operational issues. We show, using ICU data from 58 different hospitals, that machine learning models to predict patient mortality can be trained efficiently with a distributed machine learning strategy, without moving health data out of their silos. We propose a new method, called Federated-Autonomous Deep Learning (FADL), that trains part of the model using all data sources in a distributed manner and other parts using data from specific data sources. We observed that FADL outperforms a traditional federated learning strategy and conclude that the balance between global and local training is an important factor to consider when designing distributed machine learning methods, especially in healthcare.
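FADL's central idea, as described above, is that some model parameters are trained on all sites in a distributed way while others are trained only on one site's data. The sketch below is a deliberately simplified illustration of that split, not the FADL architecture: it assumes a logistic-regression model whose weight vector is shared and combined by sample-weighted averaging across sites (in the style of federated averaging), while each site keeps its own intercept. The function names and the choice of which parameters are global are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_sgd(w, b, data, lr, epochs):
    """Plain SGD on log-loss for one site; returns updated (weights, intercept)."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def train_split_federated(sites, n_features, rounds=20, lr=0.1, epochs=1):
    """sites maps site_id -> list of (feature_list, label); data never leaves a site."""
    w_global = [0.0] * n_features
    b_local = {s: 0.0 for s in sites}
    total = sum(len(d) for d in sites.values())
    for _ in range(rounds):
        updates = {}
        for s, data in sites.items():
            # Each site starts from the current shared weights and its own intercept.
            updates[s], b_local[s] = local_sgd(w_global, b_local[s], data, lr, epochs)
        # Only the shared part is averaged (weighted by site size); intercepts stay local.
        w_global = [
            sum(len(sites[s]) * updates[s][j] for s in sites) / total
            for j in range(n_features)
        ]
    return w_global, b_local


def predict(w_global, b_local, site, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w_global, x)) + b_local[site])
```

Only the averaged weights would need to cross institutional boundaries in this setup; the site-specific intercept lets each hospital keep its own baseline risk, which is one simple way to trade off global and local training.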
Submitted 2 December, 2018; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Modeling Spatiotemporal Factors Associated With Sentiment on Twitter: Synthesis and Suggestions for Improving the Identification of Localized Deviations
Authors:
Zubair Shah,
Paige Martin,
Enrico Coiera,
Kenneth D. Mandl,
Adam G. Dunn
Abstract:
Background: Studies examining how sentiment on social media varies depending on timing and location appear to produce inconsistent results, making it hard to design systems that use sentiment to detect localized events for public health applications.
Objective: The aim of this study was to measure how much of the variation in sentiment on Twitter is explained by common timing and location confounders.
Methods: Using a dataset of 16.54 million English-language tweets from 100 cities posted between July 13 and November 30, 2017, we estimated the positive and negative sentiment for each of the cities using a dictionary-based sentiment analysis. We then constructed models to explain the differences in sentiment using time of day, day of week, weather, city, and interaction type (conversations or broadcasting) as factors, and found that all factors were independently associated with sentiment.
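A dictionary-based sentiment score can be sketched as the fraction of a tweet's tokens that appear in a positive or negative word list, then averaged within groups such as city or hour of day. The word lists below are tiny, hypothetical placeholders; the abstract does not name the dictionary the study used, nor whether scores were normalized by tweet length as this sketch assumes.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons for illustration only.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate", "awful"}


def score(text):
    """Return (positive, negative) scores as fractions of the tweet's tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / len(tokens)
    neg = sum(t in NEGATIVE for t in tokens) / len(tokens)
    return pos, neg


def mean_sentiment_by(tweets, key):
    """Average (positive, negative) scores over groups defined by key(tweet)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for tw in tweets:
        pos, neg = score(tw["text"])
        acc = sums[key(tw)]
        acc[0] += pos
        acc[1] += neg
        acc[2] += 1
    return {k: (p / n, q / n) for k, (p, q, n) in sums.items()}
```

Grouping by `lambda tw: tw["city"]` or `lambda tw: tw["hour"]` gives the kind of per-city or per-hour baselines that the study's models then explain with timing and location factors.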
Results: In the full multivariable model of positive (Pearson r in test data 0.236; 95% CI 0.231-0.241) and negative (Pearson r in test data 0.306; 95% CI 0.301-0.310) sentiment, the city and time of day explained more of the variance than weather and day of week. Models that account for these confounders produce a different distribution and ranking of important events compared with models that do not account for these confounders.
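The Results report Pearson r on test data with 95% confidence intervals. One standard way to obtain such an interval, assumed here rather than confirmed as the authors' method, is the Fisher z-transformation: with z = artanh(r) and standard error 1/sqrt(n - 3), the interval is tanh(z ± 1.96/sqrt(n - 3)). The test-set size is not given in the abstract, so the sketch does not attempt to reproduce the reported intervals.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate two-sided 95% CI for r via Fisher's z; needs n > 3 and |r| < 1."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For example, r = 0.5 with n = 103 gives roughly (0.339, 0.632); the interval is not symmetric around r, and it narrows as n grows, which is why very large test sets yield intervals as tight as those reported above.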
Conclusions: In public health applications that aim to detect localized events by aggregating sentiment across populations of Twitter users, it is worthwhile accounting for baseline differences before looking for unexpected changes.
Submitted 14 May, 2019; v1 submitted 21 February, 2018;
originally announced February 2018.