Search | arXiv e-print repository

arXiv:1904.11886 [pdf, other]

doi 10.1162/qss_a_00030

Recommending research articles to consumers of online vaccination information

Authors: Eliza Harrison, Paige Martin, Didi Surian, Adam G. Dunn

Abstract: Online health communications often provide biased interpretations of evidence and have unreliable links to the source research. We tested the feasibility of a tool for matching webpages to their source evidence. From 207,538 eligible vaccination-related PubMed articles, we evaluated several approaches using 3,573 unique links to webpages from Altmetric. We evaluated methods for ranking the source articles for vaccine-related research described on webpages, comparing simple baseline feature representation and dimensionality reduction approaches to those augmented with canonical correlation analysis (CCA). Performance measures included the median rank of the correct source article; the percentage of webpages for which the source article was correctly ranked first (recall@1); and the percentage ranked within the top 50 candidate articles (recall@50). While augmenting baseline methods using CCA generally improved results, no CCA-based approach outperformed a baseline method, which ranked the correct source article first for over one quarter of webpages and in the top 50 for more than half. Tools to help people identify evidence-based sources for the content they access on vaccination-related webpages are potentially feasible and may support the prevention of bias and misrepresentation of research in news and social media.

Submitted 19 August, 2020; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: 12 pages, 5 figures, 2 tables

ACM Class: H.3.3

Journal ref: Quantitative Science Studies, 1(2):810-823 (2020)
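
The ranking measures named in the abstract above (median rank of the correct source article, recall@1, recall@50) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy ranks, and the choice to exclude unranked items from the median are all assumptions made for illustration.

```python
from statistics import median


def rank_of_source(ranked_ids, source_id):
    """Return the 1-based rank of the correct source, or None if it is absent."""
    try:
        return ranked_ids.index(source_id) + 1
    except ValueError:
        return None


def recall_at_k(ranks, k):
    """Fraction of queries whose correct source is ranked within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def median_rank(ranks):
    """Median rank over queries where the correct source was ranked at all.

    Assumption: unranked items are excluded rather than penalised.
    """
    return median(r for r in ranks if r is not None)


# Hypothetical ranks of the correct source article for five webpages.
example_ranks = [1, 3, 60, 1, 12]

recall_1 = recall_at_k(example_ranks, 1)
recall_50 = recall_at_k(example_ranks, 50)
mid_rank = median_rank(example_ranks)
```

With these toy ranks, two of five webpages have the correct article ranked first and four of five have it within the top 50.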

arXiv:1903.07219 [pdf]

doi 10.2196/14007

Automatically applying a credibility appraisal tool to track vaccination-related communications shared on social media

Authors: Zubair Shah, Didi Surian, Amalie Dyda, Enrico Coiera, Kenneth D. Mandl, Adam G. Dunn

Abstract: Background: Tools used to appraise the credibility of health information are time-consuming to apply and require context-specific expertise, limiting their use for quickly identifying and mitigating the spread of misinformation as it emerges. Our aim was to estimate the proportion of vaccination-related posts on Twitter that are likely to be misinformation, and how unevenly exposure to misinformation was distributed among Twitter users. Methods: Sampling from 144,878 vaccination-related web pages shared on Twitter between January 2017 and March 2018, we used a seven-point checklist adapted from two validated tools to appraise the credibility of a small subset of 474. These were used to train several classifiers (random forest, support vector machines, and a recurrent neural network with transfer learning), using the text from a web page to predict whether the information satisfies each of the seven criteria. Results: Applying the best performing classifier to the 144,878 web pages, we found that 14.4% of relevant posts to text-based communications were linked to webpages of low credibility and made up 9.2% of all potential vaccination-related exposures. However, the 100 most popular links to misinformation were potentially seen by between 2 million and 80 million Twitter users, and for a substantial sub-population of Twitter users engaging with vaccination-related information, links to misinformation appear to dominate the vaccination-related information to which they were exposed.
Conclusions: We proposed a new method for automatically appraising the credibility of webpages based on a combination of validated checklist tools. The results suggest that an automatic credibility appraisal tool can be used to find populations at higher risk of exposure to misinformation or applied proactively to add friction to the sharing of low credibility vaccination information.

Submitted 18 February, 2021; v1 submitted 17 March, 2019; originally announced March 2019.

Comments: 8 Pages, 5 Figures

Journal ref: https://www.jmir.org/2019/11/e14007

arXiv:1807.08841 [pdf]

Characterizing health informatics journals by subject-level dependencies: a citation network analysis

Authors: Arezo Bodaghi, Didi Surian

Abstract: Citation network analysis has become one of the methods used to study how scientific knowledge flows from one domain to another. Health informatics is a multidisciplinary field that includes social science, software engineering, behavioral science, medical science and others. In this study, we perform an analysis of citation statistics from health informatics journals using a data set extracted from CrossRef. For each health informatics journal, we extract the number of citations from/to studies related to computer science, medicine/clinical medicine and other fields, including the number of self-citations from the health informatics journal. With a similar number of articles used in our analysis, we show that the Journal of the American Medical Informatics Association (JAMIA) has more in-citations than the Journal of Medical Internet Research (JMIR), while JMIR has a higher number of out-citations and self-citations. We also show that JMIR cites more articles from health informatics journals and medicine-related journals. In addition, the Journal of Medical Systems (JMS) cites more articles from computer science journals compared with other health informatics journals included in our analysis.

Submitted 17 August, 2018; v1 submitted 23 July, 2018; originally announced July 2018.

Comments: 16 pages, 4 figures, 4 tables

arXiv:1709.06758 [pdf, other]

doi 10.1016/j.jbi.2018.01.008

A shared latent space matrix factorisation method for recommending new trial evidence for systematic review updates

Authors: Didi Surian, Adam G. Dunn, Liat Orenstein, Rabia Bashir, Enrico Coiera, Florence T. Bourgeois

Abstract: Clinical trial registries can be used to monitor the production of trial evidence and signal when systematic reviews become out of date. However, this use has been limited to date due to the extensive manual review required to search for and screen relevant trial registrations. Our aim was to evaluate a new method that could partially automate the identification of trial registrations that may be relevant for systematic review updates. We identified 179 systematic reviews of drug interventions for type 2 diabetes, which included 537 clinical trials that had registrations in ClinicalTrials.gov. We tested a matrix factorisation approach that uses a shared latent space to learn how to rank relevant trial registrations for each systematic review, comparing the performance to document similarity to rank relevant trial registrations. The two approaches were tested on a holdout set of the newest trials from the set of type 2 diabetes systematic reviews and an unseen set of 141 clinical trial registrations from 17 updated systematic reviews published in the Cochrane Database of Systematic Reviews. The matrix factorisation approach outperformed the document similarity approach with a median rank of 59 and recall@100 of 60.9%, compared to a median rank of 138 and recall@100 of 42.8% in the document similarity baseline. In the second set of systematic reviews and their updates, the highest performing approach used document similarity and gave a median rank of 67 (recall@100 of 62.9%).
The proposed method was useful for ranking trial registrations to reduce the manual workload associated with finding relevant trials for systematic review updates. The results suggest that the approach could be used as part of a semi-automated pipeline for monitoring potentially new evidence for inclusion in a review update.

Submitted 27 February, 2018; v1 submitted 20 September, 2017; originally announced September 2017.

Comments: Journal of Biomedical Informatics Vol. 79, March 2018, p. 32-40

ACM Class: H.3.3; J.3

Journal ref: J Biomed Inform. Vol. 79, March 2018, p. 32-40

Showing 1–4 of 4 results for author: Surian, D