Search | arXiv e-print repository

Leveraging cross-platform data to improve automated hate speech detection

Abstract: Hate speech is increasingly prevalent online, and its negative outcomes include increased prejudice, extremism, and even offline hate crime. Automatic detection of online hate speech can help us to better understand these impacts. However, while the field has recently progressed through advances in natural language processing, challenges still remain. In particular, most existing approaches for hate speech detection focus on a single social media platform in isolation. This limits both the use of these models and their validity, as the nature of language varies from platform to platform. Here we propose a new cross-platform approach to detect hate speech which leverages multiple datasets and classification models from different platforms and trains a superlearner that can combine existing and novel training data to improve detection and increase model applicability. We demonstrate how this approach outperforms existing models, and achieves good performance when tested on messages from novel social media platforms not included in the original training data.

Submitted 9 February, 2021; originally announced February 2021.

Comments: 34 pages, 10 figures

arXiv:2002.03419 [pdf, other]

The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Authors: Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li, et al. (71 additional authors not shown)

Abstract: We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods, based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better-than-average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of biomarkers.
TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials.

Submitted 27 December, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

Comments: Presents final results of the TADPOLE competition. 60 pages, 7 tables, 14 figures

Journal ref: Machine Learning for Biomedical Imaging (MELBA), Dec 2021
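The mean and median ensembles mentioned in the TADPOLE abstract can be illustrated with a minimal Python sketch. This is not the challenge's actual code: the function name `ensemble` and the example numbers are hypothetical, and it assumes every submission forecasts the same quantity at the same time points.

```python
from statistics import mean, median
from typing import Callable, List, Sequence


def ensemble(
    forecasts: Sequence[Sequence[float]],
    combine: Callable[[Sequence[float]], float],
) -> List[float]:
    """Combine per-submission forecasts time point by time point.

    forecasts[k][t] is submission k's prediction at time point t.
    Assumes all submissions cover the same time points.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    if len({len(f) for f in forecasts}) != 1:
        raise ValueError("all forecasts must have the same length")
    # zip(*forecasts) groups the predictions made for each time point.
    return [combine(step) for step in zip(*forecasts)]


# Hypothetical forecasts from three submissions at two time points.
submissions = [[1.0, 2.0], [3.0, 4.0], [11.0, 6.0]]
mean_forecast = ensemble(submissions, mean)
median_forecast = ensemble(submissions, median)
```

In this toy input the third submission is an outlier at the first time point, so the median forecast moves less than the mean forecast does.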

arXiv:1808.05865 [pdf, ps, other]

doi 10.1371/journal.pone.0222212

Using path signatures to predict a diagnosis of Alzheimer's disease

Authors: P. J. Moore, J. Gallacher, T. J. Lyons

Abstract: The path signature is a means of feature generation that can encode nonlinear interactions in the data as well as the usual linear features. It can distinguish the ordering of time-sequenced changes: for example, whether the hippocampus shrinks fast then slowly, or the converse. It provides interpretable features and its output is a fixed-length vector irrespective of the number of input points, so it can encode longitudinal data of varying length and with missing data points. In this paper we demonstrate the path signature in providing features to distinguish a set of people with Alzheimer's disease from a matched set of healthy individuals. The data used are volume measurements of the whole brain, ventricles and hippocampus from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The path signature method is shown to be a useful tool for the processing of sequential data, which is becoming increasingly available as monitoring technologies are applied.

Submitted 16 August, 2018; originally announced August 2018.

Comments: 5 pages, 3 figures. arXiv admin note: text overlap with arXiv:1808.03273

MSC Class: 62J12; 92D30
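To make the ordering property in the path-signature abstract concrete, here is a minimal Python sketch of the signature truncated at depth 2 for a piecewise-linear path, built up segment by segment with Chen's identity. It is an illustration only, not the authors' implementation: the function name `signature_depth2` and the (time, volume) numbers are hypothetical, and missing-data handling is not covered.

```python
from typing import List, Sequence, Tuple


def signature_depth2(
    path: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Depth-2 signature of the piecewise-linear path through `path`.

    Returns (level1, level2): level1[i] is the total increment in
    channel i, and level2[i][j] is the iterated integral of channel i
    against channel j. The output size depends only on the number of
    channels, not on the number of points.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity: append one straight segment to the path so far.
        # level1 here still holds the increment *before* this segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# Hypothetical (time, volume) paths with the same start and end points.
fast_then_slow = [(0.0, 10.0), (1.0, 7.0), (2.0, 6.0)]
slow_then_fast = [(0.0, 10.0), (1.0, 9.0), (2.0, 6.0)]

l1_fast, l2_fast = signature_depth2(fast_then_slow)
l1_slow, l2_slow = signature_depth2(slow_then_fast)
```

Both paths have the same level-1 terms because they share endpoints, but their off-diagonal level-2 terms differ, which is how the signature separates "fast then slow" from "slow then fast".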

arXiv:1808.03273 [pdf, ps, other]

doi 10.1371/journal.pone.0211558

Random forest prediction of Alzheimer's disease using pairwise selection from time series data

Authors: Paul Moore, Terry Lyons, John Gallacher

Abstract: Time-dependent data collected in studies of Alzheimer's disease usually has missing and irregularly sampled data points. For this reason, time series methods which assume regular sampling cannot be applied directly to the data without a pre-processing step. In this paper we use a machine learning method to learn the relationship between pairs of data points at different time separations. The input vector comprises a summary of the time series history and includes both demographic and non-time varying variables such as genetic data. The dataset used is from the 2017 TADPOLE grand challenge, which aims to predict the onset of Alzheimer's disease using demographic, physical and cognitive data. The challenge is a three-fold diagnosis classification into AD, MCI and control groups, the prediction of ADAS-13 score and the normalised ventricle volume. While the competition proceeds, forecasting methods may be compared using a leaderboard dataset selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and with standard metrics for measuring accuracy. For diagnosis, we find an mAUC of 0.82, and a classification accuracy of 0.73. The results show that the method is effective and comparable with other methods.

Submitted 9 August, 2018; originally announced August 2018.

Comments: 6 pages, 1 figure, 6 tables

MSC Class: 62M10
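The idea of learning from pairs of data points at different time separations, as described in the abstract above, might be set up roughly as follows. This Python sketch is a deliberate simplification and an assumption about the data layout, not the paper's pipeline: the function name `make_pairs` is hypothetical, each input uses only one earlier value plus the time gap, and the history summary, demographic and genetic variables mentioned in the abstract are omitted.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Visit = Tuple[float, float]  # (time in months, measured value)


def make_pairs(
    visits: Sequence[Visit],
) -> List[Tuple[Tuple[float, float], float]]:
    """Turn an irregularly sampled series into training examples.

    Each example is ((earlier_value, time_gap), later_value), so no
    resampling onto a regular grid is needed.
    """
    ordered = sorted(visits)  # sort by time
    return [
        ((v0, t1 - t0), v1)
        for (t0, v0), (t1, v1) in combinations(ordered, 2)
    ]


# Hypothetical visits at months 0, 6 and 18 (irregular spacing).
pairs = make_pairs([(0.0, 30.0), (6.0, 31.5), (18.0, 35.0)])
```

A series with n visits yields n(n-1)/2 examples, which could then be passed to any regressor or classifier, such as the random forest named in the title.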

arXiv:1802.03572 [pdf]

Junk News on Military Affairs and National Security: Social Media Disinformation Campaigns Against US Military Personnel and Veterans

Authors: John D. Gallacher, Vlad Barash, Philip N. Howard, John Kelly

Abstract: Social media provides political news and information for both active duty military personnel and veterans. We analyze the subgroups of Twitter and Facebook users who spend time consuming junk news from websites that target US military personnel and veterans with conspiracy theories, misinformation, and other forms of junk news about military affairs and national security issues. (1) Over Twitter we find that there are significant and persistent interactions between current and former military personnel and a broad network of extremist, Russia-focused, and international conspiracy subgroups. (2) Over Facebook, we find significant and persistent interactions between public pages for military and veterans and subgroups dedicated to political conspiracy, and both sides of the political spectrum. (3) Over Facebook, the users who are most interested in conspiracy theories and the political right seem to be distributing the most junk news, whereas users who are either in the military or are veterans are among the most sophisticated news consumers, and share very little junk news through the network.

Submitted 10 February, 2018; originally announced February 2018.

Comments: Data Memo

Showing 1–5 of 5 results for author: Gallacher, J