-
Spotting Rumors via Novelty Detection
Authors:
Yumeng Qin,
Dominik Wurzer,
Victor Lavrenko,
Cunchen Tang
Abstract:
Rumour detection is hard because the most accurate systems operate retrospectively, only recognizing rumours once they have collected repeated signals. By then the rumours might have already spread and caused harm. We introduce a new category of features based on novelty, tailored to detect rumours early on. To compensate for the absence of repeated signals, we make use of news wire as an addition…
▽ More
Rumour detection is hard because the most accurate systems operate retrospectively, only recognizing rumours once they have collected repeated signals. By then the rumours might have already spread and caused harm. We introduce a new category of features based on novelty, tailored to detect rumours early on. To compensate for the absence of repeated signals, we make use of news wire as an additional data source. Unconfirmed (novel) information with respect to the news articles is considered as an indication of rumours. Additionally we introduce pseudo feedback, which assumes that documents that are similar to previous rumours, are more likely to also be a rumour. Comparison with other real-time approaches shows that novelty based features in conjunction with pseudo feedback perform significantly better, when detecting rumours instantly after their publication.
△ Less
Submitted 19 November, 2016;
originally announced November 2016.
-
Randomised Relevance Model
Authors:
Dominik Wurzer,
Miles Osborne,
Victor Lavrenko
Abstract:
Relevance Models are well-known retrieval models and capable of producing competitive results. However, because they use query expansion they can be very slow. We address this slowness by incorporating two variants of locality sensitive hashing (LSH) into the query expansion process. Results on two document collections suggest that we can obtain large reductions in the amount of work, with a small…
▽ More
Relevance Models are well-known retrieval models and capable of producing competitive results. However, because they use query expansion they can be very slow. We address this slowness by incorporating two variants of locality sensitive hashing (LSH) into the query expansion process. Results on two document collections suggest that we can obtain large reductions in the amount of work, with a small reduction in effectiveness. Our approach is shown to be additive when pruning query terms.
△ Less
Submitted 9 July, 2016;
originally announced July 2016.
-
I Wish I Didn't Say That! Analyzing and Predicting Deleted Messages in Twitter
Authors:
Sasa Petrovic,
Miles Osborne,
Victor Lavrenko
Abstract:
Twitter has become a major source of data for social media researchers. One important aspect of Twitter not previously considered are {\em deletions} -- removal of tweets from the stream. Deletions can be due to a multitude of reasons such as privacy concerns, rashness or attempts to undo public statements. We show how deletions can be automatically predicted ahead of time and analyse which tweets…
▽ More
Twitter has become a major source of data for social media researchers. One important aspect of Twitter not previously considered are {\em deletions} -- removal of tweets from the stream. Deletions can be due to a multitude of reasons such as privacy concerns, rashness or attempts to undo public statements. We show how deletions can be automatically predicted ahead of time and analyse which tweets are likely to be deleted and how.
△ Less
Submitted 14 May, 2013;
originally announced May 2013.