ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences

Ghosal, Soumya Suvra; P, Deepak; Jurek-Loughrey, Anna

arXiv:2010.10836 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 21 Oct 2020]

Abstract: Disinformation is often presented in long textual articles, especially in domains such as health, as seen in relation to COVID-19. These articles typically contain a number of trustworthy sentences among which the core disinformation sentences are scattered. In this paper, we propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy. We design a three-phase statistical NLP solution for the task, which starts by embedding sentences within a bespoke feature space designed for the task. Sentences represented using those features are then clustered, following which the key sentences are identified through proximity scoring. We also curate a new dataset with sentence-level disinformation scores to aid evaluation for this task; the dataset is being made publicly available to facilitate further research. Based on a comprehensive empirical evaluation against techniques from related tasks such as claim detection and summarization, as well as against simplified variants of our proposed approach, we illustrate that our method is able to identify core disinformation effectively.
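The three phases named in the abstract (embed, cluster, score by proximity) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' method: the abstract does not specify the bespoke feature space, the clustering algorithm, or the scoring rule, so normalised bag-of-words vectors, a small k-means, and distance to the sentence's own cluster centroid are used here as stand-ins. The function names `embed`, `kmeans` and `rank_sentences` are hypothetical.

```python
import math
from collections import Counter


def embed(sentences):
    """Phase 1 (stand-in): L2-normalised bag-of-words vectors.

    Assumption: the paper uses a bespoke feature space that the abstract
    does not describe; word counts are used here only for illustration.
    """
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iters=20):
    """Phase 2 (stand-in): plain k-means, seeded with the first k vectors."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def rank_sentences(sentences, k=2):
    """Phase 3 (stand-in): order sentence indices by centroid proximity.

    Assumption: "proximity scoring" is taken to mean distance to the
    sentence's own cluster centroid, closest first.
    """
    vectors = embed(sentences)
    labels, centroids = kmeans(vectors, k)
    scored = [(dist(v, centroids[lab]), i)
              for i, (v, lab) in enumerate(zip(vectors, labels))]
    return [i for _, i in sorted(scored)]
```

Calling `rank_sentences(list_of_sentences, k=2)` returns the sentence indices ordered from closest to farthest from their cluster centroid. Which cluster, if any, corresponds to disinformation is something this sketch does not decide.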

Comments:	The 22nd International Conference on Information Integration and Web-based Applications & Services (iiWAS '20), Chiang Mai, Thailand
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2010.10836 [cs.CL]
	(or arXiv:2010.10836v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.10836

Submission history

From: Deepak P
[v1] Wed, 21 Oct 2020 08:53:36 UTC (61 KB)
