
A Unified Framework for Input Feature Attribution Analysis

Authors: Jingyi Sun, Pepa Atanasova, Isabelle Augenstein

Abstract: Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and fairness. One popular explanation form highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and Attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we propose a unified framework, comprising four diagnostic properties, that facilitates a direct comparison between highlight and interactive explanations. Through extensive analysis across these three types of input feature explanations (each utilizing three different explanation techniques), two datasets, and two models, we reveal that each explanation type excels in terms of different diagnostic properties. In our experiments, highlight explanations are the most faithful to a model's prediction, and interactive explanations provide better utility for learning to simulate a model's predictions. These insights further highlight the need for future research to develop combined methods that enhance all diagnostic properties.
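The abstract lists Shapley Values as a token-level highlight explanation. As an illustration only (not the framework proposed in the paper), the sketch below computes exact Shapley values for the tokens of a three-token input. The scoring function `toy_score` is a hypothetical stand-in for a model, and treating removed tokens as simply absent is an assumption; exact enumeration is exponential in the number of tokens, so practical attribution tools typically approximate it by sampling.

```python
from itertools import combinations
from math import factorial

TOKENS = ["not", "good", "movie"]

def toy_score(coalition):
    """Hypothetical stand-in for a model's class score, given the indices
    of the tokens that are kept (all other tokens are treated as masked)."""
    words = {TOKENS[i] for i in coalition}
    score = 2.0 * ("good" in words) + 0.5 * ("movie" in words)
    if {"not", "good"} <= words:  # negation flips the sentiment of "good"
        score -= 3.0
    return score

def shapley_values(n, value):
    """Exact Shapley value of each of n tokens by enumerating all coalitions."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

print(dict(zip(TOKENS, shapley_values(len(TOKENS), toy_score))))
```

The pairwise penalty between "not" and "good" is split equally between the two tokens, so the attributions are approximately -1.5 for "not" and 0.5 each for "good" and "movie", summing to the score of the full input minus that of the empty input (-0.5). The per-token scores alone do not reveal that the penalty came from a token pair.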

Submitted 21 June, 2024; originally announced June 2024.

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2404.18655

Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

Authors: Haeun Yu, Pepa Atanasova, Isabelle Augenstein

Abstract: Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or correcting this embedded knowledge without the significant cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and its association with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the methods, we introduce two attribution methods: NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which discovers important neurons of influential instances found by IA. We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information regarding the LM's parametric knowledge compared to IA. Nevertheless, IA provides unique and valuable insights into the LM's parametric knowledge, which are not revealed by NA.
Our findings further suggest the potential of a synergistic approach of combining the diverse findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2310.13506

Explaining Interactions Between Text Spans

Authors: Sagnik Ray Choudhury, Pepa Atanasova, Isabelle Augenstein

Abstract: Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC) or natural language inference (NLI). However, existing highlight-based explanations primarily focus on identifying individual important tokens or interactions only between adjacent tokens or tuples of tokens. Most notably, there is a lack of annotations capturing the human decision-making process with respect to the necessary interactions for informed decision-making in such tasks. To bridge this gap, we introduce SpanEx, a multi-annotator dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans in separate parts of the input and compare them to the human reasoning processes. Finally, we present a novel unsupervised method based on community detection to extract such interaction explanations from a model's inner workings.

Submitted 20 October, 2023; originally announced October 2023.

Comments: code: https://github.com/copenlu/spanex, dataset: https://huggingface.co/datasets/copenlu/spanex. Accepted at EMNLP 2023

ACM Class: I.2.7

arXiv:2306.02349

bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

Authors: Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Ves Stoyanov, Ivan Koychev, Preslav Nakov, Dragomir Radev

Abstract: We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, question answering, etc.) and machine learning tasks (sequence labeling, document-level classification, and regression). We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark. The evaluation results show strong performance on sequence labeling tasks, but there is a lot of room for improvement for tasks that require more complex reasoning. We make bgGLUE publicly available together with the fine-tuning and the evaluation code, as well as a public leaderboard at https://bgglue.github.io/, and we hope that it will enable further advancements in developing NLU models for Bulgarian.

Submitted 6 June, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted to ACL 2023 (Main Conference)

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: ACL 2023

arXiv:2305.18029

Faithfulness Tests for Natural Language Explanations

Authors: Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, Jakob Grue Simonsen, Isabelle Augenstein

Abstract: Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, providing a fundamental tool in the development of faithful NLEs.

Submitted 30 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: Short paper, ACL 2023

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

arXiv:2211.04946

Accountable and Explainable Methods for Complex Reasoning over Text

Authors: Pepa Atanasova

Abstract: A major concern with Machine Learning (ML) models is their opacity. They are deployed in an increasing number of applications where they often operate as black boxes that do not provide explanations for their predictions. Among others, the potential harms associated with the lack of understanding of the models' rationales include privacy violations, adversarial manipulations, and unfair discrimination. As a result, the accountability and transparency of ML models have been posed as critical desiderata by works in policy and law, philosophy, and computer science. In computer science, the decision-making process of ML models has been studied by developing accountability and transparency methods. Accountability methods, such as adversarial attacks and diagnostic datasets, expose vulnerabilities of ML models that could lead to malicious manipulations or systematic faults in their predictions. Transparency methods explain the rationales behind models' predictions, gaining the trust of relevant stakeholders and potentially uncovering mistakes and unfairness in models' decisions. To this end, transparency methods have to meet accountability requirements as well, e.g., being robust and faithful to the underlying rationales of a model. This thesis presents my research that expands our collective knowledge in the areas of accountability and transparency of ML models developed for complex reasoning tasks over text.

Submitted 9 November, 2022; originally announced November 2022.

Comments: PhD Thesis

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2204.02007

Fact Checking with Insufficient Evidence

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Abstract: Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21% accuracy), whereas it is easiest for omitted date modifiers (63% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.

Submitted 5 April, 2022; originally announced April 2022.

Comments: 14 pages

MSC Class: cs.CL

arXiv:2112.06924

Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing

Authors: Shailza Jolly, Pepa Atanasova, Isabelle Augenstein

Abstract: Fact-checking systems have become important tools to verify fake and misleading news. These systems become more trustworthy when human-readable explanations accompany the veracity labels. However, manual collection of such explanations is expensive and time-consuming. Recent works frame explanation generation as extractive summarization, and propose to automatically select a sufficient subset of the most important facts from the ruling comments (RCs) of a professional journalist to obtain fact-checking explanations. However, these explanations lack fluency and sentence coherence. In this work, we present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of disconnected RCs. To regulate our editing algorithm, we use a scoring function with components including fluency and semantic preservation. In addition, we show the applicability of our approach in a completely unsupervised setting. We experiment with two benchmark datasets, LIAR-PLUS and PubHealth. We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2109.15118

Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims

Authors: Tamer Elsayed, Preslav Nakov, Alberto Barrón-Cedeño, Maram Hasanain, Reem Suwaileh, Giovanni Da San Martino, Pepa Atanasova

Abstract: We present an overview of the second edition of the CheckThat! Lab at CLEF 2019. The lab featured two tasks in two different languages: English and Arabic. Task 1 (English) challenged the participating systems to predict which claims in a political debate or speech should be prioritized for fact-checking. Task 2 (Arabic) asked participants to (A) rank a given set of Web pages with respect to a check-worthy claim based on their usefulness for fact-checking that claim, (B) classify these same Web pages according to their degree of usefulness for fact-checking the target claim, (C) identify useful passages from these pages, and (D) use the useful pages to predict the claim's factuality. CheckThat! provided a full evaluation framework, consisting of data in English (derived from fact-checking sources) and Arabic (gathered and annotated from scratch) and evaluation based on mean average precision (MAP) and normalized discounted cumulative gain (nDCG) for ranking, and F1 for classification. A total of 47 teams registered to participate in this lab, and fourteen of them actually submitted runs (compared to nine last year). The evaluation results show that the most successful approaches to Task 1 used various neural networks and logistic regression. As for Task 2, learning-to-rank was used by the highest scoring runs for subtask A, while different classifiers were used in the other subtasks.
We release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research in the important tasks of check-worthiness estimation and automatic claim verification.
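The CheckThat! abstract evaluates ranking with MAP and nDCG. The following is a minimal sketch of the common textbook definitions, not the lab's released evaluation scripts. The binary relevance labels for AP, the linear gain g / log2(rank + 1) for DCG, the cutoff k, and the example rankings are all assumptions; the official scorer may differ, for instance in how AP is normalised when relevant pages are missing from a ranking.

```python
from math import log2

def average_precision(relevances):
    """AP for one ranked list of binary labels (1 = relevant, 0 = not).

    Assumes every relevant item appears somewhere in the list, so AP is
    normalised by the number of relevant items found.
    """
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: mean of AP over all queries (here, one ranking per claim)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

def dcg(gains, k):
    """Discounted cumulative gain of the top-k graded relevance scores."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg(gains, k):
    """DCG normalised by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical example: two claims, each with a ranked list of Web pages.
rankings = [[1, 0, 1, 0], [0, 1]]
print(mean_average_precision(rankings))
print(ndcg([2, 0, 1], k=3))
```

In this toy input the first claim has AP = (1/1 + 2/3) / 2, about 0.83, and the second has AP = 1/2, so MAP is about 0.67.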

Submitted 25 September, 2021; originally announced September 2021.

Comments: Check-worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Fake News Detection, Computational Journalism, Disinformation, Misinformation. arXiv admin note: text overlap with arXiv:2012.09263 by other authors

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: CLEF-2019

arXiv:2109.03756

Diagnostics-Guided Explanation Generation

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Abstract: Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model. Faithfulness is one of several so-called diagnostic properties, which prior work has identified as useful for gauging the quality of an explanation without requiring annotations. Other diagnostic properties are Data Consistency, which measures how similar explanations are for similar input instances, and Confidence Indication, which shows whether the explanation reflects the confidence of the model. In this work, we show how to directly optimise for these diagnostic properties when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.

Submitted 8 September, 2021; originally announced September 2021.

ACM Class: I.2.7

arXiv:2009.13295

A Diagnostic Study of Explainability Techniques for Text Classification

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Abstract: Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there exists no definitive guide on (i) how to choose such a technique given a particular application task and model architecture, and (ii) the benefits and drawbacks of using each such technique. In this paper, we develop a comprehensive list of diagnostic properties for evaluating existing explainability techniques. We then employ the proposed list to compare a set of diverse explainability techniques on downstream text classification tasks and neural network architectures. We also compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones. Overall, we find that the gradient-based explanations perform best across tasks and model architectures, and we present further insights into the properties of the reviewed explainability techniques.

Submitted 25 September, 2020; originally announced September 2020.

MSC Class: cs.CL; cs.AI ACM Class: I.2.7

arXiv:2009.08205

Generating Label Cohesive and Well-Formed Adversarial Claims

Authors: Pepa Atanasova, Dustin Wright, Isabelle Augenstein

Abstract: Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is universal adversarial triggers, which are individual n-grams that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of instances they are inserted in. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate triggers to existing samples. Here, we investigate how to generate adversarial attacks against fact checking systems that preserve the ground truth meaning and are semantically valid. We extend the HotFlip attack algorithm used for universal trigger generation by jointly minimising the target class loss of a fact checking model and the entailment class loss of an auxiliary natural language inference model. We then train a conditional language model to generate semantically valid statements, which include the found universal triggers. We find that the generated attacks maintain the directionality and semantic validity of the claim better than previous work.

Submitted 17 September, 2020; originally announced September 2020.

Comments: 9 pages, 1 figure, 4 tables

arXiv:2009.06401

Multi-Hop Fact Checking of Political Claims

Authors: Wojciech Ostrowski, Arnav Arora, Pepa Atanasova, Isabelle Augenstein

Abstract: Recent work has proposed multi-hop models and datasets for studying complex natural language reasoning. One notable task requiring multi-hop reasoning is fact checking, where a set of connected evidence pieces leads to the final verdict of a claim. However, existing datasets either do not provide annotations for gold evidence pages, or the only dataset which does (FEVER) mostly consists of claims which can be fact-checked with simple reasoning and is constructed artificially. Here, we study more complex claim verification of naturally occurring claims with multiple hops over interconnected evidence chunks. We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex and achieve the best performance with an architecture that specifically models reasoning over evidence pieces in combination with in-domain transfer learning.

Submitted 1 June, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

Comments: 10 pages, to be published in Proceedings of IJCAI-2021

MSC Class: 68T07; 68T50 ACM Class: I.2.7

arXiv:2006.07235

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

Authors: Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin

Abstract: We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020). The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The task featured five languages: English, Arabic, Danish, Greek, and Turkish for Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and also across all languages. A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.

Submitted 30 September, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Proceedings of the International Workshop on Semantic Evaluation (SemEval-2020)

MSC Class: 68T50; 68T07 ACM Class: I.2.7

arXiv:2004.14454

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

Authors: Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov

Abstract: The widespread use of offensive content in social media has led to an abundance of research in detecting language such as hate speech, cyberbullying, and cyber-aggression. Recent work presented the OLID dataset, which follows a taxonomy for offensive language identification that provides meaningful information for understanding the type and the target of offensive messages. However, it is limited in size and it might be biased towards offensive language as it was collected using keywords. In this work, we present SOLID, an expanded dataset, where the tweets were collected in a more principled manner. SOLID contains over nine million English tweets labeled in a semi-supervised fashion. We demonstrate that using SOLID along with OLID yields sizable performance gains on the OLID test set for two different models, especially for the lower levels of the taxonomy.

Submitted 24 September, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: offensive language, hate speech, cyberbullying, cyber-aggression, taxonomy for offensive language identification

MSC Class: 68T50; 68T07 ACM Class: F.2.2; I.2.7

Journal ref: ACL-2021 (Findings)

arXiv:2004.05773

Generating Fact Checking Explanations

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein

Abstract: Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process: generating justifications for verdicts on claims. This paper provides the first study of how these explanations can be generated automatically based on available claim context, and how this task can be modelled jointly with veracity prediction. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system. The results of a manual evaluation further suggest that the informativeness, coverage and overall quality of the generated explanations are also improved in the multi-task model.

Submitted 13 April, 2020; originally announced April 2020.

Comments: In Proceedings of the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)

arXiv:1911.08782

Joint Emotion Label Space Modelling for Affect Lexica

Authors: Luna De Bruyne, Pepa Atanasova, Isabelle Augenstein

Abstract: Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, vocabulary coverage issues, differences in construction method and discrepancies in emotion framework and representation result in a heterogeneous landscape of emotion detection resources, calling for a unified approach to utilising them. To combat this, we present an extended emotion lexicon of 30,273 unique entries, which is a result of merging eight existing emotion lexica by means of a multi-view variational autoencoder (VAE). We showed that a VAE is a valid approach for combining lexica with different label spaces into a joint emotion label space with a chosen number of dimensions, and that these dimensions are still interpretable. We tested the utility of the unified VAE lexicon by employing the lexicon values as features in an emotion detection model. We found that the VAE lexicon outperformed individual lexica, but contrary to our expectations, it did not outperform a naive concatenation of lexica, although it did contribute to the naive concatenation when added as an extra lexicon. Furthermore, using lexicon information as additional features on top of state-of-the-art language models usually resulted in a better performance than when no lexicon information was used.

Submitted 18 June, 2021; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Computer Speech and Language journal, to appear

arXiv:1908.07912

It Takes Nine to Smell a Rat: Neural Multi-Task Learning for Check-Worthiness Prediction

Authors: Slavena Vasileva, Pepa Atanasova, Lluís Màrquez, Alberto Barrón-Cedeño, Preslav Nakov

Abstract: We propose a multi-task deep-learning approach for estimating the check-worthiness of claims in political debates. Given a political debate, such as the 2016 US Presidential and Vice-Presidential ones, the task is to predict which statements in the debate should be prioritized for fact-checking. While different fact-checking organizations would naturally make different choices when analyzing the same debate, we show that it pays to learn from multiple sources simultaneously (PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and Washington Post) in a multi-task learning setup, even when a particular source is chosen as a target to imitate. Our evaluation shows state-of-the-art results on a standard dataset for the task of check-worthiness prediction.

Submitted 19 August, 2019; originally announced August 2019.

Comments: Check-worthiness; Fact-Checking; Veracity; Multi-task Learning; Neural Networks. arXiv admin note: text overlap with arXiv:1908.01328

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: RANLP-2019

arXiv:1908.01328

doi 10.1145/3297722

Automatic Fact-Checking Using Context and Discourse Information

Authors: Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass

Abstract: We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information. We address two related tasks: (i) detecting check-worthy claims, and (ii) fact-checking claims. We develop supervised systems based on neural networks, kernel-based support vector machines, and combinations thereof, which make use of rich input representations in terms of discourse cues and contextual features. For the check-worthiness estimation task, we focus on political debates, and we model the target claim in the context of the full intervention of a participant and the previous and the following turns in the debate, taking into account contextual meta information. For the fact-checking task, we focus on answer verification in a community forum, and we model the veracity of the answer with respect to the entire question-answer thread in which it occurs as well as with respect to other related posts from the entire forum. We develop annotated datasets for both tasks and we run extensive experimental evaluation, confirming that both types of information, but especially contextual features, play an important role.

Submitted 4 August, 2019; originally announced August 2019.

Comments: JDIQ, Special Issue on Combating Digital Misinformation and Disinformation

Journal ref: J. Data and Information Quality, Volume 11 Issue 3, July 2019, Article No. 12

arXiv:1906.01727

SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums

Authors: Tsvetomila Mihaylova, Georgi Karadjov, Pepa Atanasova, Ramy Baly, Mitra Mohtarami, Preslav Nakov

Abstract: We present SemEval-2019 Task 8 on Fact Checking in Community Question Answering Forums, which features two subtasks. Subtask A is about deciding whether a question asks for factual information vs. an opinion/advice vs. just socializing. Subtask B asks to predict whether an answer to a factual question is true, false or not a proper answer. We received 17 official submissions for Subtask A and 11 official submissions for Subtask B. For Subtask A, all systems improved over the majority class baseline. For Subtask B, all systems were below the majority class baseline, but several systems were very close to it. The leaderboard and the data from the competition can be found at http://competitions.codalab.org/competitions/20022

Submitted 25 May, 2019; originally announced June 2019.

Comments: Fact checking, community question answering, community fora, semeval-2019

arXiv:1905.10565

doi 10.1145/3331184.3331308

Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search

Authors: Pepa Atanasova, Georgi Karadzhov, Yasen Kiprov, Preslav Nakov, Fabrizio Sebastiani

Abstract: In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served with the system asking follow-up questions in order to better understand the user's intent. While typically a user would expect a single response at any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as this practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of having a correct answer included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.

Submitted 25 May, 2019; originally announced May 2019.

Comments: 4 pages, in Proceedings of SIGIR 2019

Journal ref: Final version published in Proceedings of the 42nd ACM Conference on Research and Development in Information Retrieval (SIGIR 2019), Paris, FR, 2019, pp. 997-1000

arXiv:1905.03895

Periodicity of magnetization reversals in $\varphi_0$ Josephson junction

Authors: P. Kh. Atanasova, S. A. Panayotova, I. R. Rahmonov, Yu. M. Shukrinov, E. V. Zemlyanaya, M. V. Bashashin

Abstract: The magnetization reversal in a $\varphi_0$ Josephson junction with direct coupling between the magnetic moment and the Josephson current has been studied. By adding a pulse signal, the dynamics of the magnetic moment components have been simulated, and full magnetization reversal at different parameters of the junction has been demonstrated. We obtain detailed pictures representing the intervals of the damping parameter $\alpha$, the ratio of Josephson to magnetic energy $G$, and the spin-orbit coupling parameter $r$ with full magnetization reversal. A periodicity in the appearance of magnetization reversal intervals with increasing ratio of Josephson to magnetic energy is found. The obtained results might be used in different fields of superconducting spintronics.

Submitted 9 May, 2019; originally announced May 2019.

arXiv:1808.05542

Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness

Authors: Pepa Atanasova, Alberto Barron-Cedeno, Tamer Elsayed, Reem Suwaileh, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, Preslav Nakov

Abstract: We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 1: Check-Worthiness. The task asks to predict which claims in a political debate should be prioritized for fact-checking. In particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for fact checking. We offered the task in both English and Arabic, based on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign. A total of 30 teams registered to participate in the Lab and seven teams actually submitted systems for Task 1. The most successful approaches used by the participants relied on recurrent and multi-layer neural networks, as well as on combinations of distributional representations, on matching claims' vocabulary against lexicons, and on measures of syntactic dependency. The best systems achieved mean average precision of 0.18 and 0.15 on the English and on the Arabic test datasets, respectively. This leaves ample room for further improvement, and thus we release all datasets and the scoring scripts, which should enable further research in check-worthiness estimation.

Submitted 8 August, 2018; originally announced August 2018.

Comments: Computational journalism, Check-worthiness, Fact-checking, Veracity

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: CLEF-2018

arXiv:1708.09198

Simulation of Collective Excitations in the Stack of Long Josephson Junctions

Authors: Ilhom Rahmonov, Yuri Shukrinov, Pavlina Atanasova, Elena Zemlyanaya, Oksana Streltsova, Maxim Zuev, Andrej Plecenik, Akinobu Irie

Abstract: The phase dynamics of a stack of long Josephson junctions (JJs) has been studied. Both inductive and capacitive couplings between the Josephson junctions have been taken into account in the calculations. The IV curve, the bias current dependence of the radiation power, and the dynamics of each JJ in the stack have been investigated. The coexistence of the charge traveling wave and fluxon states has been observed. This state can be considered as a new collective excitation in the system of coupled Josephson junctions. We demonstrate that the observed collective excitation leads to a decrease in the radiation power from the system.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1107.2999

Numerical study of fluxon solutions of sine-Gordon equation under the influence of the boundary conditions

Authors: P. Kh. Atanasova, E. V. Zemlyanaya, Yu. M. Shukrinov

Abstract: The fluxon solutions of a boundary problem for the sine-Gordon equation (SGE) are investigated numerically as a function of the boundary conditions. The interconnection between fluxon and constant solutions is analyzed. Numerical results are discussed in the context of the long Josephson junction model.

Submitted 15 July, 2011; originally announced July 2011.

Comments: 6 pages, 7 figures, Int. Conference MMCP-2011, July, 2011

arXiv:1007.4778

doi 10.1088/1742-6596/248/1/012044

Influence of Josephson current second harmonic on stability of magnetic flux in long junctions

Authors: P. Kh. Atanasova, T. L. Boyadjiev, Yu. M. Shukrinov, E. V. Zemlyanaya, P. Seidel

Abstract: We study the long Josephson junction (LJJ) model, which takes into account the second harmonic of the Fourier expansion of the Josephson current. The dependence of the static magnetic flux distributions on the parameters of the model is investigated numerically. The stability of the static solutions is checked by the sign of the smallest eigenvalue of the associated Sturm-Liouville problem. New solutions, which do not exist in the traditional model, have been found. The influence of the second harmonic on the stability of magnetic flux distributions for the main solutions is investigated.

Submitted 27 July, 2010; originally announced July 2010.

Comments: 4 pages, 6 figures, to be published in Proc. of Dubna-Nano2010, July 5-10, 2010, Russia

arXiv:1005.5691

Numerical investigation of the second harmonic effects in the LJJ

Authors: P. Kh. Atanasova, T. L. Boyadjiev, Yu. M. Shukrinov, E. V. Zemlyanaya

Abstract: We study the long Josephson junction (LJJ) model, which takes into account the second harmonic of the Fourier expansion of the Josephson current. The sign of the second harmonic is important for many physical applications. The influence of the sign and value of the second harmonic on the magnetic flux distributions is investigated. At each step of numerical continuation in the parameters of the model, the corresponding nonlinear boundary problem is solved on the basis of the continuous analog of Newton's method with the fourth-order Numerov discretization scheme. New solutions, which do not exist in the traditional model, have been found. The influence of the second harmonic on the stability of magnetic flux distributions for the main solutions is investigated.

Submitted 31 May, 2010; originally announced May 2010.

Comments: 7 pages, 4 figures, to be published in Proc. of FDM10, June 28 - July 2, 2010, Lozenetz, Bulgaria

arXiv:1005.4796

Numerical study of magnetic flux in the LJJ model with double sine-Gordon equation

Authors: P. Kh. Atanasova, T. L. Boyadjiev, Yu. M. Shukrinov, E. V. Zemlyanaya

Abstract: A decrease in the barrier transparency of superconductor-insulator-superconductor (SIS) Josephson junctions leads to deviations of the current-phase relation from the sinusoidal form. The sign of the second harmonic is important for many applications, in particular in junctions with a more complex structure such as SNINS or SFIFS, where N is a normal metal and F is a weak metallic ferromagnet. In this work, we study the static magnetic flux distributions in long Josephson junctions, taking into account the higher harmonics in the Fourier decomposition of the Josephson current. The stability analysis is based on the numerical solution of a spectral Sturm-Liouville problem formulated for each distribution. In this approach, the vanishing of the minimal eigenvalue of this problem indicates a bifurcation point in one of the parameters. At each step of numerical continuation in the parameters of the model, the corresponding nonlinear boundary problem is solved on the basis of the continuous analog of Newton's method. Solutions which do not exist in the traditional model have been found. The influence of the second harmonic on the stability of magnetic flux distributions for the main solutions is investigated.

Submitted 26 May, 2010; originally announced May 2010.

Showing 1–28 of 28 results for author: Atanasova, P