-
A Unified Framework for Input Feature Attribution Analysis
Authors:
**gyi Sun,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and fairness. One popular explanation form highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and Attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we propose a unified framework, comprising four diagnostic properties, that facilitates a direct comparison between highlight and interactive explanations. Through extensive analysis across these three types of input feature explanations (each utilizing three different explanation techniques), two datasets, and two models, we reveal that each explanation type excels in terms of different diagnostic properties. In our experiments, highlight explanations are the most faithful to a model's predictions, and interactive explanations provide better utility for learning to simulate a model's predictions. These insights further highlight the need for future research to develop combined methods that enhance all diagnostic properties.
Submitted 21 June, 2024;
originally announced June 2024.
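The abstract lists Shapley Values as a token-level highlight method. As a rough illustration only (not the authors' implementation), the sketch below computes exact Shapley values by enumerating every coalition of tokens. The value function `toy_value` is a hypothetical stand-in for a model score computed with only the kept tokens visible; it includes a small interaction between tokens 0 and 1 to show how Shapley values split an interaction's credit between the two tokens.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(
    tokens: Sequence[str],
    value_fn: Callable[[FrozenSet[int]], float],
) -> List[float]:
    """Exact Shapley value for each token position.

    value_fn maps a set of kept token indices to a scalar score
    (assumption: e.g. a class probability with the other tokens masked).
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of token i to coalition S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(kept: FrozenSet[int]) -> float:
    """Hypothetical model score: additive effects plus a 0-1 interaction."""
    score = 0.5 * (1 in kept) + 0.2 * (2 in kept)
    if 0 in kept and 1 in kept:
        score += 0.3
    return score


print(shapley_values(["not", "good", "movie"], toy_value))  # approximately [0.15, 0.65, 0.2]
```

Exact enumeration is exponential in the number of tokens, which is why practical Shapley-based explainers approximate it by sampling coalitions.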
-
Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Authors:
Haeun Yu,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or correcting this embedded knowledge without the significant cost of retraining. This underscores the importance of unveiling exactly what knowledge is stored and its association with specific model components. Instance Attribution (IA) and Neuron Attribution (NA) offer insights into this training-acquired knowledge, though they have not been compared systematically. Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by IA and NA. To align the results of the two methods, we introduce two attribution methods: NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which discovers important neurons of the influential instances found by IA. We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information regarding the LM's parametric knowledge than IA. Nevertheless, IA provides unique and valuable insights into the LM's parametric knowledge, which are not revealed by NA. Our findings further suggest the potential of a synergistic approach of combining the diverse findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.
Submitted 29 April, 2024;
originally announced April 2024.
-
Explaining Interactions Between Text Spans
Authors:
Sagnik Ray Choudhury,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC) or natural language inference (NLI). However, existing highlight-based explanations primarily focus on identifying individual important tokens or interactions only between adjacent tokens or tuples of tokens. Most notably, there is a lack of annotations capturing the human decision-making process w.r.t. the necessary interactions for informed decision-making in such tasks. To bridge this gap, we introduce SpanEx, a multi-annotator dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans in separate parts of the input and compare them to the human reasoning processes. Finally, we present a novel unsupervised method, based on community detection, to extract such interaction explanations from a model's inner workings.
Submitted 20 October, 2023;
originally announced October 2023.
-
bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Authors:
Momchil Hardalov,
Pepa Atanasova,
Todor Mihaylov,
Galia Angelova,
Kiril Simov,
Petya Osenova,
Ves Stoyanov,
Ivan Koychev,
Preslav Nakov,
Dragomir Radev
Abstract:
We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, question answering, etc.) and machine learning tasks (sequence labeling, document-level classification, and regression). We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark. The evaluation results show strong performance on sequence labeling tasks, but there is a lot of room for improvement for tasks that require more complex reasoning. We make bgGLUE publicly available together with the fine-tuning and the evaluation code, as well as a public leaderboard at https://bgglue.github.io/, and we hope that it will enable further advancements in developing NLU models for Bulgarian.
Submitted 6 June, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Faithfulness Tests for Natural Language Explanations
Authors:
Pepa Atanasova,
Oana-Maria Camburu,
Christina Lioma,
Thomas Lukasiewicz,
Jakob Grue Simonsen,
Isabelle Augenstein
Abstract:
Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, providing a fundamental tool in the development of faithful NLEs.
Submitted 30 June, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Accountable and Explainable Methods for Complex Reasoning over Text
Authors:
Pepa Atanasova
Abstract:
A major concern of Machine Learning (ML) models is their opacity. They are deployed in an increasing number of applications where they often operate as black boxes that do not provide explanations for their predictions. Among others, the potential harms associated with the lack of understanding of the models' rationales include privacy violations, adversarial manipulations, and unfair discrimination. As a result, the accountability and transparency of ML models have been posed as critical desiderata by works in policy and law, philosophy, and computer science.
In computer science, the decision-making process of ML models has been studied by developing accountability and transparency methods. Accountability methods, such as adversarial attacks and diagnostic datasets, expose vulnerabilities of ML models that could lead to malicious manipulations or systematic faults in their predictions. Transparency methods explain the rationales behind models' predictions, gaining the trust of relevant stakeholders and potentially uncovering mistakes and unfairness in models' decisions. To this end, transparency methods have to meet accountability requirements as well, e.g., being robust and faithful to the underlying rationales of a model.
This thesis presents my research that expands our collective knowledge in the areas of accountability and transparency of ML models developed for complex reasoning tasks over text.
Submitted 9 November, 2022;
originally announced November 2022.
-
Fact Checking with Insufficient Evidence
Authors:
Pepa Atanasova,
Jakob Grue Simonsen,
Christina Lioma,
Isabelle Augenstein
Abstract:
Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21% accuracy), whereas it is easiest for omitted date modifiers (63% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.
Submitted 5 April, 2022;
originally announced April 2022.
-
Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing
Authors:
Shailza Jolly,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Fact-checking systems have become important tools to verify fake and misleading news. These systems become more trustworthy when human-readable explanations accompany the veracity labels. However, manual collection of such explanations is expensive and time-consuming. Recent works frame explanation generation as extractive summarization, and propose to automatically select a sufficient subset of the most important facts from the ruling comments (RCs) of a professional journalist to obtain fact-checking explanations. However, these explanations lack fluency and sentence coherence. In this work, we present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of disconnected RCs. To regulate our editing algorithm, we use a scoring function with components including fluency and semantic preservation. In addition, we show the applicability of our approach in a completely unsupervised setting. We experiment with two benchmark datasets, LIAR-PLUS and PubHealth. We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.
Submitted 13 December, 2021;
originally announced December 2021.
-
Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims
Authors:
Tamer Elsayed,
Preslav Nakov,
Alberto Barrón-Cedeño,
Maram Hasanain,
Reem Suwaileh,
Giovanni Da San Martino,
Pepa Atanasova
Abstract:
We present an overview of the second edition of the CheckThat! Lab at CLEF 2019. The lab featured two tasks in two different languages: English and Arabic. Task 1 (English) challenged the participating systems to predict which claims in a political debate or speech should be prioritized for fact-checking. Task 2 (Arabic) asked to (A) rank a given set of Web pages with respect to a check-worthy claim based on their usefulness for fact-checking that claim, (B) classify these same Web pages according to their degree of usefulness for fact-checking the target claim, (C) identify useful passages from these pages, and (D) use the useful pages to predict the claim's factuality. CheckThat! provided a full evaluation framework, consisting of data in English (derived from fact-checking sources) and Arabic (gathered and annotated from scratch) and evaluation based on mean average precision (MAP) and normalized discounted cumulative gain (nDCG) for ranking, and F1 for classification. A total of 47 teams registered to participate in this lab, and fourteen of them actually submitted runs (compared to nine last year). The evaluation results show that the most successful approaches to Task 1 used various neural networks and logistic regression. As for Task 2, learning-to-rank was used by the highest scoring runs for subtask A, while different classifiers were used in the other subtasks. We release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research in the important tasks of check-worthiness estimation and automatic claim verification.
Submitted 25 September, 2021;
originally announced September 2021.
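The lab scored ranking with mean average precision (MAP) and normalized discounted cumulative gain (nDCG). For reference, here is a minimal sketch of both metrics under common textbook definitions; it is not the lab's official scorer, which may differ in details such as rank cutoffs, the gain function, or how relevant pages missing from a ranking are counted. Here AP is averaged over the relevant items that appear in the list, and nDCG uses the linear gain `gain_i / log2(i + 1)` (both are assumptions).

```python
import math
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list of binary labels (1 = relevant), best-ranked first."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """MAP: mean of per-query (here: per-claim) average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """nDCG@k for graded relevance gains listed in ranked order."""

    def dcg(values: Sequence[float]) -> float:
        # Discount each gain by log2(rank + 1), so rank 1 is undiscounted.
        return sum(g / math.log2(i + 1) for i, g in enumerate(values[:k], start=1))

    ideal = dcg(sorted(gains, reverse=True))  # best possible ordering
    return dcg(list(gains)) / ideal if ideal > 0 else 0.0


print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
print(ndcg_at_k([0, 2, 1], k=3))
```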
-
Diagnostics-Guided Explanation Generation
Authors:
Pepa Atanasova,
Jakob Grue Simonsen,
Christina Lioma,
Isabelle Augenstein
Abstract:
Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model. Faithfulness is one of several so-called diagnostic properties, which prior work has identified as useful for gauging the quality of an explanation without requiring annotations. Other diagnostic properties are Data Consistency, which measures how similar explanations are for similar input instances, and Confidence Indication, which shows whether the explanation reflects the confidence of the model. In this work, we show how to directly optimise for these diagnostic properties when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.
Submitted 8 September, 2021;
originally announced September 2021.
-
A Diagnostic Study of Explainability Techniques for Text Classification
Authors:
Pepa Atanasova,
Jakob Grue Simonsen,
Christina Lioma,
Isabelle Augenstein
Abstract:
Recent developments in machine learning have introduced models that approach human performance at the cost of increased architectural complexity. Efforts to make the rationales behind the models' predictions transparent have inspired an abundance of new explainability techniques. Provided with an already trained model, they compute saliency scores for the words of an input instance. However, there exists no definitive guide on (i) how to choose such a technique given a particular application task and model architecture, and (ii) the benefits and drawbacks of using each such technique. In this paper, we develop a comprehensive list of diagnostic properties for evaluating existing explainability techniques. We then employ the proposed list to compare a set of diverse explainability techniques on downstream text classification tasks and neural network architectures. We also compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones. Overall, we find that the gradient-based explanations perform best across tasks and model architectures, and we present further insights into the properties of the reviewed explainability techniques.
Submitted 25 September, 2020;
originally announced September 2020.
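The abstract describes explainability techniques that compute saliency scores for the words of an input to an already trained model, and reports that gradient-based explanations perform best. As an illustration of one common gradient-based variant (gradient times input), the sketch below uses a hypothetical logistic classifier over summed word embeddings, for which the gradient has a closed form. The model, `EMBEDDINGS`, and `WEIGHTS` are toy assumptions, not the models or exact techniques evaluated in the paper.

```python
import math
from typing import Dict, List, Sequence


def gradient_x_input(
    tokens: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Gradient-times-input saliency for a toy logistic classifier.

    Hypothetical model: p = sigmoid(w . sum_t e_t + b).
    The gradient of p with respect to each token embedding e_t is
    p * (1 - p) * w, so token t's saliency is that gradient dotted with e_t.
    """

    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    summed = [sum(embeddings[t][d] for t in tokens) for d in range(len(weights))]
    p = 1.0 / (1.0 + math.exp(-(dot(weights, summed) + bias)))
    scale = p * (1.0 - p)  # derivative of the sigmoid at the logit
    return [scale * dot(weights, embeddings[t]) for t in tokens]


# Hypothetical 2-d embeddings and classifier weights for illustration.
EMBEDDINGS = {"great": [1.0, 0.0], "plot": [0.0, 1.0], "boring": [-1.0, 0.5]}
WEIGHTS = [2.0, 0.1]

print(gradient_x_input(["great", "plot"], EMBEDDINGS, WEIGHTS, 0.0))
print(gradient_x_input(["boring", "plot"], EMBEDDINGS, WEIGHTS, 0.0))
```

For a linear model, the raw gradient is the same vector for every token. Multiplying by the input is what makes the scores differ per word and gives them a sign (for or against the positive class).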
-
Generating Label Cohesive and Well-Formed Adversarial Claims
Authors:
Pepa Atanasova,
Dustin Wright,
Isabelle Augenstein
Abstract:
Adversarial attacks reveal important vulnerabilities and flaws of trained models. One potent type of attack is universal adversarial triggers, which are individual n-grams that, when appended to instances of a class under attack, can trick a model into predicting a target class. However, for inference tasks such as fact checking, these triggers often inadvertently invert the meaning of instances they are inserted in. In addition, such attacks produce semantically nonsensical inputs, as they simply concatenate triggers to existing samples. Here, we investigate how to generate adversarial attacks against fact checking systems that preserve the ground truth meaning and are semantically valid. We extend the HotFlip attack algorithm used for universal trigger generation by jointly minimising the target class loss of a fact checking model and the entailment class loss of an auxiliary natural language inference model. We then train a conditional language model to generate semantically valid statements, which include the found universal triggers. We find that the generated attacks maintain the directionality and semantic validity of the claim better than previous work.
Submitted 17 September, 2020;
originally announced September 2020.
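The abstract extends HotFlip by jointly minimising a fact checking model's target-class loss and an NLI model's entailment loss. A rough sketch of the candidate-scoring step, under stated assumptions: HotFlip-style attacks estimate the loss change from swapping the current trigger token for a word `w` with the first-order term `(e_w - e_current) . grad`. Here `grad` is taken to be the sum of the two models' embedding gradients, weighted by a hypothetical `nli_weight`. The gradients and embeddings are toy inputs; in practice they come from backpropagation through real models, and this is not the paper's code.

```python
from typing import Dict, List, Sequence


def hotflip_candidates(
    current: str,
    embeddings: Dict[str, Sequence[float]],
    grad_fc: Sequence[float],
    grad_nli: Sequence[float],
    nli_weight: float = 1.0,  # hypothetical trade-off between the two losses
    top_k: int = 3,
) -> List[str]:
    """Rank replacement tokens for one trigger position.

    Uses the first-order estimate of the change in the joint loss
    L = L_fc(target class) + nli_weight * L_nli(entailment)
    when `current` is swapped for w: (e_w - e_current) . grad L.
    Lower is better because both losses are being minimised.
    """
    grad = [g1 + nli_weight * g2 for g1, g2 in zip(grad_fc, grad_nli)]
    e_cur = embeddings[current]

    def delta(word: str) -> float:
        return sum((a - b) * g for a, b, g in zip(embeddings[word], e_cur, grad))

    candidates = [w for w in embeddings if w != current]
    return sorted(candidates, key=delta)[:top_k]


# Toy vocabulary embeddings and gradients at the trigger position (hypothetical).
vocab = {"the": [0.0, 0.0], "not": [1.0, 0.0], "also": [0.0, 1.0], "never": [1.0, 1.0]}
print(hotflip_candidates("the", vocab, grad_fc=[1.0, -1.0], grad_nli=[0.5, 0.0], top_k=2))
```

A full attack would re-evaluate the true joint loss for the top candidates (for example, with beam search) rather than trusting the linear estimate alone.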
-
Multi-Hop Fact Checking of Political Claims
Authors:
Wojciech Ostrowski,
Arnav Arora,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Recent work has proposed multi-hop models and datasets for studying complex natural language reasoning. One notable task requiring multi-hop reasoning is fact checking, where a set of connected evidence pieces leads to the final verdict of a claim. However, existing datasets either do not provide annotations for gold evidence pages, or the only dataset which does (FEVER) mostly consists of claims which can be fact-checked with simple reasoning and is constructed artificially. Here, we study more complex claim verification of naturally occurring claims with multiple hops over interconnected evidence chunks. We: 1) construct a small annotated dataset, PolitiHop, of evidence sentences for claim verification; 2) compare it to existing multi-hop datasets; and 3) study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex and achieve the best performance with an architecture that specifically models reasoning over evidence pieces in combination with in-domain transfer learning.
Submitted 1 June, 2021; v1 submitted 10 September, 2020;
originally announced September 2020.
-
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Authors:
Marcos Zampieri,
Preslav Nakov,
Sara Rosenthal,
Pepa Atanasova,
Georgi Karadzhov,
Hamdy Mubarak,
Leon Derczynski,
Zeses Pitenis,
Çağrı Çöltekin
Abstract:
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020). The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The task featured five languages: English, Arabic, Danish, Greek, and Turkish for Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020 attracting a large number of participants across all subtasks and also across all languages. A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
Submitted 30 September, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification
Authors:
Sara Rosenthal,
Pepa Atanasova,
Georgi Karadzhov,
Marcos Zampieri,
Preslav Nakov
Abstract:
The widespread use of offensive content in social media has led to an abundance of research in detecting language such as hate speech, cyberbullying, and cyber-aggression. Recent work presented the OLID dataset, which follows a taxonomy for offensive language identification that provides meaningful information for understanding the type and the target of offensive messages. However, it is limited in size and it might be biased towards offensive language as it was collected using keywords. In this work, we present SOLID, an expanded dataset, where the tweets were collected in a more principled manner. SOLID contains over nine million English tweets labeled in a semi-supervised fashion. We demonstrate that using SOLID along with OLID yields sizable performance gains on the OLID test set for two different models, especially for the lower levels of the taxonomy.
Submitted 24 September, 2021; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Generating Fact Checking Explanations
Authors:
Pepa Atanasova,
Jakob Grue Simonsen,
Christina Lioma,
Isabelle Augenstein
Abstract:
Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process -- generating justifications for verdicts on claims. This paper provides the first study of how these explanations can be generated automatically based on available claim context, and how this task can be modelled jointly with veracity prediction. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system. The results of a manual evaluation further suggest that the informativeness, coverage and overall quality of the generated explanations are also improved in the multi-task model.
Submitted 13 April, 2020;
originally announced April 2020.
-
Joint Emotion Label Space Modelling for Affect Lexica
Authors:
Luna De Bruyne,
Pepa Atanasova,
Isabelle Augenstein
Abstract:
Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, vocabulary coverage issues, differences in construction method and discrepancies in emotion framework and representation result in a heterogeneous landscape of emotion detection resources, calling for a unified approach to utilising them. To combat this, we present an extended emotion lexicon of 30,273 unique entries, which is a result of merging eight existing emotion lexica by means of a multi-view variational autoencoder (VAE). We showed that a VAE is a valid approach for combining lexica with different label spaces into a joint emotion label space with a chosen number of dimensions, and that these dimensions are still interpretable. We tested the utility of the unified VAE lexicon by employing the lexicon values as features in an emotion detection model. We found that the VAE lexicon outperformed individual lexica, but contrary to our expectations, it did not outperform a naive concatenation of lexica, although it did contribute to the naive concatenation when added as an extra lexicon. Furthermore, using lexicon information as additional features on top of state-of-the-art language models usually resulted in a better performance than when no lexicon information was used.
Submitted 18 June, 2021; v1 submitted 20 November, 2019;
originally announced November 2019.
-
It Takes Nine to Smell a Rat: Neural Multi-Task Learning for Check-Worthiness Prediction
Authors:
Slavena Vasileva,
Pepa Atanasova,
Lluís Màrquez,
Alberto Barrón-Cedeño,
Preslav Nakov
Abstract:
We propose a multi-task deep-learning approach for estimating the check-worthiness of claims in political debates. Given a political debate, such as the 2016 US Presidential and Vice-Presidential ones, the task is to predict which statements in the debate should be prioritized for fact-checking. While different fact-checking organizations would naturally make different choices when analyzing the same debate, we show that it pays to learn from multiple sources simultaneously (PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and Washington Post) in a multi-task learning setup, even when a particular source is chosen as a target to imitate. Our evaluation shows state-of-the-art results on a standard dataset for the task of check-worthiness prediction.
Submitted 19 August, 2019;
originally announced August 2019.
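A minimal sketch of the multi-task idea, assuming (hypothetically) one shared ReLU layer that feeds a separate sigmoid check-worthiness head per fact-checking source. The per-source binary cross-entropies are summed, so every source's annotations update the shared layer. Layer sizes, activations and the loss are illustrative assumptions, not the paper's architecture.

```python
import math

SOURCES = ["PolitiFact", "FactCheck", "ABC", "CNN", "NPR", "NYT",
           "Chicago Tribune", "The Guardian", "Washington Post"]


def dense(weights, bias, x):
    """Affine map: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, shared, heads):
    """Shared ReLU layer followed by one sigmoid head per source."""
    hidden = [max(0.0, v) for v in dense(shared["W"], shared["b"], x)]
    return {
        source: sigmoid(sum(w * h for w, h in zip(head["w"], hidden)) + head["b"])
        for source, head in heads.items()
    }


def multitask_loss(scores, labels, eps=1e-12):
    """Sum of per-source binary cross-entropies (labels are 0 or 1)."""
    total = 0.0
    for source, y in labels.items():
        p = min(max(scores[source], eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total
```

To imitate one target source, one would read only that head's score at prediction time while still training all heads jointly.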
-
Automatic Fact-Checking Using Context and Discourse Information
Authors:
Pepa Atanasova,
Preslav Nakov,
Lluís Màrquez,
Alberto Barrón-Cedeño,
Georgi Karadzhov,
Tsvetomila Mihaylova,
Mitra Mohtarami,
James Glass
Abstract:
We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information. We address two related tasks: (i) detecting check-worthy claims, and (ii) fact-checking claims. We develop supervised systems based on neural networks, kernel-based support vector machines, and combinations thereof, which make use of rich input representations in terms of discourse cues and contextual features. For the check-worthiness estimation task, we focus on political debates, and we model the target claim in the context of the full intervention of a participant and the previous and the following turns in the debate, taking into account contextual meta information. For the fact-checking task, we focus on answer verification in a community forum, and we model the veracity of the answer with respect to the entire question-answer thread in which it occurs as well as with respect to other related posts from the entire forum. We develop annotated datasets for both tasks and we run extensive experimental evaluation, confirming that both types of information, but especially contextual features, play an important role.
Submitted 4 August, 2019;
originally announced August 2019.
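A hypothetical simplification of the contextual modelling described above: each sentence's feature vector is concatenated with those of its neighbours, padding with zeros at the edges of the debate. The fixed window over adjacent sentences is an assumption standing in for the paper's richer context (full interventions, previous and following turns, meta information).

```python
def context_features(sentence_vectors, i, window=1):
    """Concatenate sentence i's vector with its neighbours' vectors.

    Positions outside the debate are padded with zero vectors, so every
    sentence gets a feature vector of length (2 * window + 1) * dim.
    """
    dim = len(sentence_vectors[0])
    features = []
    for j in range(i - window, i + window + 1):
        if 0 <= j < len(sentence_vectors):
            features.extend(sentence_vectors[j])
        else:
            features.extend([0.0] * dim)
    return features
```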
-
SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums
Authors:
Tsvetomila Mihaylova,
Georgi Karadjov,
Pepa Atanasova,
Ramy Baly,
Mitra Mohtarami,
Preslav Nakov
Abstract:
We present SemEval-2019 Task 8 on Fact Checking in Community Question Answering Forums, which features two subtasks. Subtask A is about deciding whether a question asks for factual information vs. an opinion/advice vs. just socializing. Subtask B asks participants to predict whether an answer to a factual question is true, false or not a proper answer. We received 17 official submissions for Subtask A and 11 official submissions for Subtask B. For Subtask A, all systems improved over the majority class baseline. For Subtask B, all systems were below the majority class baseline, but several systems were very close to it. The leaderboard and the data from the competition can be found at http://competitions.codalab.org/competitions/20022
Submitted 25 May, 2019;
originally announced June 2019.
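For reference, a majority class baseline simply predicts the most frequent training label for every test item. The sketch below scores it with accuracy for illustration only; the task's official evaluation measures may differ, and the label strings are placeholders.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label everywhere; return it and its accuracy."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)
```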
-
Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search
Authors:
Pepa Atanasova,
Georgi Karadzhov,
Yasen Kiprov,
Preslav Nakov,
Fabrizio Sebastiani
Abstract:
In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on them, expecting to get the answer directly from the snippets. In turn, this has revived the interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where user needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While a user would typically expect a single response to any utterance, a system could also return multiple options for the user to select from, based on different system understandings of the user's intent. However, this possibility should not be overused, as this practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of having a correct answer included in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do that. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.
Submitted 25 May, 2019;
originally announced May 2019.
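A toy illustration, not one of the measures proposed in the paper, of why a classic IR measure does not capture the length trade-off: reciprocal rank gives the same score to a one-option list and a four-option list as long as the correct answer comes first, so it never penalises the extra options shown to the user.

```python
def reciprocal_rank(options, correct):
    """1 / rank of the first correct option, or 0.0 if no option is correct."""
    for rank, option in enumerate(options, start=1):
        if option in correct:
            return 1.0 / rank
    return 0.0


short_list = ["Paris"]
long_list = ["Paris", "Lyon", "Nice", "Lille"]
# Both lists receive the same score, although the second asks more of the user.
scores = (reciprocal_rank(short_list, {"Paris"}), reciprocal_rank(long_list, {"Paris"}))
```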
-
Periodicity of magnetization reversals in $\varphi_0$ Josephson junction
Authors:
P. Kh. Atanasova,
S. A. Panayotova,
I. R. Rahmonov,
Yu. M. Shukrinov,
E. V. Zemlyanaya,
M. V. Bashashin
Abstract:
The magnetization reversal in a ${\varphi_0}$-Josephson junction with direct coupling between the magnetic moment and the Josephson current has been studied. By adding a pulse signal, the dynamics of the magnetic moment components have been simulated, and full magnetization reversal at different parameters of the junction has been demonstrated. We obtain detailed pictures of the intervals of the damping parameter $\alpha$, the Josephson-to-magnetic energy ratio $G$ and the spin-orbit coupling parameter $r$ in which full magnetization reversal occurs. A periodicity in the appearance of magnetization reversal intervals with increasing Josephson-to-magnetic energy ratio is found. The obtained results might be used in different fields of superconducting spintronics.
Submitted 9 May, 2019;
originally announced May 2019.
-
Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness
Authors:
Pepa Atanasova,
Alberto Barron-Cedeno,
Tamer Elsayed,
Reem Suwaileh,
Wajdi Zaghouani,
Spas Kyuchukov,
Giovanni Da San Martino,
Preslav Nakov
Abstract:
We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 1: Check-Worthiness. The task asks participants to predict which claims in a political debate should be prioritized for fact-checking. In particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for fact-checking. We offered the task in both English and Arabic, based on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign. A total of 30 teams registered to participate in the Lab, and seven teams actually submitted systems for Task 1. The most successful approaches used by the participants relied on recurrent and multi-layer neural networks, as well as on combinations of distributional representations, on matching claims' vocabulary against lexicons, and on measures of syntactic dependency. The best systems achieved a mean average precision of 0.18 and 0.15 on the English and the Arabic test datasets, respectively. This leaves ample room for further improvement, and thus we release all datasets and the scoring scripts, which should enable further research in check-worthiness estimation.
Submitted 8 August, 2018;
originally announced August 2018.
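The results above are reported as mean average precision (MAP) over the ranked sentence lists. A minimal sketch, assuming each ranking is given as a list of 0/1 gold labels in system order and that every check-worthy sentence of the debate appears in the list (so dividing by the number of hits equals dividing by the number of relevant sentences); the official scoring scripts may differ in details.

```python
def average_precision(ranked_labels):
    """AP for one ranked list of 0/1 gold labels (1 = check-worthy)."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: the mean of AP over all debates or speeches."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, the gold labels `[1, 0, 1]` give AP = (1/1 + 2/3) / 2, about 0.83.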
-
Simulation of Collective Excitations in the Stack of Long Josephson Junctions
Authors:
Ilhom Rahmonov,
Yuri Shukrinov,
Pavlina Atanasova,
Elena Zemlyanaya,
Oksana Streltsova,
Maxim Zuev,
Andrej Plecenik,
Akinobu Irie
Abstract:
The phase dynamics of the stack of long Josephson junctions has been studied. Both inductive and capacitive couplings between Josephson junctions have been taken into account in the calculations. The IV-curve, the bias-current dependence of the radiation power, and the dynamics of each JJ in the stack have been investigated. The coexistence of the charge traveling wave and fluxon states has been observed. This state can be considered as a new collective excitation in the system of coupled Josephson junctions. We demonstrate that the observed collective excitation leads to a decrease in the radiation power from the system.
Submitted 30 August, 2017;
originally announced August 2017.
-
Numerical study of fluxon solutions of sine-Gordon equation under the influence of the boundary conditions
Authors:
P. Kh. Atanasova,
E. V. Zemlyanaya,
Yu. M. Shukrinov
Abstract:
The fluxon solutions of a boundary problem for the sine-Gordon equation (SGE) are investigated numerically as a function of the boundary conditions. The interconnection between fluxon and constant solutions is analyzed. Numerical results are discussed in the context of the long Josephson junction model.
Submitted 15 July, 2011;
originally announced July 2011.
-
Influence of Josephson current second harmonic on stability of magnetic flux in long junctions
Authors:
P. Kh. Atanasova,
T. L. Boyadjiev,
Yu. M. Shukrinov,
E. V. Zemlyanaya,
P. Seidel
Abstract:
We study the long Josephson junction (LJJ) model which takes into account the second harmonic of the Fourier expansion of the Josephson current. The dependence of the static magnetic flux distributions on the parameters of the model is investigated numerically. The stability of the static solutions is checked by the sign of the smallest eigenvalue of the associated Sturm-Liouville problem. New solutions that do not exist in the traditional model have been found. The influence of the second harmonic on the stability of the magnetic flux distributions for the main solutions is investigated.
Submitted 27 July, 2010;
originally announced July 2010.
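A sketch of the stability check described above, under stated assumptions: the static equation is taken as $-\varphi'' + \sin\varphi + a_2\sin 2\varphi - \gamma = 0$ on $[-l, l]$ (a hypothetical normalization of the second-harmonic term), so the associated Sturm-Liouville problem is $-\psi'' + q(x)\psi = \lambda\psi$ with $q = \cos\varphi + 2a_2\cos 2\varphi$ and $\psi'(\pm l) = 0$. The cell-centred finite-difference grid and the Sturm-sequence bisection are choices of this sketch, not the paper's scheme. A positive smallest eigenvalue indicates a stable distribution, a negative one an unstable distribution.

```python
import math


def negative_pivots(diag, off, x):
    """Sturm count: number of eigenvalues of the symmetric tridiagonal matrix below x."""
    count, pivot = 0, 1.0
    for i, d in enumerate(diag):
        coupling = off[i - 1] ** 2 if i > 0 else 0.0
        pivot = (d - x) - coupling / pivot
        if pivot == 0.0:
            pivot = -1e-30  # nudge an exact zero pivot off zero
        if pivot < 0.0:
            count += 1
    return count


def smallest_eigenvalue(phi, half_length, a2=0.0, tol=1e-10):
    """Smallest eigenvalue of -psi'' + q(x) psi = lambda psi with psi'(-l) = psi'(l) = 0.

    phi holds the static solution at n >= 2 cell centres on [-l, l].
    """
    n = len(phi)
    inv_h2 = (n / (2.0 * half_length)) ** 2
    diag = []
    for i, p in enumerate(phi):
        # Cell-centred Neumann stencil: boundary cells have a single neighbour.
        neighbours = 1.0 if i in (0, n - 1) else 2.0
        diag.append(neighbours * inv_h2 + math.cos(p) + 2.0 * a2 * math.cos(2.0 * p))
    off = [-inv_h2] * (n - 1)
    # Gershgorin bounds: every row's off-diagonal mass is at most 2 / h^2.
    lo = min(diag) - 2.0 * inv_h2 - 1.0
    hi = max(diag) + 2.0 * inv_h2 + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if negative_pivots(diag, off, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For constant solutions the answer is known in closed form: $\varphi \equiv 0$ gives $\lambda_{\min} = 1 + 2a_2$, while $\varphi \equiv \pi$ gives $\lambda_{\min} = -1 + 2a_2$ (unstable for $a_2 = 0$), which the function reproduces up to the bisection tolerance.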
-
Numerical investigation of the second harmonic effects in the LJJ
Authors:
P. Kh. Atanasova,
T. L. Boyadjiev,
Yu. M. Shukrinov,
E. V. Zemlyanaya
Abstract:
We study the long Josephson junction (LJJ) model which takes into account the second harmonic of the Fourier expansion of the Josephson current. The sign of the second harmonic is important for many physical applications. The influence of the sign and value of the second harmonic on the magnetic flux distributions is investigated. At each step of numerical continuation in the parameters of the model, the corresponding nonlinear boundary problem is solved on the basis of the continuous analog of Newton's method with the 4th-order Numerov discretization scheme. New solutions that do not exist in the traditional model have been found. The influence of the second harmonic on the stability of the magnetic flux distributions for the main solutions is investigated.
Submitted 31 May, 2010;
originally announced May 2010.
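A minimal sketch of one solve in such a continuation procedure, with two substitutions named plainly: second-order central differences replace the 4th-order Numerov scheme, and a damped Newton iteration with a fixed step $\tau$ stands in for the continuous analog of Newton's method (which adapts the step). It assumes the hypothetical normalization $-\varphi'' + \sin\varphi + a_2\sin 2\varphi - \gamma = 0$ with field boundary conditions $\varphi'(\pm l) = h_e$; all function names are illustrative.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i-1] couples row i to i-1, upper[i] couples row i to i+1."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual_and_jacobian(phi, h, field, gamma, a2):
    """Discretise -phi'' + sin(phi) + a2 sin(2 phi) - gamma = 0 with phi'(+-l) = field."""
    n = len(phi)
    inv_h2 = 1.0 / h ** 2
    res, diag = [], []
    for i, p in enumerate(phi):
        # Ghost nodes from the central-difference Neumann conditions.
        left = phi[i - 1] if i > 0 else phi[1] - 2.0 * h * field
        right = phi[i + 1] if i < n - 1 else phi[n - 2] + 2.0 * h * field
        res.append(-(left - 2.0 * p + right) * inv_h2
                   + math.sin(p) + a2 * math.sin(2.0 * p) - gamma)
        diag.append(2.0 * inv_h2 + math.cos(p) + 2.0 * a2 * math.cos(2.0 * p))
    lower = [-inv_h2] * (n - 1)
    upper = [-inv_h2] * (n - 1)
    upper[0] = lower[-1] = -2.0 * inv_h2  # ghost nodes double the boundary coupling
    return res, lower, diag, upper


def newton_solve(phi0, half_length, field, gamma=0.0, a2=0.0, tau=1.0,
                 tol=1e-9, max_iter=50):
    """Damped Newton, phi <- phi + tau * delta with J delta = -F; returns (phi, last max |F|)."""
    phi = list(phi0)
    h = 2.0 * half_length / (len(phi) - 1)  # nodes include both ends of [-l, l]
    for _ in range(max_iter):
        res, lower, diag, upper = residual_and_jacobian(phi, h, field, gamma, a2)
        err = max(abs(r) for r in res)
        if err < tol:
            break
        delta = solve_tridiagonal(lower, diag, upper, [-r for r in res])
        phi = [p + tau * dp for p, dp in zip(phi, delta)]
    return phi, err
```

In a continuation run one would step `field` (or `a2`) in small increments and reuse the previous solution as the next initial guess; starting from $\varphi \equiv 0$ with `field=0.5` on $[-5, 5]$ should converge to the fluxon-free (Meissner-like) distribution.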
-
Numerical study of magnetic flux in the LJJ model with double sine-Gordon equation
Authors:
P. Kh. Atanasova,
T. L. Boyadjiev,
Yu. M. Shukrinov,
E. V. Zemlyanaya
Abstract:
The decrease of the barrier transparency in superconductor-insulator-superconductor (SIS) Josephson junctions leads to deviations of the current-phase relation from the sinusoidal form. The sign of the second harmonic is important for many applications, in particular in junctions with a more complex structure like SNINS or SFIFS, where N is a normal metal and F is a weak metallic ferromagnet. In our work we study the static magnetic flux distributions in long Josephson junctions taking into account the higher harmonics in the Fourier decomposition of the Josephson current. Stability analysis is based on the numerical solution of a spectral Sturm-Liouville problem formulated for each distribution. In this approach, a vanishing minimal eigenvalue of this problem indicates a bifurcation point in one of the parameters. At each step of numerical continuation in the parameters of the model, the corresponding nonlinear boundary problem is solved on the basis of the continuous analog of Newton's method. Solutions that do not exist in the traditional model have been found. The influence of the second harmonic on the stability of the magnetic flux distributions for the main solutions is investigated.
Submitted 26 May, 2010;
originally announced May 2010.