-
Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency
Authors:
Toyin Aguda,
Suchetha Siddagangappa,
Elena Kochkina,
Simerjot Kaur,
Dongsheng Wang,
Charese Smiley,
Sameena Shah
Abstract:
Collecting labeled datasets in finance is challenging due to scarcity of domain experts and higher cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general domain datasets, their effectiveness on domain specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data annotators for extracting relations in financial documents. We compare the annotations produced by three LLMs (GPT-4, PaLM 2, and MPT Instruct) against expert annotators and crowdworkers. We demonstrate that the current state-of-the-art LLMs can be sufficient alternatives to non-expert crowdworkers. We analyze models using various prompts and parameter settings and find that customizing the prompts for each relation group by providing specific examples belonging to those groups is paramount. Furthermore, we introduce a reliability index (LLM-RelIndex) used to identify outputs that may require expert attention. Finally, we perform an extensive time, cost and error analysis and provide recommendations for the collection and usage of automated annotations in domain-specific settings.
Submitted 26 March, 2024;
originally announced March 2024.
-
Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling
Authors:
Talia Tseriotou,
Ryan Sze-Yin Chan,
Adam Tsakalidis,
Iman Munire Bilal,
Elena Kochkina,
Terry Lyons,
Maria Liakata
Abstract:
We present an open-source, pip-installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research providing a full suite of signature-based models. Their components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless pre-processing for sequential data, parameter flexibility, and automated tuning across a range of models. We examine signature networks under three different NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch and mood changes in social media threads, showing SOTA performance in all three, and provide guidance for future tasks. We release the Toolkit as a PyTorch package with an introductory video and Git repositories for preprocessing and modelling, including sample notebooks on the modeled NLP tasks.
Submitted 6 February, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Some Observations on Fact-Checking Work with Implications for Computational Support
Authors:
Rob Procter,
Miguel Arana-Catania,
Yulan He,
Maria Liakata,
Arkaitz Zubiaga,
Elena Kochkina,
Runcong Zhao
Abstract:
Social media and user-generated content (UGC) have become increasingly important features of journalistic work in a number of different ways. However, the growth of misinformation means that news organisations have had to devote more and more resources to determining its veracity and to publishing corrections if it is found to be misleading. In this work, we present the results of interviews with eight members of fact-checking teams from two organisations. Team members described their fact-checking processes and the challenges they currently face in completing a fact-check in a robust and timely way. The former reveals, inter alia, significant differences in fact-checking practices and the role played by collaboration between team members. We conclude with a discussion of the implications for the development and application of computational tools, including where computational tool support is currently lacking and the importance of being able to accommodate different fact-checking practices.
Submitted 6 July, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
PANACEA: An Automated Misinformation Detection System on COVID-19
Authors:
Runcong Zhao,
Miguel Arana-Catania,
Lixing Zhu,
Elena Kochkina,
Lin Gui,
Arkaitz Zubiaga,
Rob Procter,
Maria Liakata,
Yulan He
Abstract:
In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporting evidence with the stance towards the claim to be checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which is able to detect rumours based on comment networks of related tweets, instead of relying on the knowledge base. This rumour detection module assists by warning the users in the early stages when a knowledge base may not be available.
Submitted 28 February, 2023;
originally announced March 2023.
-
PHEMEPlus: Enriching Social Media Rumour Verification with External Evidence
Authors:
John Dougrez-Lewis,
Elena Kochkina,
M. Arana-Catania,
Maria Liakata,
Yulan He
Abstract:
Work on social media rumour verification utilises signals from posts, their propagation and users involved. Other lines of work target identifying and fact-checking claims based on information from Wikipedia, or trustworthy news articles without considering social media context. However, works combining the information from social media with external evidence from the wider web are lacking. To facilitate research in this direction, we release a novel dataset, PHEMEPlus, an extension of the PHEME benchmark, which contains social media conversations as well as relevant external evidence for each rumour. We demonstrate the effectiveness of incorporating such evidence in improving rumour verification models. Additionally, as part of the evidence collection, we evaluate various ways of query formulation to identify the most effective method.
Submitted 28 July, 2022;
originally announced July 2022.
-
Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers
Authors:
Rabab Alkhalifa,
Elena Kochkina,
Arkaitz Zubiaga
Abstract:
Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. Therefore, an ability to predict a model's ability to persist over time can help design models that can be effectively used over a longer period of time. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years, and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments by training and testing across data that are different numbers of years apart from each other, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on the classification performance, as well as measuring the extent of the persistence over time.
Submitted 19 November, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims
Authors:
M. Arana-Catania,
Elena Kochkina,
Arkaitz Zubiaga,
Maria Liakata,
Rob Procter,
Yulan He
Abstract:
We present a comprehensive work on automated veracity assessment from dataset creation to developing novel methods based on Natural Language Inference (NLI), focusing on misinformation related to the COVID-19 pandemic. We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19 and their respective information sources. The dataset construction includes work on retrieval techniques and similarity measurements to ensure a unique set of claims. We then propose novel techniques for automated veracity assessment based on Natural Language Inference including graph convolutional networks and attention based approaches. We have carried out experiments on evidence retrieval and veracity assessment on the dataset using the proposed techniques and found them competitive with SOTA methods, and provided a detailed discussion.
Submitted 5 May, 2022;
originally announced May 2022.
-
Opinions are Made to be Changed: Temporally Adaptive Stance Classification
Authors:
Rabab Alkhalifa,
Elena Kochkina,
Arkaitz Zubiaga
Abstract:
Given the rapidly evolving nature of social media and people's views, word usage changes over time. Consequently, the performance of a classifier trained on old textual data can drop dramatically when tested on newer data. While research in stance classification has advanced in recent years, no effort has been invested in making these classifiers have persistent performance over time. To study this phenomenon we introduce two novel large-scale, longitudinal stance datasets. We then evaluate the performance persistence of stance classifiers over time and demonstrate how it decays as the temporal gap between training and testing data increases. We propose a novel approach to mitigate this performance drop, which is based on temporal adaptation of the word embeddings used for training the stance classifier. This enables us to make use of readily available unlabelled data from the current time period instead of expensive annotation efforts. We propose and compare several approaches to embedding adaptation and find that the Incremental Temporal Alignment (ITA) model leads to the best results in reducing performance drop over time.
Submitted 27 August, 2021;
originally announced August 2021.
-
Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies
Authors:
Gabriele Pergola,
Elena Kochkina,
Lin Gui,
Maria Liakata,
Yulan He
Abstract:
Biomedical question-answering (QA) has gained increased attention for its capability to provide users with high-quality information from a vast scientific literature. Although an increasing number of biomedical QA datasets has been recently made available, those resources are still rather limited and expensive to produce. Transfer learning via pre-trained language models (LMs) has been shown as a promising approach to leverage existing general-purpose knowledge. However, finetuning these large models can be costly and time consuming, often yielding limited benefits when adapting to specific themes of specialised domains, such as the COVID-19 literature. To bootstrap further their domain adaptation, we propose a simple yet unexplored approach, which we call biomedical entity-aware masking (BEM). We encourage masked language models to learn entity-centric knowledge based on the pivotal entities characterizing the domain at hand, and employ those entities to drive the LM fine-tuning. The resulting strategy is a downstream process applicable to a wide variety of masked LMs, not requiring additional memory or components in the neural architectures. Experimental results show performance on par with state-of-the-art models on several biomedical QA datasets.
Submitted 16 February, 2021;
originally announced February 2021.
-
QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions
Authors:
Rabab Alkhalifa,
Theodore Yoong,
Elena Kochkina,
Arkaitz Zubiaga,
Maria Liakata
Abstract:
This paper describes the participation of the QMUL-SDS team for Task 1 of the CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the check-worthiness of tweets about COVID-19 to identify and prioritise tweets that need fact-checking. The overarching aim is to further support ongoing efforts to protect the public from fake news and help people find reliable information. We describe and analyse the results of our submissions. We show that a CNN using COVID-Twitter-BERT (CT-BERT) enhanced with numeric expressions can effectively boost performance from baseline results. We also show results of training data augmentation with rumours on other topics. Our best system ranked fourth in the task with encouraging outcomes showing potential for improved results in the future.
Submitted 30 August, 2020;
originally announced August 2020.
-
Estimating predictive uncertainty for rumour verification models
Authors:
Elena Kochkina,
Maria Liakata
Abstract:
The inability to correctly resolve rumours circulating online can have harmful real-world consequences. We present a method for incorporating model and data uncertainty estimates into natural language processing models for automatic rumour verification. We show that these estimates can be used to filter out model predictions likely to be erroneous, so that these difficult instances can be prioritised by a human fact-checker. We propose two methods for uncertainty-based instance rejection, supervised and unsupervised. We also show how uncertainty estimates can be used to interpret model performance as a rumour unfolds.
Submitted 14 May, 2020;
originally announced May 2020.
-
Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data
Authors:
Harish Tayyar Madabushi,
Elena Kochkina,
Michael Castelle
Abstract:
The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed. That this task can be addressed effectively using BERT, a powerful new architecture which can be fine-tuned for text classification tasks, is not surprising. However, propaganda detection, like other tasks that deal with news documents and other forms of decontextualized social communication (e.g. sentiment analysis), inherently deals with data whose categories are simultaneously imbalanced and dissimilar. We show that BERT, while capable of handling imbalanced classes with no additional data augmentation, does not generalise well when the training and test data are sufficiently dissimilar (as is often the case with news sources, whose topics evolve over time). We show how to address this problem by providing a statistical measure of similarity between datasets and a method of incorporating cost-weighting into BERT when the training and test sets are dissimilar. We test these methods on the Propaganda Techniques Corpus (PTC) and achieve the second-highest score on sentence-level propaganda classification.
Submitted 16 March, 2020;
originally announced March 2020.
-
RumourEval 2019: Determining Rumour Veracity and Support for Rumours
Authors:
Genevieve Gorrell,
Kalina Bontcheva,
Leon Derczynski,
Elena Kochkina,
Maria Liakata,
Arkaitz Zubiaga
Abstract:
This is the proposal for RumourEval-2019, which will run in early 2019 as part of that year's SemEval event. Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the dangers of "fake news" have become a mainstream concern. Yet automated support for rumour checking remains in its infancy. For this reason, it is important that a shared task in this area continues to provide a focus for effort, which is likely to increase. We therefore propose a continuation in which the veracity of further rumours is determined, and as previously, supportive of this goal, tweets discussing them are classified according to the stance they take regarding the rumour. Scope is extended compared with the first RumourEval, in that the dataset is substantially expanded to include Reddit as well as Twitter data, and additional languages are also included.
Submitted 18 September, 2018;
originally announced September 2018.
-
All-in-one: Multi-task Learning for Rumour Verification
Authors:
Elena Kochkina,
Maria Liakata,
Arkaitz Zubiaga
Abstract:
Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components where the output of one feeds into the next. We propose a multi-task learning approach that allows joint training of the main and auxiliary tasks, improving the performance of rumour verification. We examine the connection between the dataset properties and the outcomes of the multi-task learning models used.
Submitted 10 June, 2018;
originally announced June 2018.
-
Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers
Authors:
Arkaitz Zubiaga,
Elena Kochkina,
Maria Liakata,
Rob Procter,
Michal Lukasik,
Kalina Bontcheva,
Trevor Cohn,
Isabelle Augenstein
Abstract:
Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is becoming of increasing interest to researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse features inherent in social media interactions or 'conversational threads'. Testing the effectiveness of four sequential classifiers -- Hawkes Processes, Linear-Chain Conditional Random Fields (Linear CRF), Tree-Structured Conditional Random Fields (Tree CRF) and Long Short Term Memory networks (LSTM) -- on eight datasets associated with breaking news stories, and looking at different types of local and contextual features, our work sheds new light on the development of accurate stance classifiers. We show that sequential classifiers that exploit the use of discourse properties in social media conversations while using only local features, outperform non-sequential classifiers. Furthermore, we show that LSTM using a reduced set of features can outperform the other sequential classifiers; this performance is consistent across datasets and across types of stances. To conclude, our work also analyses the different features under study, identifying those that best help characterise and distinguish between stances, such as supporting tweets being more likely to be accompanied by evidence than denying tweets. We also set forth a number of directions for future research.
Submitted 6 December, 2017;
originally announced December 2017.
-
Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
Authors:
Elena Kochkina,
Maria Liakata,
Isabelle Augenstein
Abstract:
This paper describes team Turing's submission to SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A). Subtask A addresses the challenge of rumour stance classification, which involves identifying the attitude of Twitter users towards the truthfulness of the rumour they are discussing. Stance classification is considered to be an important step towards rumour verification; therefore, performing well in this task is expected to be useful in debunking false rumours. In this work we classify a set of Twitter posts discussing rumours into either supporting, denying, questioning or commenting on the underlying rumours. We propose an LSTM-based sequential model that, through modelling the conversational structure of tweets, achieves an accuracy of 0.784 on the RumourEval test set, outperforming all other systems in Subtask A.
Submitted 24 April, 2017;
originally announced April 2017.
-
Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations
Authors:
Arkaitz Zubiaga,
Elena Kochkina,
Maria Liakata,
Rob Procter,
Michal Lukasik
Abstract:
Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by harvesting users' replies to one another, which results in a nested tree-like structure. Previous work addressing the stance classification task has treated each tweet as a separate unit. Here we analyse tweets by virtue of their position in a sequence and test two sequential classifiers, Linear-Chain CRF and Tree CRF, each of which makes different assumptions about the conversational structure. We experiment with eight Twitter datasets, collected during breaking news, and show that exploiting the sequential structure of Twitter conversations achieves significant improvements over the non-sequential methods. Our work is the first to model Twitter conversations as a tree structure in this manner, introducing a novel way of tackling NLP tasks on Twitter conversations.
Submitted 11 October, 2016; v1 submitted 28 September, 2016;
originally announced September 2016.
-
The Gravitational Universe
Authors:
The eLISA Consortium,
P. Amaro Seoane,
S. Aoudia,
H. Audley,
G. Auger,
S. Babak,
J. Baker,
E. Barausse,
S. Barke,
M. Bassan,
V. Beckmann,
M. Benacquista,
P. L. Bender,
E. Berti,
P. Binétruy,
J. Bogenstahl,
C. Bonvin,
D. Bortoluzzi,
N. C. Brause,
J. Brossard,
S. Buchman,
I. Bykov,
J. Camp,
C. Caprini
, et al. (136 additional authors not shown)
Abstract:
The last century has seen enormous progress in our understanding of the Universe. We know the life cycles of stars, the structure of galaxies, the remnants of the big bang, and have a general understanding of how the Universe evolved. We have come remarkably far using electromagnetic radiation as our tool for observing the Universe. However, gravity is the engine behind many of the processes in the Universe, and much of its action is dark. Opening a gravitational window on the Universe will let us go further than any alternative. Gravity has its own messenger: Gravitational waves, ripples in the fabric of spacetime. They travel essentially undisturbed and let us peer deep into the formation of the first seed black holes, exploring redshifts as large as z ~ 20, prior to the epoch of cosmic re-ionisation. Exquisite and unprecedented measurements of black hole masses and spins will make it possible to trace the history of black holes across all stages of galaxy evolution, and at the same time constrain any deviation from the Kerr metric of General Relativity. eLISA will be the first ever mission to study the entire Universe with gravitational waves. eLISA is an all-sky monitor and will offer a wide view of a dynamic cosmos using gravitational waves as new and unique messengers to unveil The Gravitational Universe. It provides the closest ever view of the early processes at TeV energies, has guaranteed sources in the form of verification binaries in the Milky Way, and can probe the entire Universe, from its smallest scales around singularities and black holes, all the way to cosmological dimensions.
Submitted 24 May, 2013;
originally announced May 2013.