Search | arXiv e-print repository

arXiv:2406.11093 [pdf, other]

RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

Authors: Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard Hovy

Abstract: Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time- and resource-consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focus on in-domain tasks and do not incorporate significant sentiment and emotion features (which we jointly call affect). In this paper, we propose RAEmoLLM, the first retrieval-augmented generation (RAG) LLM framework to address cross-domain misinformation detection using in-context learning based on affective information. It accomplishes this by applying an emotion-aware LLM to construct a retrieval database of affective embeddings. This database is used by our retrieval module to obtain source-domain samples, which are subsequently used for the inference module's in-context few-shot learning to detect target-domain misinformation. We evaluate our framework on three misinformation benchmarks. Results show that RAEmoLLM achieves significant improvements over the zero-shot method on all three datasets, with maximum increases of 20.69%, 23.94%, and 39.11%, respectively. This work will be released at https://github.com/lzw108/RAEmoLLM.

Submitted 16 June, 2024; originally announced June 2024.
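The retrieve-then-prompt pipeline summarized in the RAEmoLLM abstract can be illustrated with a minimal sketch. The sketch assumes the affective embeddings are already computed. The toy three-dimensional vectors and example texts below are hypothetical stand-ins for the output of an emotion-aware LLM. Source-domain samples are ranked by cosine similarity, and the top k are formatted as few-shot demonstrations. The function names, prompt wording and labels are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Each database entry: (source-domain text, label, affective embedding).
# The embeddings are hypothetical; RAEmoLLM derives them from an emotion-aware LLM.
Entry = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], database: List[Entry], k: int) -> List[Entry]:
    """Return the k source-domain entries whose affect vectors are closest to the query."""
    ranked = sorted(database, key=lambda entry: cosine(query, entry[2]), reverse=True)
    return ranked[:k]


def build_prompt(target_text: str, demos: List[Entry]) -> str:
    """Format retrieved samples as in-context demonstrations for the target text."""
    parts = ["Decide whether each text is misinformation. Answer 'fake' or 'real'."]
    for text, label, _ in demos:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {target_text}\nLabel:")
    return "\n\n".join(parts)


database: List[Entry] = [
    ("Miracle herb cures every illness overnight!", "fake", [0.9, 0.1, 0.0]),
    ("City council approves next year's budget.", "real", [0.1, 0.1, 0.9]),
    ("Shocking secret doctors refuse to reveal!", "fake", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # hypothetical affect vector of the target-domain text
demos = retrieve(query_vec, database, k=2)
print(build_prompt("New study proves coffee reverses ageing.", demos))
```

In this sketch, the LLM's completion after the final `Label:` would be read as the prediction. Retrieval by affective similarity stands in for fine-tuning on the target domain.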

arXiv:2403.18933 [pdf, other]

SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

Abstract: We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by the relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tasks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and the most effective approaches for the three different tracks.

Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638
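Participating systems rank sentence pairs by relatedness, so a rank-based metric is a natural way to score them. The sketch below computes Spearman's rank correlation between predicted and gold relatedness scores, and tied values receive their average rank. Treating Spearman's ρ as the evaluation metric is an assumption here, since the abstract does not name one. The five score pairs are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(gold))


# Hypothetical system scores vs. gold relatedness scores for five sentence pairs.
gold = [0.9, 0.1, 0.6, 0.4, 0.75]
predicted = [0.8, 0.2, 0.7, 0.3, 0.6]
print(round(spearman(predicted, gold), 3))
```

Only ranks enter the computation. A system is therefore rewarded for ordering pairs correctly even if its raw scores use a different scale from the gold annotations.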

arXiv:2402.08638 [pdf, other]

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages

Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, et al. (2 additional authors not shown)

Abstract: Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by a relatively limited availability of NLP resources. Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. The scores are obtained using a comparative annotation framework. We describe the data collection and annotation processes, challenges when building the datasets, baseline experiments, and their impact and utility in NLP.

Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted to the Findings of ACL 2024
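The SemRel abstract says only that scores come from "a comparative annotation framework". Best-Worst Scaling (BWS) is one widely used scheme of that kind. Annotators see a tuple of items (here, sentence pairs) and pick the most and least related one. A simple counting procedure then turns those choices into real-valued scores. The sketch below assumes BWS with this counting method, and the tuple contents and pair identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One BWS judgement: (items shown together, item chosen as most related,
# item chosen as least related). Item IDs here are hypothetical sentence pairs.
Judgement = Tuple[Tuple[str, ...], str, str]


def bws_scores(judgements: Iterable[Judgement]) -> Dict[str, float]:
    """Counting-based BWS: (#best - #worst) / #appearances, rescaled to [0, 1]."""
    best, worst, seen = Counter(), Counter(), Counter()
    for items, most, least in judgements:
        seen.update(items)  # count one appearance per item in the tuple
        best[most] += 1
        worst[least] += 1
    return {item: ((best[item] - worst[item]) / seen[item] + 1) / 2 for item in seen}


judgements = [
    (("pair_A", "pair_B", "pair_C", "pair_D"), "pair_A", "pair_D"),
    (("pair_A", "pair_B", "pair_C", "pair_E"), "pair_B", "pair_E"),
    (("pair_A", "pair_C", "pair_D", "pair_E"), "pair_A", "pair_E"),
]
scores = bws_scores(judgements)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(item, round(score, 3))
```

The raw counting score lies in [-1, 1]. The final `(… + 1) / 2` maps it onto a 0-to-1 relatedness scale. Each annotator makes only relative choices, never absolute ratings.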

arXiv:2212.08353 [pdf, other]

How to disagree well: Investigating the dispute tactics used on Wikipedia

Authors: Christine de Kock, Tom Stafford, Andreas Vlachos

Abstract: Disagreements are frequently studied from the perspective of either detecting toxicity or analysing argument structure. We propose a framework of dispute tactics that unifies these two perspectives, as well as other dialogue acts which play a role in resolving disputes, such as asking questions and providing clarification. This framework includes a preferential ordering among rebuttal-type tactics, ranging from ad hominem attacks to refuting the central argument. Using this framework, we annotate 213 disagreements (3,865 utterances) from Wikipedia Talk pages. This allows us to investigate research questions around the tactics used in disagreements; for instance, we provide empirical validation of the approach to disagreement recommended by Wikipedia. We develop models for multilabel prediction of dispute tactics in an utterance, achieving the best performance with a transformer-based label powerset model. Adding an auxiliary task to incorporate the ordering of rebuttal tactics yields a further, statistically significant increase in performance. Finally, we show that these annotations can be used to provide useful additional signals to improve performance on the task of predicting escalation.

Submitted 16 December, 2022; originally announced December 2022.

Comments: Accepted to EMNLP 2022 (Long paper)

Journal ref: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
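A label powerset model turns multilabel prediction into ordinary multiclass prediction. Every distinct combination of labels seen in training becomes its own class. The sketch below shows only that label transformation. The tactic names are hypothetical placeholders rather than the paper's label inventory, and the transformer classifier that would predict the class IDs is not shown.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def encode_label_powerset(
    label_sets: Iterable[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map each distinct combination of labels to one class ID (multilabel -> multiclass)."""
    class_ids: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)  # order-insensitive and hashable
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        encoded.append(class_ids[key])
    return encoded, class_ids


def decode_label_powerset(class_id: int, class_ids: Dict[FrozenSet[str], int]) -> Set[str]:
    """Recover the label set represented by a predicted class ID."""
    inverse = {cid: labels for labels, cid in class_ids.items()}
    return set(inverse[class_id])


# Hypothetical tactic labels for four utterances.
utterance_labels = [
    {"ask_question"},
    {"ad_hominem", "ask_question"},
    {"ask_question", "ad_hominem"},
    {"refute_central_argument"},
]
encoded, class_ids = encode_label_powerset(utterance_labels)
print(encoded)
print(decode_label_powerset(encoded[1], class_ids))
```

A label powerset model captures co-occurrence between labels directly, because each combination is a single target. However, it cannot predict a combination that never appeared in training.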

arXiv:2101.10917 [pdf, other]

I Beg to Differ: A study of constructive disagreement in online conversations

Authors: Christine de Kock, Andreas Vlachos

Abstract: Disagreements are pervasive in human communication. In this paper we investigate what makes disagreement constructive. To this end, we construct WikiDisputes, a corpus of 7,425 Wikipedia Talk page conversations that contain content disputes, and define the task of predicting whether disagreements will be escalated to mediation by a moderator. We evaluate feature-based models with linguistic markers from previous work, and demonstrate that their performance is improved by using features that capture changes in linguistic markers throughout the conversations, as opposed to averaged values. We develop a variety of neural models and show that taking into account the structure of the conversation improves predictive accuracy, exceeding that of feature-based models. We assess our best neural model in terms of both predictive accuracy and uncertainty by evaluating its behaviour when it is only exposed to the beginning of the conversation, finding that model accuracy improves and uncertainty decreases as models are exposed to more information.

Submitted 26 January, 2021; originally announced January 2021.

Comments: Accepted to appear in EACL 2021
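The WikiDisputes abstract contrasts averaged marker values with features that capture how markers change over a conversation. The exact change features are not given here, so the sketch below assumes two simple ones: the least-squares slope across utterances and the last-minus-first difference. The marker and its per-utterance counts are hypothetical. The two toy conversations share the same mean but trend in opposite directions, so an average alone would hide the difference.

```python
from typing import Dict, Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against utterance position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def marker_features(values: Sequence[float]) -> Dict[str, float]:
    """Averaged value plus two simple trajectory features for one linguistic marker."""
    return {
        "mean": mean(values),
        "slope": slope(values),
        "delta": values[-1] - values[0],  # last utterance minus first
    }


# Hypothetical per-utterance counts of one marker (e.g. hedges) in two conversations.
calming = [3, 2, 1]
heating_up = [1, 2, 3]
print(marker_features(calming))
print(marker_features(heating_up))
```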

arXiv:1909.03083 [pdf]

A community-based transcriptomics classification and nomenclature of neocortical cell types

Authors: Rafael Yuste, Michael Hawrylycz, Nadia Aalling, Detlev Arendt, Ruben Armananzas, Giorgio Ascoli, Concha Bielza, Vahid Bokharaie, Tobias Bergmann, Irina Bystron, Marco Capogna, Yoonjeung Chang, Ann Clemens, Christiaan de Kock, Javier DeFelipe, Sandra Dos Santos, Keagan Dunville, Dirk Feldmeyer, Richard Fiath, Gordon Fishell, Angelica Foggetti, Xuefan Gao, Parviz Ghaderi, Onur Gunturkun, Vanessa Jane Hall, et al. (46 additional authors not shown)

Abstract: To understand the function of cortical circuits, it is necessary to classify their underlying cellular diversity. Traditional attempts based on comparing anatomical or physiological features of neurons and glia, while productive, have not resulted in a unified taxonomy of neural cell types. The recent development of single-cell transcriptomics has enabled, for the first time, systematic high-throughput profiling of large numbers of cortical cells and the generation of datasets that hold the promise of being complete, accurate and permanent. Statistical analyses of these data have revealed the existence of clear clusters, many of which correspond to cell types defined by traditional criteria, and which are conserved across cortical areas and species. To capitalize on these innovations and advance the field, we, the Copenhagen Convention Group, propose that the community adopt a transcriptome-based taxonomy of the cell types in the adult mammalian neocortex. This core classification should be ontological and hierarchical, and should use a standardized nomenclature. It should be configured to flexibly incorporate new data from multiple approaches, developmental stages and a growing number of species, enabling improvement and revision of the classification. This community-based strategy could serve as a common foundation for future detailed analysis and reverse engineering of cortical circuits, and as an example for cell type classification in other parts of the nervous system and other organs.

Submitted 6 September, 2019; originally announced September 2019.

Comments: 21 pages, 3 figures

arXiv:1608.00045 [pdf, other]

doi 10.1117/12.2230763

Gaia: focus, straylight and basic angle

Authors: A. Mora, M. Biermann, A. Bombrun, J. Boyadian, F. Chassat, P. Corberand, M. Davidson, D. Doyle, D. Escolar, W. L. M. Gielesen, T. Guilpain, J. Hernandez, V. Kirschner, S. A. Klioner, C. Koeck, B. Laine, L. Lindegren, E. Serpell, P. Tatry, P. Thoral

Abstract: The Gaia all-sky astrometric survey is challenged by several issues affecting the spacecraft stability. These include focus evolution, straylight and basic angle variations. Contrary to pre-launch expectations, the image quality has been continuously evolving during both commissioning and the nominal mission. Payload decontaminations and wavefront-sensor-assisted refocuses have been carried out to recover optimum performance. Straylight and basic angle variations several orders of magnitude greater than foreseen were found and studied during commissioning by the Gaia scientists (payload experts). Building on their investigations, an ESA-Airbus DS working group was established during the early nominal mission and worked on a detailed root cause analysis. In parallel, Gaia scientists have also continued analysing the data, most notably comparing the BAM (Basic Angle Monitor) signal to global astrometric solutions, with remarkable agreement. In this contribution, a status review of these issues will be provided, with emphasis on the mitigation schemes and the lessons learned for future space missions where extreme stability is a key requirement.

Submitted 21 August, 2016; v1 submitted 29 July, 2016; originally announced August 2016.

Comments: 17 pages, 17 figures. To appear in SPIE proceedings 9904-78. Space Telescopes and Instrumentation 2016: Optical, Infrared, and Millimeter Wave. Version 2: additional details on commissioning investigations added

arXiv:astro-ph/0610062 [pdf, ps, other]

doi 10.1117/12.672261

DUNE: The Dark Universe Explorer

Authors: A. Refregier, O. Boulade, Y. Mellier, B. Milliard, R. Pain, J. Michaud, F. Safa, A. Amara, P. Astier, E. Barrelet, E. Bertin, S. Boulade, C. Cara, A. Claret, L. Georges, R. Grange, J. Guy, C. Koeck, L. Kroely, C. Magneville, N. Palanque-Delabrouille, N. Regnault, G. Smadja, C. Schimd, Z. Sun

Abstract: Understanding the nature of Dark Matter and Dark Energy is one of the most pressing issues in cosmology and fundamental physics. The purpose of the DUNE (Dark UNiverse Explorer) mission is to study these two cosmological components with high precision, using a space-based weak lensing survey as its primary science driver. Weak lensing provides a measure of the distribution of dark matter in the universe and of the impact of dark energy on the growth of structures. DUNE will also include a complementary supernovae survey to measure the expansion history of the universe, thus giving independent additional constraints on dark energy. The baseline concept consists of a 1.2m telescope with a 0.5 square degree optical CCD camera. It is designed to be fast with reduced risks and costs, and to take advantage of the synergy between ground-based and space observations. Stringent requirements for weak lensing systematics were shown to be achievable with the baseline concept. This will allow DUNE to place strong constraints on cosmological parameters, including the equation of state parameter of the dark energy and its evolution from redshift 0 to 1. DUNE is the subject of an ongoing study led by the French Space Agency (CNES), and is being proposed for ESA's Cosmic Vision programme.

Submitted 3 October, 2006; originally announced October 2006.

Comments: 12 LaTeX pages, including 7 figures and 2 tables. Proc. of SPIE symposium "Astronomical Telescopes and Instrumentation", Orlando, May 2006

Showing 1–8 of 8 results for author: Koeck, C