Showing 1–50 of 204 results for author: Nakov, P

Searching in archive cs.
  1. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  2. arXiv:2406.15627  [pdf, other]

    cs.CL cs.LG

    Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

    Authors: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Lyudmila Rvanova, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov

    Abstract: Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML). The rapid proliferation of large language models (LLMs) has stimulated researchers to seek efficient and effective approaches to UQ in text generation tasks, as in addition to their emerging capabilities, these models have introduced new challenges for bui…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev contributed equally

  3. arXiv:2406.11250  [pdf, other]

    cs.CL

    Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs

    Authors: Muhammad Arslan Manzoor, Yuxia Wang, Minghan Wang, Preslav Nakov

    Abstract: Empathy plays a pivotal role in fostering prosocial behavior, often triggered by the sharing of personal experiences through narratives. However, modeling empathy using NLP approaches remains challenging due to its deep interconnection with human interaction dynamics. Previous approaches, which involve fine-tuning language models (LMs) on human-annotated empathic datasets, have had limited success…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages

  4. arXiv:2406.11073  [pdf, other]

    cs.CL

    Exploring the Limitations of Detecting Machine-Generated Text

    Authors: Jad Doughman, Osama Mohammed Afzal, Hawau Olamide Toyin, Shady Shehata, Preslav Nakov, Zeerak Talat

    Abstract: Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Systems proposed for the task often achieve high performance. However, humans and machines can produce text in different styles and in different domains, and it remains unclear whether machine-generated text detection models favour particular styles or domai…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.05087  [pdf, other]

    cs.IR

    Corpus Poisoning via Approximate Greedy Gradient Descent

    Authors: Jinyan Su, John X. Morris, Preslav Nakov, Claire Cardie

    Abstract: Dense retrievers are widely used in information retrieval and have also been successfully extended to other knowledge intensive areas such as language models, e.g., Retrieval-Augmented Generation (RAG) systems. Unfortunately, they have recently been shown to be vulnerable to corpus poisoning attacks in which a malicious user injects a small fraction of adversarial passages into the retrieval corpu…

    Submitted 7 June, 2024; originally announced June 2024.

  6. arXiv:2406.03181  [pdf, other]

    cs.CL

    Missci: Reconstructing Fallacies in Misrepresented Science

    Authors: Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

    Abstract: Health-related misinformation on social networks can lead to poor decision-making and real-world dangers. Such misinformation often misrepresents scientific publications and cites them as "proof" to gain perceived credibility. To effectively counter such claims automatically, a system must explain how the claim was falsely derived from the cited publication. Current methods for automated fact-chec…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main)

  7. arXiv:2405.11215  [pdf, other]

    cs.CL cs.CY

    MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing

    Authors: Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty

    Abstract: Memes have evolved as a prevalent medium for diverse communication, ranging from humour to propaganda. With the rising popularity of image-focused content, there is a growing need to explore its potential harm from different aspects. Previous studies have analyzed memes in closed settings - detecting harm, applying semantic labels, and offering natural language explanations. To extend this researc…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: The paper has been accepted to ACL'24 (Findings)

  8. arXiv:2405.05583  [pdf, other]

    cs.CL

    OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

    Authors: Yuxia Wang, Minghan Wang, Hasan Iqbal, Georgi Georgiev, Jiahui Geng, Preslav Nakov

    Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie in assessing the factuality of free-form responses in open domains. Also, different papers use disparate evaluation benchmarks and measurements, which renders them hard to compare and hampers future progress. To mitigat…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 8 tables, 8 figures

  9. arXiv:2404.17342  [pdf, other]

    cs.CL cs.AI

    Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

    Authors: Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash

    Abstract: The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Paper 8 pages, Appendix 12 pages. Submitted to ARR

  10. arXiv:2404.14183  [pdf, other]

    cs.CL

    SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  11. arXiv:2403.17068  [pdf, other]

    cs.CR

    Semantic Ranking for Automated Adversarial Technique Annotation in Security Text

    Authors: Udesh Kumarasinghe, Ahmed Lekssays, Husrev Taha Sencar, Sabri Boughorbel, Charitha Elvitigala, Preslav Nakov

    Abstract: We introduce a new method for extracting structured threat behaviors from threat intelligence text. Our method is based on a multi-stage ranking architecture that allows jointly optimizing for efficiency and effectiveness. Therefore, we believe this problem formulation better aligns with the real-world nature of the task considering the large number of adversary techniques and the extensive body o…

    Submitted 25 March, 2024; originally announced March 2024.

  12. arXiv:2403.10378  [pdf, other]

    cs.CL cs.CV

    EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

    Authors: Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov

    Abstract: We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc. EXAMS-V includes a variety of multimodal features such as text, images,…

    Submitted 15 March, 2024; originally announced March 2024.

  13. arXiv:2403.04696  [pdf, other]

    cs.CL cs.AI cs.LG

    Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

    Authors: Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

    Abstract: Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous, as occasional factual inaccuracies in the generated text might be obscured by the rest of the output being generally factually correct, making it extremely hard for the users to spot them. Current services that leverage LLMs usually do not provide an…

    Submitted 6 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL-2024 (Findings). Ekaterina Fadeeva, Aleksandr Rubashevskii, and Artem Shelmanov contributed equally

  14. arXiv:2403.03627  [pdf, other]

    cs.CL cs.AI

    Multimodal Large Language Models to Support Real-World Fact-Checking

    Authors: Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych

    Abstract: Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information. While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied. Here we aim to bridge this gap. In particular, we propose a framework for systematically assessing the capacity of current multimodal models to facilitat…

    Submitted 26 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  15. arXiv:2402.12840  [pdf, other]

    cs.CL

    ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

    Authors: Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin

    Abstract: The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models. While state-of-the-art models are partially trained on large Arabic texts, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present ArabicMMLU, the first mu…

    Submitted 20 February, 2024; originally announced February 2024.

  16. arXiv:2402.12193  [pdf, other]

    cs.CL

    A Chinese Dataset for Evaluating the Safeguards in Large Language Models

    Authors: Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin

    Abstract: Many studies have demonstrated that large language models (LLMs) can produce harmful responses, exposing users to unexpected risks when LLMs are deployed. Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs. However, the focus has been almost exclusively on English, and little…

    Submitted 26 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 14 pages

  17. arXiv:2402.11175  [pdf, other]

    cs.CL

    M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific…

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 29 pages

    Journal ref: ACL 2024 main

  18. arXiv:2402.02420  [pdf, other]

    cs.CL cs.AI

    Factuality of Large Language Models in the Year 2024

    Authors: Yuxia Wang, Minghan Wang, Muhammad Arslan Manzoor, Fei Liu, Georgi Georgiev, Rocktim Jyoti Das, Preslav Nakov

    Abstract: Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicabili…

    Submitted 9 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 9 pages, 1 figure and 2 tables

  19. arXiv:2401.12713  [pdf, other]

    cs.CL

    Generating Zero-shot Abstractive Explanations for Rumour Verification

    Authors: Iman Munire Bilal, Preslav Nakov, Rob Procter, Maria Liakata

    Abstract: The task of rumour verification in social media concerns assessing the veracity of a claim on the basis of conversation threads that result from it. While previous work has focused on predicting a veracity label, here we reformulate the task to generate model-centric free-text explanations of a rumour's veracity. The approach is model agnostic in that it generalises to any model. Here we propose a…

    Submitted 23 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Revised version of the original

  20. arXiv:2312.06550  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM360: Towards Fully Transparent Open-Source LLMs

    Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

    Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder prog…

    Submitted 11 December, 2023; originally announced December 2023.

  21. arXiv:2311.09552  [pdf, other]

    cs.CL

    Large Language Models are Few-Shot Training Example Generators: A Case Study in Fallacy Recognition

    Authors: Tariq Alhindi, Smaranda Muresan, Preslav Nakov

    Abstract: Recognizing fallacies is crucial for ensuring the quality and validity of arguments across various domains. However, computational fallacy recognition faces challenges due to the diverse genres, domains, and types of fallacies found in datasets. This leads to a highly multiclass, and even multi-label, setup with substantial class imbalance. In this study, we aim to enhance existing models for fall…

    Submitted 15 November, 2023; originally announced November 2023.

  22. arXiv:2311.09000  [pdf, other]

    cs.CL

    Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers

    Authors: Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov

    Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels concerning the verifiability and factu…

    Submitted 16 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 30 pages, 13 figures

  23. arXiv:2311.08298  [pdf, other]

    cs.CL cs.AI

    A Survey of Confidence Estimation and Calibration in Large Language Models

    Authors: Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, they can be unreliable due to factual errors in their generations. Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations. There has been a lot of recent re…

    Submitted 25 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 16 pages, 1 page, 1 table

  24. arXiv:2311.06649  [pdf, other]

    cs.CL

    A Template Is All You Meme

    Authors: Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych

    Abstract: Memes are a modern form of communication and meme templates possess a base semantics that is customizable by whomever posts it on social media. Machine learning systems struggle with memes, which is likely due to such systems having insufficient context to understand memes, as there is more to memes than the obvious image and text. Here, to aid understanding of memes, we release a knowledge base o…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 9 pages, 11 supplemental pages, 6 Tables, 10 Figures

  25. arXiv:2311.04917  [pdf, other]

    cs.CL cs.AI

    Adapting Fake News Detection to the Era of Large Language Models

    Authors: Jinyan Su, Claire Cardie, Preslav Nakov

    Abstract: In the age of large language models (LLMs) and the widespread adoption of AI-driven content creation, the landscape of information dissemination has witnessed a paradigm shift. With the proliferation of both human-written and machine-generated real and fake news, robustly and effectively discerning the veracity of news articles has become an intricate challenge. While substantial research has been…

    Submitted 13 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024 Findings

  26. arXiv:2311.03179  [pdf, other]

    cs.CL cs.AI

    ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text

    Authors: Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat

    Abstract: We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023. ArAIEval offers two tasks over Arabic text: (i) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (ii) disinformation detection in binary and multiclass setups over tweets. A total of 20 teams participa…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ArabicNLP-23 (EMNLP-23), propaganda, disinformation, misinformation, fake news

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  27. arXiv:2310.18205  [pdf, other]

    cs.CL

    Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media

    Authors: Shubham Mittal, Megha Sundriyal, Preslav Nakov

    Abstract: Claim span identification (CSI) is an important step in fact-checking pipelines, aiming to identify text segments that contain a checkworthy claim or assertion in a social media post. Despite its importance to journalists and human fact-checkers, it remains a severely understudied problem, and the scarce research on this topic so far has only focused on English. Here we aim to bridge this gap by c…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 (main)

  28. arXiv:2310.16995  [pdf, other]

    cs.CL

    Quality > Quantity: Synthetic Corpora from Foundation Models for Closed-Domain Extractive Question Answering

    Authors: Saptarshi Sengupta, Connor Heaton, Shreya Ghosh, Preslav Nakov, Prasenjit Mitra

    Abstract: Domain adaptation, the process of training a model in one domain and applying it to another, has been extensively explored in machine learning. While training a domain-specific foundation model (FM) from scratch is an option, recent methods have focused on adapting pre-trained FMs for domain-specific tasks. However, our experiments reveal that either approach does not consistently achieve state-of…

    Submitted 25 October, 2023; originally announced October 2023.

  29. arXiv:2310.14338  [pdf, other]

    cs.CL cs.AI

    From Chaos to Clarity: Claim Normalization to Empower Fact-Checking

    Authors: Megha Sundriyal, Tanmoy Chakraborty, Preslav Nakov

    Abstract: With the rise of social media, users are exposed to many misleading claims. However, the pervasive noise inherent in these posts presents a challenge in identifying precise and prominent claims that require verification. Extracting the important claims from such posts is arduous and time-consuming, yet it is an underexplored problem. Here, we aim to bridge this gap. We introduce a novel task, Clai…

    Submitted 12 February, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings EMNLP2023

  30. arXiv:2310.07609  [pdf, other]

    cs.CL

    QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking

    Authors: Liangming Pan, Xinyuan Lu, Min-Yen Kan, Preslav Nakov

    Abstract: Fact-checking real-world claims often requires complex, multi-step reasoning due to the absence of direct evidence to support or refute them. However, existing fact-checking systems often lack transparency in their decision-making, making it challenging for users to comprehend their reasoning process. To address this, we propose the Question-guided Multi-hop Fact-Checking (QACHECK) system, which g…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 System Demonstrations Track

  31. arXiv:2310.05189  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Factuality Challenges in the Era of Large Language Models

    Authors: Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni

    Abstract: The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations.…

    Submitted 9 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Our article offers a comprehensive examination of the challenges and risks associated with Large Language Models (LLMs), focusing on their potential impact on the veracity of information in today's digital landscape

  32. arXiv:2309.08969  [pdf, other]

    cs.CL

    Rethinking STS and NLI in Large Language Models

    Authors: Yuxia Wang, Minghan Wang, Preslav Nakov

    Abstract: Recent years have seen the rise of large language models (LLMs), where practitioners use task-specific prompts; this was shown to be effective for a variety of tasks. However, when applied to semantic textual similarity (STS) and natural language inference (NLI), the effectiveness of LLMs turns out to be limited by low-resource domain accuracy, model overconfidence, and difficulty to capture the d…

    Submitted 4 February, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2212.13138 by other authors

  33. arXiv:2309.08674  [pdf, other]

    cs.CL cs.AI

    Fake News Detectors are Biased against Texts Generated by Large Language Models

    Authors: Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov

    Abstract: The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: The first two authors contributed equally

  34. arXiv:2309.06844  [pdf, other]

    cs.CL cs.AI cs.MM

    Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles

    Authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov

    Abstract: The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different rese…

    Submitted 13 September, 2023; originally announced September 2023.

  35. arXiv:2308.16149  [pdf, other]

    cs.CL cs.AI cs.LG

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Authors: Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, et al. (7 additional authors not shown)

    Abstract: We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning…

    Submitted 29 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Arabic-centric, foundation model, large-language model, LLM, generative model, instruction-tuned, Jais, Jais-chat

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  36. arXiv:2308.13387  [pdf, other]

    cs.CL

    Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

    Authors: Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin

    Abstract: With the rapid evolution of large language models (LLMs), new and hard-to-predict harmful capabilities are emerging. This requires developers to be able to identify risks through the evaluation of "dangerous capabilities" in order to responsibly deploy LLMs. In this work, we collect the first open-source dataset to evaluate safeguards in LLMs, and deploy safer open-source LLMs at a low cost. Our d…

    Submitted 3 September, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

    Comments: 18 pages, 9 figures, 11 tables

  37. arXiv:2306.05540  [pdf, other]

    cs.CL cs.AI

    DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text

    Authors: Jinyan Su, Terry Yue Zhuo, Di Wang, Preslav Nakov

    Abstract: With the rapid progress of large language models (LLMs) and the huge amount of text they generate, it becomes more and more impractical to manually distinguish whether a text is machine-generated. Given the growing use of LLMs in social media and education, it prompts us to develop methods to detect machine-generated text, preventing malicious usage such as plagiarism, misinformation, and propaga…

    Submitted 23 May, 2023; originally announced June 2023.

    Comments: machine-generated text, large language models, LLMs, zero-shot

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  38. arXiv:2306.05535  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data

    Authors: Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov

    Abstract: Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48…

    Submitted 17 January, 2024; v1 submitted 24 May, 2023; originally announced June 2023.

    Comments: Check-Worthiness, Fact-Checking, Fake News, Misinformation, Disinformation, Political Debates, Multimodality

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ICASSP 2024

  39. arXiv:2306.02349  [pdf, other]

    cs.CL cs.IR cs.LG

    bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

    Authors: Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Ves Stoyanov, Ivan Koychev, Preslav Nakov, Dragomir Radev

    Abstract: We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, question answering, etc.) and machine learning tasks (sequen…

    Submitted 6 June, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ACL 2023

  40. arXiv:2305.18410  [pdf, other]

    cs.LG cs.CL q-bio.GN stat.ME

    Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data

    Authors: Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang

    Abstract: The need for more usable and explainable machine learning models in healthcare increases the importance of developing and utilizing causal discovery algorithms, which aim to discover causal relations by analyzing observational data. Explainable approaches aid clinicians and biologists in predicting the prognosis of diseases and suggesting proper treatments. However, very little research has been c…

    Submitted 28 May, 2023; originally announced May 2023.

  41. arXiv:2305.14902  [pdf, other]

    cs.CL

    M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries. However, this has also raised concerns about the potential misuse of such texts in journalism, education, and academia. In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse. We first introduce a la…

    Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 41 pages

  42. arXiv:2305.14534  [pdf, other]

    cs.CL cs.AI

    Detecting Propaganda Techniques in Code-Switched Social Media Text

    Authors: Muhammad Umar Salman, Asif Hanif, Shady Shehata, Preslav Nakov

    Abstract: Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propag…

    Submitted 15 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  43. arXiv:2305.13661  [pdf, other]

    cs.CL cs.AI

    On the Risk of Misinformation Pollution with Large Language Models

    Authors: Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang

    Abstract: In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems. We establish a threat model and simulate potential misuse scenarios, both unintentional and intentional, to assess the ex…

    Submitted 26 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 (Findings; Long Paper)

  44. arXiv:2305.13186  [pdf, other]

    cs.CL cs.AI

    SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

    Authors: Xinyuan Lu, Liangming Pan, Qian Liu, Preslav Nakov, Min-Yen Kan

    Abstract: Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence. We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims that 1) originate from authentic scientific publications and 2) require compositional reasoning for verification. The claims ar…

    Submitted 23 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023 (main conference, long paper)

  45. arXiv:2305.12744  [pdf, other]

    cs.CL cs.AI

    Fact-Checking Complex Claims with Program-Guided Reasoning

    Authors: Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov

    Abstract: Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning. In this paper, we present Program-Guided Fact-Checking (ProgramFC), a novel fact-checking model that decomposes complex claims into simpler sub-tasks that can be solved using a shared library of specialized functions. We first leverage the in-context learning ability of…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2023 (main conference, long paper)

  46. arXiv:2305.03336  [pdf, other]

    cs.CL cs.AI cs.CY

    QCRI at SemEval-2023 Task 3: News Genre, Framing and Persuasion Techniques Detection using Multilingual Models

    Authors: Maram Hasanain, Ahmed Oumar El-Shangiti, Rabindra Nath Nandi, Preslav Nakov, Firoj Alam

    Abstract: Misinformation spreading in mainstream and social media has been misleading users in different ways. Manual detection and verification efforts by journalists and fact-checkers can no longer cope with the great scale and quick spread of misleading information. This motivated research and industry efforts to develop systems for analyzing and verifying news spreading online. The SemEval-2023 Task 3 i…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted at SemEval-23 (ACL-23), propaganda, disinformation, misinformation, fake news

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  47. arXiv:2304.14339  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning

    Authors: Qisheng Liao, Meiting Lai, Preslav Nakov

    Abstract: This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection. We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting, achieving very competitive results: our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages for which we had training data a…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: framing, contrastive learning, SemEval-2023 task 3

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: SemEval-2023

  48. arXiv:2304.11130  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses

    Authors: Ashraf Haddad, Najwa Aaraj, Preslav Nakov, Septimiu Fabian Mare

    Abstract: In recent years, a proliferation of cyber-security threats and diversity has been on the rise culminating in an increase in their reporting and analysis. To counter that, many non-profit organizations have emerged in this domain, such as MITRE and OSWAP, which have been actively tracking vulnerabilities, and publishing defense recommendations in standardized formats. As producing data in such form…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: cybersecurity, MITRE, CVE, CWE

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  49. arXiv:2302.00389  [pdf, other]

    cs.AI

    Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

    Authors: Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

    Abstract: Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA), Natural Language for Visual Reasoning (NLVR), and Vision Language Retrieval (VLR). Among these applications, cross-modal interaction and complementary informati…

    Submitted 1 March, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

  50. arXiv:2301.11219  [pdf, other]

    cs.CL cs.CY

    Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?

    Authors: Shivam Sharma, Atharva Kulkarni, Tharun Suresh, Himanshi Mathur, Preslav Nakov, Md. Shad Akhtar, Tanmoy Chakraborty

    Abstract: Memes can sway people's opinions over social media as they combine visual and textual information in an easy-to-consume manner. Since memes instantly turn viral, it becomes crucial to infer their intent and potentially associated harmfulness to take timely measures as needed. A common problem associated with meme comprehension lies in detecting the entities referenced and characterizing the role o…

    Submitted 10 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted at EACL 2023 (Main Track). 9 Pages (main content), Limitations, Ethical Considerations + 4 Pages (Refs.) + Appendix; 8 Figures; 5 Tables; Paper ID: 804