Showing 1–21 of 21 results for author: Hardalov, M

  1. arXiv:2406.15570  [pdf, other]

    cs.CL cs.LG

    DEM: Distribution Edited Model for Training with Mixed Data Distributions

    Authors: Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha

    Abstract: Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and cost of joint training makes the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit having a sub-optimal performance across data sources and require multiple expensive trainin…

    Submitted 21 June, 2024; originally announced June 2024.

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  2. arXiv:2406.13415  [pdf, other]

    cs.CL cs.LG

    Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators

    Authors: Matéo Mahaut, Laura Aina, Paula Czarnowska, Momchil Hardalov, Thomas Müller, Lluís Màrquez

    Abstract: Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLM's confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a survey and empirical comparison of estimators…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: accepted on the main track of ACL 2024

  3. arXiv:2306.05535  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG cs.SD eess.AS

    Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data

    Authors: Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov

    Abstract: Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers. While previous work on this problem has focused exclusively on the text modality, here we explore the utility of the audio modality as an additional input. We create a new multimodal dataset (text and audio in English) containing 48…

    Submitted 17 January, 2024; v1 submitted 24 May, 2023; originally announced June 2023.

    Comments: Check-Worthiness, Fact-Checking, Fake News, Misinformation, Disinformation, Political Debates, Multimodality

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ICASSP 2024

  4. arXiv:2306.02349  [pdf, other]

    cs.CL cs.IR cs.LG

    bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

    Authors: Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Ves Stoyanov, Ivan Koychev, Preslav Nakov, Dragomir Radev

    Abstract: We present bgGLUE (Bulgarian General Language Understanding Evaluation), a benchmark for evaluating language models on Natural Language Understanding (NLU) tasks in Bulgarian. Our benchmark includes NLU tasks targeting a variety of NLP problems (e.g., natural language inference, fact-checking, named entity recognition, sentiment analysis, question answering, etc.) and machine learning tasks (sequen…

    Submitted 6 June, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 (Main Conference)

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: ACL 2023

  5. arXiv:2305.17020  [pdf, other]

    cs.CL cs.LG

    Diable: Efficient Dialogue State Tracking as Operations on Tables

    Authors: Pietro Lesci, Yoshinari Fujinuma, Momchil Hardalov, Chao Shang, Yassine Benajiba, Lluis Marquez

    Abstract: Sequence-to-sequence state-of-the-art systems for dialogue state tracking (DST) use the full dialogue history as input, represent the current state as a list with all the slots, and generate the entire state from scratch at each dialogue turn. This approach is inefficient, especially when the number of slots is large and the conversation is long. We propose Diable, a new task formalisation that si…

    Submitted 1 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 (Findings)

  6. arXiv:2210.04447  [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

    Authors: Momchil Hardalov, Anton Chernyavskiy, Ivan Koychev, Dmitry Ilvovsky, Preslav Nakov

    Abstract: While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return back an article that explains their decision. This is a sensible…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to AACL-IJCNLP 2022 (Main Conference)

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: AACL-IJCNLP 2022

  7. arXiv:2201.09012  [pdf, other]

    cs.CL cs.AI

    Leaf: Multiple-Choice Question Generation

    Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

    Abstract: Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g.,…

    Submitted 22 January, 2022; originally announced January 2022.

    Comments: Accepted to ECIR 2022 (Demo)

  8. arXiv:2109.15120  [pdf, ps, other]

    cs.CL cs.AI cs.IR cs.LG cs.SI

    SUper Team at SemEval-2016 Task 3: Building a feature-rich system for community question answering

    Authors: Tsvetomila Mihaylova, Pepa Gencheva, Martin Boyanov, Ivana Yovcheva, Todor Mihaylov, Momchil Hardalov, Yasen Kiprov, Daniel Balchev, Ivan Koychev, Preslav Nakov, Ivelina Nikolova, Galia Angelova

    Abstract: We present the system we built for participating in SemEval-2016 Task 3 on Community Question Answering. We achieved the best results on subtask C, and strong results on subtasks A and B, by combining a rich set of various types of features: semantic, lexical, metadata, and user-related. The most important group turned out to be the metadata for the question and for the comment, semantic vectors t…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: community question answering, question-question similarity, question-comment similarity, answer reranking

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: SemEval-2016

  9. arXiv:2109.06050  [pdf, other]

    cs.CL cs.LG

    Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training

    Authors: Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

    Abstract: The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts are often expressed in many different languages depending on the user and the platform, which can be a local news outlet, a social media platform, a news forum, etc. Most research in stance detection, however, has been limited to working with a single language and…

    Submitted 21 December, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted to AAAI 2022 (Preprint version)

  10. arXiv:2108.12898  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.IR cs.LG

    Generating Answer Candidates for Quizzes and Answer-Aware Question Generators

    Authors: Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

    Abstract: In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answe…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: answer generation, question generation, answer-aware question generation, quiz questions, question answering

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: RANLP-2021 (SRW)

  11. arXiv:2104.07467  [pdf, other]

    cs.CL cs.LG

    Cross-Domain Label-Adaptive Stance Detection

    Authors: Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

    Abstract: Stance detection concerns the classification of a writer's viewpoint towards a target. There are different task variants, e.g., stance of a tweet vs. a full article, or stance with respect to a claim vs. an (implicit) topic. Moreover, task definitions vary, which includes the label inventory, the data collection, and the annotation protocol. All these aspects hinder cross-domain studies, as they r…

    Submitted 13 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021 (Main Conference)

  12. arXiv:2103.17055  [pdf, other]

    cs.CL stat.ML

    A Neighbourhood Framework for Resource-Lean Content Flagging

    Authors: Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov

    Abstract: We propose a novel framework for cross-lingual content flagging with limited target-language data, which significantly outperforms prior work in terms of predictive performance. The framework is based on a nearest-neighbour architecture. It is a modern instantiation of the vanilla k-nearest neighbour model, as we use Transformer representations in all its components. Our framework can adapt to new…

    Submitted 27 January, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Accepted to appear in Transactions of the Association for Computational Linguistics (TACL) -- this is a pre-MIT Press publication version

  13. arXiv:2103.00242  [pdf, other]

    cs.CL cs.SI

    A Survey on Stance Detection for Mis- and Disinformation Identification

    Authors: Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

    Abstract: Understanding attitudes expressed in texts, also known as stance detection, plays an important role in systems for detecting false information online, be it misinformation (unintentionally false) or disinformation (intentionally false information). Stance detection has been framed in different ways, including (a) as a component of fact-checking, rumour detection, and detecting previously fact-chec…

    Submitted 8 May, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: Accepted to NAACL-HLT 2022 (Findings)

  14. arXiv:2103.00153  [pdf, other]

    cs.CL cs.SI

    Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go

    Authors: Arnav Arora, Preslav Nakov, Momchil Hardalov, Sheikh Muhammad Sarwar, Vibha Nayak, Yoan Dinkov, Dimitrina Zlatkova, Kyle Dent, Ameya Bhatawdekar, Guillaume Bouchard, Isabelle Augenstein

    Abstract: The proliferation of harmful content on online platforms is a major societal problem, which comes in many different forms including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self harm, and many other. Online platforms seek to moderate such content to limit societal harm, to comply with legislation, and to create a more…

    Submitted 6 June, 2023; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: The paper has been accepted for publication to ACM Computing Surveys (CSUR)

  15. arXiv:2011.03080  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering

    Authors: Momchil Hardalov, Todor Mihaylov, Dimitrina Zlatkova, Yoan Dinkov, Ivan Koychev, Preslav Nakov

    Abstract: We propose EXAMS -- a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. We collected more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. EXAMS offers a fine-grained evaluation framework across multiple languages…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020, 17 pages, 6 figures, 8 tables

  16. arXiv:2004.14848  [pdf, other]

    cs.CL

    Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Detecting the user's intent and finding the corresponding slots among the utterance's words are important tasks in natural language understanding. Their interconnected nature makes their joint modeling a standard part of training such models. Moreover, data scarceness and specialized vocabularies pose additional challenges. Recently, the advances in pre-trained language models, namely contextualiz…

    Submitted 5 October, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

  17. arXiv:1911.08125  [pdf, other]

    cs.CL cs.AI cs.IR

    In Search of Credible News

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: We study the problem of finding fake online news. This is an important problem as news of questionable credibility have recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four online so…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: Credibility, veracity, fact checking, humor detection

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: AIMSA-2016

  18. arXiv:1908.01519  [pdf, other]

    cs.CL cs.IR

    Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recently, reading comprehension models achieved near-human performance on large-scale datasets such as SQuAD, CoQA, MS Macro, RACE, etc. This is largely due to the release of pre-trained contextualized representations such as BERT and ELMo, which can be fine-tuned for the target task. Despite those advances and the creation of more challenging datasets, most of the work is still done for English.…

    Submitted 6 September, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: Accepted at RANLP 2019 (13 pages, 2 figures, 6 tables)

  19. Recursive Style Breach Detection with Multifaceted Ensemble Learning

    Authors: Daniel Kopev, Dimitrina Zlatkova, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: Accepted as regular paper at AIMSA 2018

  20. Machine Reading Comprehension for Answer Re-Ranking in Customer Support Chatbots

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recent advances in deep neural networks, language modeling and language generation have introduced new ideas to the field of conversational agents. As a result, deep neural models such as sequence-to-sequence, Memory Networks, and the Transformer have become key ingredients of state-of-the-art dialog systems. While those models are able to generate meaningful responses even in unseen situation, th…

    Submitted 26 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: 13 pages, 1 figure, 4 tables

    Journal ref: Information 2019, 10, 82

  21. Towards Automated Customer Support

    Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov

    Abstract: Recent years have seen growing interest in conversational agents, such as chatbots, which are a very good fit for automated customer support because the domain in which they need to operate is narrow. This interest was in part inspired by recent advances in neural machine translation, esp. the rise of sequence-to-sequence (seq2seq) and attention-based models such as the Transformer, which have bee…

    Submitted 2 September, 2018; originally announced September 2018.

    Comments: Accepted as regular paper at AIMSA 2018