Skip to main content

Showing 1–15 of 15 results for author: Shelmanov, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.15627  [pdf, other

    cs.CL cs.LG

    Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph

    Authors: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Lyudmila Rvanova, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov

    Abstract: Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML). The rapid proliferation of large language models (LLMs) has stimulated researchers to seek efficient and effective approaches to UQ in text generation tasks, as in addition to their emerging capabilities, these models have introduced new challenges for bui… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev contributed equally

  2. arXiv:2405.13929  [pdf, other

    cs.CL cs.AI

    Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

    Authors: Aleksandr Nikolich, Konstantin Korolev, Artem Shelmanov, Igor Kiselev

    Abstract: There has been a surge in the development of various Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and the reduced computational performance due to the disproportionate representation of tokens in model's vocabulary. In this work, we address these issues and introduce Vikhr, a new state-… ▽ More

    Submitted 19 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2404.14183  [pdf, other

    cs.CL

    SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, **yan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  4. arXiv:2403.04696  [pdf, other

    cs.CL cs.AI cs.LG

    Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

    Authors: Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

    Abstract: Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous, as occasional factual inaccuracies in the generated text might be obscured by the rest of the output being generally factually correct, making it extremely hard for the users to spot them. Current services that leverage LLMs usually do not provide an… ▽ More

    Submitted 6 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL-2024 (Findings). Ekaterina Fadeeva, Aleksandr Rubashevskii, and Artem Shelmanov contributed equally

  5. arXiv:2402.11175  [pdf, other

    cs.CL

    M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, **yan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific… ▽ More

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 29 pages

    Journal ref: ACL 2024 main

  6. arXiv:2311.07383  [pdf, other

    cs.CL cs.LG

    LM-Polygraph: Uncertainty Estimation for Language Models

    Authors: Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov

    Abstract: Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often "hallucinate", i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, m… ▽ More

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP-2023

  7. arXiv:2305.14902  [pdf, other

    cs.CL

    M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, **yan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries. However, this has also raised concerns about the potential misuse of such texts in journalism, education, and academia. In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse. We first introduce a la… ▽ More

    Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 41 pages

  8. arXiv:2301.03252  [pdf, other

    cs.CL

    Active Learning for Abstractive Text Summarization

    Authors: Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, Alexander Panchenko, Mikhail Burtsev, Artem Shelmanov

    Abstract: Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that would preserve the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation requir… ▽ More

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted at EMNLP-2022 Findings

  9. arXiv:2209.13983  [pdf, other

    cs.CV cs.AI

    Medical Image Captioning via Generative Pretrained Transformers

    Authors: Alexander Selivanov, Oleg Y. Rogov, Daniil Chesakov, Artem Shelmanov, Irina Fedulova, Dmitry V. Dylov

    Abstract: The automatic clinical caption generation problem is referred to as proposed model combining the analysis of frontal chest X-Ray scans with structured patient information from the radiology records. We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records. The proposed combination of these models generates a textual summary wit… ▽ More

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 13 pages, 3 figures, The work was completed in 2021

  10. arXiv:2206.00906  [pdf, other

    cs.CL cs.AI cs.HC cs.NE

    NeuralSympCheck: A Symptom Checking and Disease Diagnostic Neural Model with Logic Regularization

    Authors: Aleksandr Nesterov, Bulat Ibragimov, Dmitriy Umerenkov, Artem Shelmanov, Galina Zubkova, Vladimir Kokh

    Abstract: The symptom checking systems inquire users for their symptoms and perform a rapid and affordable medical assessment of their condition. The basic symptom checking systems based on Bayesian methods, decision trees, or information gain methods are easy to train and do not require significant computational resources. However, their drawbacks are low relevance of proposed symptoms and insufficient qua… ▽ More

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: Published in the proceedings of the conference "Artificial Intelligence in Medicine 2022"

  11. arXiv:2205.03598  [pdf, other

    cs.CL cs.LG

    Towards Computationally Feasible Deep Active Learning

    Authors: Akim Tsvigun, Artem Shelmanov, Gleb Kuzmin, Leonid Sanochkin, Daniil Larionov, Gleb Gusev, Manvel Avetisian, Leonid Zhukov

    Abstract: Active learning (AL) is a prominent technique for reducing the annotation effort required for training machine learning models. Deep learning offers a solution for several essential obstacles to deploying AL in practice but introduces many others. One of such problems is the excessive computational resources required to train an acquisition model and estimate its uncertainty on instances in the un… ▽ More

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: Accepted at NAACL-2022 Findings

  12. arXiv:2202.03101  [pdf, other

    stat.ML cs.LG

    Nonparametric Uncertainty Quantification for Single Deterministic Neural Network

    Authors: Nikita Kotelevskii, Aleksandr Artemenkov, Kirill Fedyanin, Fedor Noskov, Alexander Fishkov, Artem Shelmanov, Artem Vazhentsev, Aleksandr Petiushko, Maxim Panov

    Abstract: This paper proposes a fast and scalable method for uncertainty quantification of machine learning models' predictions. First, we show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution. Importantly, the proposed approach allows to disentangle explicitly aleatoric and epistemic uncerta… ▽ More

    Submitted 27 October, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022 paper

  13. arXiv:2101.08133  [pdf, other

    cs.CL

    Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates

    Authors: Artem Shelmanov, Dmitri Puzyrev, Lyubov Kupriyanova, Denis Belyakov, Daniil Larionov, Nikita Khromov, Olga Kozlova, Ekaterina Artemova, Dmitry V. Dylov, Alexander Panchenko

    Abstract: Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empiri… ▽ More

    Submitted 18 February, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2021)

  14. Neural Entity Linking: A Survey of Models Based on Deep Learning

    Authors: Ozge Sevgili, Artem Shelmanov, Mikhail Arkhipov, Alexander Panchenko, Chris Biemann

    Abstract: This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the "deep learning revolution" in natural language processing. Its goal is to systemize design features of neural entity linking systems and compare their performance to the remarkable classic methods on common benchmarks. This work distills a generic architecture of a… ▽ More

    Submitted 7 April, 2022; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: Published in Semantic Web journal

    Journal ref: Semantic Web, Vol. 13, Number 3, 2022

  15. arXiv:2003.06651  [pdf, other

    cs.CL

    Word Sense Disambiguation for 158 Languages using Word Embeddings Only

    Authors: Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

    Abstract: Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely u… ▽ More

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: 10 pages, 5 figures, 4 tables, accepted at LREC 2020