Showing 1–8 of 8 results for author: Babakov, N

  1. arXiv:2404.17841  [pdf, other]

    cs.CL

    Toxicity Classification in Ukrainian

    Authors: Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, Georg Groh

    Abstract: The task of toxicity detection is still a relevant task, especially in the context of safe and fair LMs development. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there h…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted to WOAH, NAACL, 2024. arXiv admin note: text overlap with arXiv:2404.02043

  2. arXiv:2404.02037  [pdf, other]

    cs.CL cs.AI

    MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to the neutral register. Recently, text detoxification methods found their applications in various tasks such as detoxification of Large Language Models (LLMs) (Leong et al., 2023; He et al., 2024; Tang et al., 2023) and toxic speech combating in social networ…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL2024

  3. arXiv:2308.09055  [pdf, other]

    cs.CL

    Don't lose the message while paraphrasing: A study on content preserving style transfer

    Authors: Nikolay Babakov, David Dale, Ilya Gusev, Irina Krotova, Alexander Panchenko

    Abstract: Text style transfer techniques are gaining popularity in natural language processing, allowing text to be paraphrased in the required form: from toxic to neutral, from formal to informal, from old to modern English, etc. To solve the task, it is not sufficient to generate some neutral/informal/modern text; it is also important to preserve the original content unchanged. This requirement becomes eve…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Published at the NLDB 2023 conference

  4. arXiv:2212.14293  [pdf, other]

    cs.CL

    Error syntax aware augmentation of feedback comment generation dataset

    Authors: Nikolay Babakov, Maria Lysyuk, Alexander Shvets, Lilya Kazakova, Alexander Panchenko

    Abstract: This paper presents a solution to the GenChal 2022 shared task dedicated to feedback comment generation for writing learning. In this task, given a text with an error and the span of the error, a system generates an explanatory note that helps the writer (language learner) to improve their writing skills. Our solution is based on fine-tuning the T5 model on the initial dataset augmented acco…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: Accepted for publication at INLG 2023

  5. Studying the role of named entities for content preservation in text style transfer

    Authors: Nikolay Babakov, David Dale, Varvara Logacheva, Irina Krotova, Alexander Panchenko

    Abstract: Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment, or formality transfer. However, the majority of the existing approaches were tested on such domains as online communications on public platforms, music, or entertainment yet none of them were applied to the domains which are typical for task-ori…

    Submitted 20 June, 2022; originally announced June 2022.

    Journal ref: Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham, p.437--448

  6. arXiv:2204.08975  [pdf, ps, other]

    cs.CL

    Detecting Text Formality: A Study of Text Classification Approaches

    Authors: Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

    Abstract: Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Previously, two large-scale datasets featuring formality annotation were introduced for multiple languages -- GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models…

    Submitted 8 September, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Published at RANLP2023

  7. arXiv:2203.02392  [pdf, other]

    cs.CL

    Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language

    Authors: Nikolay Babakov, Varvara Logacheva, Alexander Panchenko

    Abstract: Toxicity on the Internet, such as hate speech, offenses towards particular users or groups of people, or the use of obscene words, is an acknowledged problem. However, there also exist other types of inappropriate messages which are usually not viewed as toxic, e.g. as they do not contain explicit offences. Such messages can contain covered toxicity or generalizations, incite harmful actions (crim…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2103.05345

  8. arXiv:2103.05345  [pdf, other]

    cs.CL

    Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

    Authors: Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: Not all topics are equally "flammable" in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness. While toxicity in user-genera…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: Accepted to the Balto-Slavic NLP workshop 2021 co-located with EACL-2021