Showing 1–4 of 4 results for author: Semenov, N

Searching in archive cs.

  1. arXiv:2109.08914  [pdf, other]

    cs.CL cs.LG

    Text Detoxification using Large Pre-trained Neural Models

    Authors: David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second…

    Submitted 3 November, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted to the EMNLP 2021 conference

  2. arXiv:2105.09052  [pdf, other]

    cs.CL cs.LG

    Methods for Detoxification of Texts for the Russian Language

    Authors: Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We introduce the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, the task has not yet been addressed for the Russian language. We test two types of models - unsupervised approach based on…

    Submitted 19 May, 2021; originally announced May 2021.

  3. Determination of weight coefficients for additive fitness function of genetic algorithm

    Authors: V. K. Ivanov, D. S. Dumina, N. A. Semenov

    Abstract: The paper presents a solution to the problem of choosing a method for analytically determining the weight factors of a genetic algorithm's additive fitness function. This algorithm is the basis for an evolutionary process, which forms a stable and effective query population in a search engine to obtain highly relevant results. The paper gives a formal description of an algorithm fitness function, whi…

    Submitted 27 March, 2021; originally announced March 2021.

    Comments: 9 pages, in Russian

    Journal ref: Software & Systems 2020, vol. 33, no. 1

  4. arXiv:2103.05345  [pdf, other]

    cs.CL

    Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation

    Authors: Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: Not all topics are equally "flammable" in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness. While toxicity in user-genera…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: Accepted to the Balto-Slavic NLP workshop 2021 co-located with EACL-2021