Showing 1–8 of 8 results for author: Gusev, I

Searching in archive cs.

  1. arXiv:2308.09055  [pdf, other]

    cs.CL

    Don't lose the message while paraphrasing: A study on content preserving style transfer

    Authors: Nikolay Babakov, David Dale, Ilya Gusev, Irina Krotova, Alexander Panchenko

    Abstract: Text style transfer techniques are gaining popularity in natural language processing, allowing text to be paraphrased into a required form: from toxic to neutral, from formal to informal, from old to modern English, etc. To solve the task, it is not sufficient to generate some neutral/informal/modern text; it is also important to preserve the original content unchanged. This requirement becomes eve…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Published at the NLDB 2023 conference

  2. arXiv:2204.13638  [pdf, other]

    cs.CL cs.LG

    Russian Texts Detoxification with Levenshtein Editing

    Authors: Ilya Gusev

    Abstract: Text detoxification is a style transfer task of creating neutral versions of toxic texts. In this paper, we use the concept of text editing to build a two-step tagging-based detoxification model using a parallel corpus of Russian texts. With this model, we achieved the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.

    Submitted 8 June, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Accepted to Dialogue 2022

  3. arXiv:2108.12626  [pdf, other]

    cs.CL cs.LG

    HeadlineCause: A Dataset of News Headlines for Detecting Causalities

    Authors: Ilya Gusev, Alexey Tikhonov

    Abstract: Detecting implicit causal relations in texts is a task that requires both common sense and world knowledge. Existing datasets are focused either on commonsense causal reasoning or explicit causal relations. In this work, we present HeadlineCause, a dataset for detecting implicit causal relations between pairs of news headlines. The dataset includes over 5000 headline pairs from English news and ov…

    Submitted 28 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

  4. arXiv:2105.00981  [pdf, other]

    cs.CL

    Russian News Clustering and Headline Selection Shared Task

    Authors: Ilya Gusev, Ivan Smurov

    Abstract: This paper presents the results of the Russian News Clustering and Headline Selection shared task. As a part of it, we propose the tasks of Russian news event detection, headline selection, and headline generation. These tasks are accompanied by datasets and baselines. The presented datasets for event detection and headline selection are the first public Russian datasets for their tasks. The headl…

    Submitted 14 June, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: Accepted to Dialogue 2021 conference

  5. arXiv:2007.05044  [pdf, ps, other]

    cs.CL

    Advances of Transformer-Based Models for News Headline Generation

    Authors: Alexey Bukhtiyarov, Ilya Gusev

    Abstract: Pretrained language models based on the Transformer architecture are the reason for recent breakthroughs in many areas of NLP, including sentiment analysis, question answering, and named entity recognition. Headline generation is a special kind of text summarization task. Models need to have strong natural language understanding that goes beyond the meaning of individual words and sentences and an ability…

    Submitted 27 July, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Version 2; Accepted to AINL 2020

  6. Dataset for Automatic Summarization of Russian News

    Authors: Ilya Gusev

    Abstract: Automatic text summarization has been studied in a variety of domains and languages. However, this does not hold for the Russian language. To overcome this issue, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models. We demonstrate that the dataset is a valid task for methods of te…

    Submitted 5 October, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Version 4, October 2021, corrected BLEU scores

    Journal ref: In: AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham (2020)

  7. arXiv:1904.11475  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Importance of Copying Mechanism for News Headline Generation

    Authors: Ilya Gusev

    Abstract: News headline generation is an essential problem of text summarization because it is constrained, well-defined, and still hard to solve. Models with a limited vocabulary cannot solve it well, as new named entities appear regularly in the news and these entities often should be in the headline. News articles in morphologically rich languages such as Russian require model modifications due t…

    Submitted 25 April, 2019; originally announced April 2019.

    Journal ref: Computational Linguistics and Intellectual Technologies, Papers from the Annual International Conference "Dialogue" (2019) Issue 18, 229-236

  8. arXiv:1807.00818  [pdf]

    cs.CL cs.AI cs.LG stat.ML

    Improving part-of-speech tagging via multi-task learning and character-level word representations

    Authors: Daniil Anastasyev, Ilya Gusev, Eugene Indenbom

    Abstract: In this paper, we explore ways to improve POS-tagging using various types of auxiliary losses and different word representations. As a baseline, we utilized a BiLSTM tagger, which is able to achieve state-of-the-art results on sequence labelling tasks. We developed a new method for character-level word representation using a feedforward neural network. Such representation gave us better resu…

    Submitted 2 July, 2018; originally announced July 2018.

    Journal ref: Computational Linguistics and Intellectual Technologies, Papers from the Annual International Conference "Dialogue" (2018) Issue 17, 14-27