-
SumHiS: Extractive Summarization Exploiting Hidden Structure
Authors:
Tikhonov Pavel,
Anastasiya Ianina,
Valentin Malykh
Abstract:
Extractive summarization is a task of highlighting the most important parts of the text. We introduce a new approach to extractive summarization task using hidden clustering structure of the text. Experimental results on CNN/DailyMail demonstrate that our approach generates more accurate summaries than both extractive and abstractive methods, achieving state-of-the-art results in terms of ROUGE-2 metric exceeding the previous approaches by 10%. Additionally, we show that hidden structure of the text could be interpreted as aspects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Answer Candidate Type Selection: Text-to-Text Language Model for Closed Book Question Answering Meets Knowledge Graphs
Authors:
Mikhail Salnikov,
Maria Lysyuk,
Pavel Braslavski,
Anton Razzhigaev,
Valentin Malykh,
Alexander Panchenko
Abstract:
Pre-trained Text-to-Text Language Models (LMs), such as T5 or BART yield promising results in the Knowledge Graph Question Answering (KGQA) task. However, the capacity of the models is limited and the quality decreases for questions with less popular entities. In this paper, we present a novel approach which works on top of the pre-trained Text-to-Text QA system to address this issue. Our simple yet effective method performs filtering and re-ranking of generated candidates based on their types derived from Wikidata "instance_of" property.
Submitted 10 October, 2023;
originally announced October 2023.
-
Large Language Models Meet Knowledge Graphs to Answer Factoid Questions
Authors:
Mikhail Salnikov,
Hai Le,
Prateek Rajput,
Irina Nikishina,
Pavel Braslavski,
Valentin Malykh,
Alexander Panchenko
Abstract:
Recently, it has been shown that the incorporation of structured knowledge into Large Language Models significantly improves the results for a variety of NLP tasks. In this paper, we propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs for answering factoid questions. More specifically, we propose an algorithm for subgraphs extraction from a Knowledge Graph based on question entities and answer candidates. Then, we procure easily interpreted information with Transformer-based models through the linearization of the extracted subgraphs. Final re-ranking of the answer candidates with the extracted information boosts Hits@1 scores of the pre-trained text-to-text language models by 4-6%.
Submitted 3 October, 2023;
originally announced October 2023.
-
CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
Authors:
Nikita Sorokin,
Dmitry Abulkhanov,
Sergey Nikolenko,
Valentin Malykh
Abstract:
We consider the clone detection and information retrieval problems for source code, well-known tasks important for any programming language. Although it is also an important and interesting problem to find code snippets that operate identically but are written in different programming languages, to the best of our knowledge multilingual clone detection has not been studied in literature. In this work, we formulate the multilingual clone detection problem and present XCD, a new benchmark dataset produced from the CodeForces submissions dataset. Moreover, we present a novel training procedure, called cross-consistency training (CCT), that we apply to train language models on source code in different programming languages. The resulting CCT-LM model, initialized with GraphCodeBERT and fine-tuned with CCT, achieves new state of the art, outperforming existing approaches on the POJ-104 clone detection benchmark with 95.67% MAP and AdvTest code search benchmark with 47.18% MRR; it also shows the best results on the newly created multilingual clone detection benchmark XCD across all programming languages.
Submitted 19 May, 2023;
originally announced May 2023.
-
Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets
Authors:
Ivan Sedykh,
Dmitry Abulkhanov,
Nikita Sorokin,
Sergey Nikolenko,
Valentin Malykh
Abstract:
Code search is an important and well-studied task, but it usually means searching for code by a text query. We argue that using a code snippet (and possibly an error traceback) as a query while looking for bugfixing instructions and code samples is a natural use case not covered by prior art. Moreover, existing datasets use code comments rather than full-text descriptions as text, making them unsuitable for this use case. We present a new SearchBySnippet dataset implementing the search-by-code use case based on StackOverflow data; we show that on SearchBySnippet, existing architectures fall short of a simple BM25 baseline even after fine-tuning. We present a new single encoder model SnippeR that outperforms several strong baselines on SearchBySnippet with a result of 0.451 Recall@10; we propose the SearchBySnippet dataset and SnippeR as a new important benchmark for code search evaluation.
Submitted 27 May, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
DetIE: Multilingual Open Information Extraction Inspired by Object Detection
Authors:
Michael Vasilkovsky,
Anton Alekseev,
Valentin Malykh,
Ilya Shenbin,
Elena Tutubalina,
Dmitriy Salikhov,
Mikhail Stepnov,
Andrey Chertok,
Sergey Nikolenko
Abstract:
State of the art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or similar performance in comparison with state of the art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets the new state of the art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show performance improvement 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish languages. Code and models are available at https://github.com/sberbank-ai/DetIE.
Submitted 24 June, 2022;
originally announced June 2022.
-
Template-based Approach to Zero-shot Intent Recognition
Authors:
Dmitry Lamanov,
Pavel Burnyshev,
Ekaterina Artemova,
Valentin Malykh,
Andrey Bout,
Irina Piontkovskaya
Abstract:
The recent advances in transfer learning techniques and pre-training of large contextualized encoders foster innovation in real-life applications, including dialog assistants. Practical needs of intent recognition require effective data usage and the ability to constantly update supported intents, adopting new ones, and abandoning outdated ones. In particular, the generalized zero-shot paradigm, in which the model is trained on the seen intents and tested on both seen and unseen intents, is taking on new importance. In this paper, we explore the generalized zero-shot setup for intent recognition. Following best practices for zero-shot text classification, we treat the task with a sentence pair modeling approach. We outperform previous state-of-the-art f1-measure by up to 16% for unseen intents, using intent labels and user utterances and without accessing external sources (such as knowledge bases). Further enhancement includes lexicalization of intent labels, which improves performance by up to 7%. By using task transferring from other sentence pair tasks, such as Natural Language Inference, we gain additional improvements.
Submitted 22 June, 2022;
originally announced June 2022.
-
Mass-ratio condition for non-binding of three two-component particles with contact interactions
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
Binding of two heavy fermions interacting with a light particle via the contact interaction is possible only for sufficiently large heavy-light mass ratio. In this work, the two-variable inequality is derived to determine a specific value $ μ^* $ providing that there are no three-body bound states for the mass ratio smaller than $ μ^* $. The value $ μ^* = 5.26 $ is obtained by analyzing this inequality for a total angular momentum and parity $ L^P = 1^- $. For other $ L^P $ sectors, the specific mass-ratio values providing an absence of the three-body bound states are found in a similar way. For generality, the method is extended to determine corresponding mass-ratio values for the system consisting of two identical bosons and a distinct particle for different $ L^P $ ($ L > 0 $) sectors.
Submitted 23 June, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Minlos-Faddeev regularization of zero-range interactions in the three-body problem
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
To regularize the three-body problem, Minlos and Faddeev suggested a modification of zero-range model, which diminishes interaction at the triple-collision point. The analysis reveals that this regularization results in four alternatives depending on the regularization parameter $ σ$. Explicitly, Efimov or Thomas effects remain for $ σ< σ_c $, the additional boundary conditions of two types should be imposed at the triple-collision point for $ σ_c \le σ< σ_e $ and $ σ_e < σ< σ_r $, and the problem is regularized for $ σ\ge σ_r $. Critical values $ σ_c < σ_e < σ_r $ separating different alternatives are determined both for a two-component three-body system and for three identical bosons.
Submitted 16 June, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
WikiMulti: a Corpus for Cross-Lingual Summarization
Authors:
Pavel Tikhonov,
Valentin Malykh
Abstract:
Cross-lingual summarization (CLS) is the task to produce a summary in one particular language for a source document in a different language. We introduce WikiMulti - a new dataset for cross-lingual summarization based on Wikipedia articles in 15 languages. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We make our dataset publicly available here: https://github.com/tikhonovpavel/wikimulti
Submitted 23 April, 2022;
originally announced April 2022.
-
Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models
Authors:
Alena Fenogenova,
Maria Tikhonova,
Vladislav Mikhailov,
Tatiana Shavrina,
Anton Emelyanov,
Denis Shevelev,
Alexandr Kukushkin,
Valentin Malykh,
Ekaterina Artemova
Abstract:
In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks.
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user experience and methodological improvements, including fixes of the benchmark vulnerabilities unresolved in the previous version: novel and improved tests for understanding the meaning of a word in context (RUSSE) along with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit based on jiant framework for consistent training and evaluation of NLP-models of various architectures which now supports the most recent models for Russian. Finally, we provide the integration of Russian SuperGLUE with a framework for industrial evaluation of the open-source models, MOROCCO (MOdel ResOurCe COmparison), in which the models are evaluated according to the weighted average metric over all tasks, the inference speed, and the occupied amount of RAM. Russian SuperGLUE is publicly available at https://russiansuperglue.com/.
Submitted 15 February, 2022;
originally announced February 2022.
-
A Single Example Can Improve Zero-Shot Data Generation
Authors:
Pavel Burnyshev,
Valentin Malykh,
Andrey Bout,
Ekaterina Artemova,
Irina Piontkovskaya
Abstract:
Sub-tasks of intent classification, such as robustness to distribution shift, adaptation to specific user groups and personalization, out-of-domain detection, require extensive and flexible datasets for experiments and evaluation. As collecting such datasets is time- and labor-consuming, we propose to use text generation methods to gather datasets. The generator should be trained to generate utterances that belong to the given intent. We explore two approaches to generating task-oriented utterances. In the zero-shot approach, the model is trained to generate utterances from seen intents and is further used to generate utterances for intents unseen during training. In the one-shot approach, the model is presented with a single utterance from a test intent. We perform a thorough automatic, and human evaluation of the dataset generated utilizing two proposed approaches. Our results reveal that the attributes of the generated data are close to original test sets, collected via crowd-sourcing.
Submitted 16 August, 2021;
originally announced August 2021.
-
MOROCCO: Model Resource Comparison Framework
Authors:
Valentin Malykh,
Alexander Kukushkin,
Ekaterina Artemova,
Vladislav Mikhailov,
Maria Tikhonova,
Tatiana Shavrina
Abstract:
The new generation of pre-trained NLP models push the SOTA to the new limits, but at the cost of computational resources, to the point that their use in real production environments is often prohibitively expensive. We tackle this problem by evaluating not only the standard quality metrics on downstream tasks but also the memory footprint and inference time. We present MOROCCO, a framework to compare language models compatible with jiant environment which supports over 50 NLU tasks, including SuperGLUE benchmark and multiple probing suites. We demonstrate its applicability for two GLUE-like suites in different languages.
Submitted 29 April, 2021;
originally announced April 2021.
-
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Authors:
Tatiana Shavrina,
Alena Fenogenova,
Anton Emelyanov,
Denis Shevelev,
Ekaterina Artemova,
Valentin Malykh,
Vladislav Mikhailov,
Maria Tikhonova,
Andrey Chertok,
Andrey Evlampiev
Abstract:
In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills - detection of natural language inference, commonsense reasoning, ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogically to the SuperGLUE methodology, was developed from scratch for the Russian language. We provide baselines, human level evaluation, an open-source framework for evaluating models (https://github.com/RussianNLP/RussianSuperGLUE), and an overall leaderboard of transformer models for the Russian language. Besides, we present the first results of comparing multilingual models in the adapted diagnostic test set and offer the first steps to further expanding or assessing state-of-the-art models independently of language.
Submitted 2 November, 2020; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification
Authors:
Anton Alekseev,
Elena Tutubalina,
Valentin Malykh,
Sergey Nikolenko
Abstract:
Deep learning architectures based on self-attention have recently achieved and surpassed state of the art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
Submitted 17 June, 2020;
originally announced June 2020.
-
The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews
Authors:
Elena Tutubalina,
Ilseyar Alimova,
Zulfat Miftahutdinov,
Andrey Sakhovskiy,
Valentin Malykh,
Sergey Nikolenko
Abstract:
The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products for the detection of health-related named entities and the effectiveness of pharmaceutical products. The corpus itself consists of two parts, the raw one and the labelled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labelled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Labels for sentences include health-related issues or their absence. The sentences with one are additionally labelled at the expression level for identification of fine-grained subtypes such as drug classes and drug forms, drug indications, and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multi-label sentence classification tasks on this corpus. The macro F1 score of 74.85% in the NER task was achieved by our RuDR-BERT model. For the sentence classification task, our model achieves the macro F1 score of 68.82% gaining 7.47% over the score of BERT model trained on Russian data. We make the RuDReC corpus and pretrained weights of domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC
Submitted 7 April, 2020;
originally announced April 2020.
-
RecVAE: a New Variational Autoencoder for Top-N Recommendations with Implicit Feedback
Authors:
Ilya Shenbin,
Anton Alekseev,
Elena Tutubalina,
Valentin Malykh,
Sergey I. Nikolenko
Abstract:
Recent research has shown the advantages of using autoencoders based on deep neural networks for collaborative filtering. In particular, the recently proposed Mult-VAE model, which used the multinomial likelihood variational autoencoders, has shown excellent results for top-N recommendations. In this work, we propose the Recommender VAE (RecVAE) model that originates from our research on regularization techniques for variational autoencoders. RecVAE introduces several novel ideas to improve Mult-VAE, including a novel composite prior distribution for the latent codes, a new approach to setting the $β$ hyperparameter for the $β$-VAE framework, and a new approach to training based on alternating updates. In experimental evaluation, we show that RecVAE significantly outperforms previously proposed autoencoder-based models, including Mult-VAE and RaCT, across classical collaborative filtering datasets, and present a detailed ablation study to assess our new developments. Code and models are available at https://github.com/ilya-shenbin/RecVAE.
Submitted 23 December, 2019;
originally announced December 2019.
-
Three two-component fermions with contact interactions: correct formulation and energy spectrum
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
Properties of two identical particles of mass $m$ and a distinct particle of mass $m_1$ in the universal low-energy limit of zero-range two-body interaction are studied in different sectors of total angular momentum $L$ and parity $P$. For the unambiguous formulation of the problem in the interval $μ_r(L^P) < m/m_1 \le μ_c(L^P)$ ($μ_r(1^-) \approx 8.619$ and $μ_c(1^-) \approx 13.607$, $μ_r(2^+) \approx 32.948$ and $μ_c(2^+) \approx 38.630$, etc.) in each $L^P$ sector an additional parameter $b$ determining the wave function near the triple-collision point is introduced; thus, a one-parameter family of self-adjoint Hamiltonians is defined. Within the framework of this formulation, dependence of the bound-state energies on $m/m_1$ and $b$ in the sector of angular momentum and parity $L^P$ is calculated for $L \le 5$ and analysed with the aid of a simple model. A number of the bound states for each $L^P$ sector is analysed and presented in the form of 'phase diagrams' in the plane of two parameters $m/m_1$ and $b$.
Submitted 9 April, 2019;
originally announced April 2019.
-
The Second Conversational Intelligence Challenge (ConvAI2)
Authors:
Emily Dinan,
Varvara Logacheva,
Valentin Malykh,
Alexander Miller,
Kurt Shuster,
Jack Urbanek,
Douwe Kiela,
Arthur Szlam,
Iulian Serban,
Ryan Lowe,
Shrimai Prabhumoye,
Alan W Black,
Alexander Rudnicky,
Jason Williams,
Joelle Pineau,
Mikhail Burtsev,
Jason Weston
Abstract:
We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics like perplexity to measure the performance across sequences of utterances (conversations) -- in terms of repetition, consistency and balance of dialogue acts (e.g. how many questions asked vs. answered).
Submitted 31 January, 2019;
originally announced February 2019.
-
AspeRa: Aspect-based Rating Prediction Model
Authors:
Sergey I. Nikolenko,
Elena Tutubalina,
Valentin Malykh,
Ilya Shenbin,
Anton Alekseev
Abstract:
We propose a novel end-to-end Aspect-based Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items and at the same time discovers coherent aspects of reviews that can be used to explain predictions or profile users. The AspeRa model uses max-margin losses for joint item and user embedding learning and a dual-headed architecture; it significantly outperforms recently proposed state-of-the-art models such as DeepCoNN, HFT, NARRE, and TransRev on two real world data sets of user reviews. With qualitative examination of the aspects and quantitative evaluation of rating prediction models based on these aspects, we show how aspect embeddings can be used in a recommender system.
Submitted 23 January, 2019;
originally announced January 2019.
-
Self-Attentive Model for Headline Generation
Authors:
Daniil Gavrilov,
Pavel Kalaidin,
Valentin Malykh
Abstract:
Headline generation is a special type of text summarization task. While the amount of available training data for this task is almost unlimited, it still remains challenging, as learning to generate headlines for news articles implies that the model has strong reasoning about natural language. To overcome this issue, we applied the recent Universal Transformer architecture paired with the byte-pair encoding technique and achieved new state-of-the-art results on the New York Times Annotated corpus with ROUGE-L F1-score 24.84 and ROUGE-2 F1-score 13.48. We also present the new RIA corpus and reach ROUGE-L F1-score 36.81 and ROUGE-2 F1-score 22.15 on it.
Submitted 23 January, 2019;
originally announced January 2019.
-
Sequence Learning with RNNs for Medical Concept Normalization in User-Generated Texts
Authors:
Elena Tutubalina,
Zulfat Miftahutdinov,
Sergey Nikolenko,
Valentin Malykh
Abstract:
In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a disease mention in free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This task is challenging since medical terminology is very different when coming from health care professionals or from the general public in the form of social media texts. We approach it as a sequence learning problem, with recurrent neural networks trained to obtain semantic representations of one- and multi-word expressions. We develop end-to-end neural architectures tailored specifically to medical concept normalization, including bidirectional LSTM and GRU with an attention mechanism and additional semantic similarity features based on UMLS. Our evaluation over a standard benchmark shows that our model improves over a state-of-the-art baseline for classification based on CNNs.
Submitted 29 November, 2018; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Universal description of three two-component fermions
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
A quantum mechanical three-body problem for two identical fermions of mass $m$ and a distinct particle of mass $m_1$ in the universal limit of zero-range two-body interaction is studied. For the unambiguous formulation of the problem in the interval $μ_r < m/m_1 \le μ_c$ ($μ_r \approx 8.619$ and $μ_c \approx 13.607$) an additional parameter $b$ determining the wave function near the triple-collision point is introduced; thus, a one-parameter family of self-adjoint Hamiltonians is defined. The dependence of the bound-state energies on $m/m_1$ and $b$ in the sector of angular momentum and parity $L^P = 1^-$ is calculated and analysed with the aid of a simple model.
Submitted 25 January, 2016; v1 submitted 18 December, 2015;
originally announced December 2015.
-
Recent advances in description of few two-component fermions
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
An overview of recent advances in the description of few two-component fermions is presented. The model of zero-range interaction is generally considered to discuss the principal aspects of the few-body dynamics. Particular attention is paid to the detailed description of two identical fermions of mass $m$ and a distinct particle of mass $m_1$: it turns out that two $L^P = 1^-$ three-body bound states emerge if the mass ratio $m/m_1$ increases up to the critical value $μ_c \approx 13.607$, above which the Efimov effect takes place. The topics considered include rigorous treatment of the few-fermion problem in the zero-range interaction limit, low-dimensional results, the four-body energy spectrum, crossover of the energy spectra for $m/m_1$ near $μ_c$, and properties of potential-dependent states. Finally, problems awaiting solution are listed.
Submitted 25 January, 2013; v1 submitted 23 November, 2012;
originally announced November 2012.
-
Consistent alpha-cluster description of the 12C (0^+_2) resonance
Authors:
S. I. Fedotov,
O. I. Kartavtsev,
A. V. Malykh
Abstract:
The near-threshold 12C (0^+_2) resonance provides a unique possibility for fast helium burning in stars, as predicted by Hoyle to explain the observed abundance of elements in the Universe. Properties of this resonance are calculated within the framework of the alpha-cluster model whose two-body and three-body effective potentials are tuned to describe the alpha-alpha scattering data, the energies of the 0^+_1 and 0^+_2 states, and the 0^+_1-state root-mean-square radius. The extremely small width of the 0^+_2 state, the 0^+_2 to 0^+_1 monopole transition matrix element, and the transition radius are found to be in remarkable agreement with the experimental data. The 0^+_2-state structure is described as a system of three alpha-particles oscillating between the ground-state-like configuration and the elongated chain configuration whose probability exceeds 0.9.
Submitted 9 September, 2010;
originally announced September 2010.
-
Bound states and scattering lengths of three two-component particles with zero-range interactions under one-dimensional confinement
Authors:
O. I. Kartavtsev,
A. V. Malykh,
S. A. Sofianos
Abstract:
The universal three-body dynamics in ultra-cold binary gases confined to one-dimensional motion are studied. The three-body binding energies and the (2 + 1)-scattering lengths are calculated for two identical particles of mass $m$ and a different one of mass $m_1$, whose interaction is described in the low-energy limit by zero-range potentials. The critical values of the mass ratio $m/m_1$, at which the three-body states arise and the (2 + 1)-scattering length equals zero, are determined both for zero and infinite interaction strength $λ_1$ of the identical particles. A number of exact results are listed and asymptotic dependences both for $m/m_1 \to \infty$ and $λ_1 \to -\infty$ are derived. Combining the numerical and analytical results, a schematic diagram showing the number of the three-body bound states and the sign of the (2 + 1)-scattering length in the plane of the mass ratio and interaction-strength ratio is deduced. The results provide a description of the homogeneous and mixed phases of atoms and molecules in dilute binary quantum gases.
Submitted 24 October, 2008; v1 submitted 20 August, 2008;
originally announced August 2008.
-
Universal description of the rotational-vibrational spectrum of three particles with zero-range interactions
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
A comprehensive universal description of the rotational-vibrational spectrum for two identical particles of mass $m$ and a third particle of mass $m_1$ in the zero-range limit of the interaction between different particles is given for arbitrary values of the mass ratio $m/m_1$ and the total angular momentum $L$. If the two-body scattering length is positive, the number of vibrational states is finite for $L_c(m/m_1) \le L \le L_b(m/m_1)$, zero for $L>L_b(m/m_1)$, and infinite for $L<L_c(m/m_1)$. If the two-body scattering length is negative, the number of states is either zero for $L \ge L_c(m/m_1)$ or infinite for $L<L_c(m/m_1)$. For a finite number of vibrational states, all the binding energies are described by the universal function $ε_{LN}(m/m_1) = {\cal E}(ξ, η)$, where $ξ=\displaystyle\frac{N-1/2}{\sqrt{L(L + 1)}}$, $η=\displaystyle\sqrt{\frac{m}{m_1 L (L + 1)}}$, and $N$ is the vibrational quantum number. This scaling dependence is in agreement with the numerical calculations for $L > 2$ and only slightly deviates from those for $L = 1, 2$. The universal description implies that the critical values $L_c(m/m_1)$ and $L_b(m/m_1)$ increase as $0.401 \sqrt{m/m_1}$ and $0.563 \sqrt{m/m_1}$, respectively, while the number of vibrational states for $L \ge L_c(m/m_1)$ is within the range $N \le N_{max} \approx 1.1 \sqrt{L(L+1)}+1/2$.
Submitted 19 October, 2007; v1 submitted 26 September, 2007;
originally announced September 2007.
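The scaling variables and estimates quoted in the abstract above can be evaluated directly. The following is a minimal sketch, assuming only the formulas stated there; the function names are hypothetical, and the universal function ${\cal E}(ξ, η)$ itself is not given in the abstract, so it is not implemented.

```python
import math


def scaling_variables(L, N, mass_ratio):
    """Return (xi, eta) for angular momentum L >= 1, vibrational number N,
    and mass ratio m/m_1, using the definitions quoted in the abstract."""
    ll = L * (L + 1)
    xi = (N - 0.5) / math.sqrt(ll)      # xi = (N - 1/2) / sqrt(L(L+1))
    eta = math.sqrt(mass_ratio / ll)    # eta = sqrt(m / (m_1 L(L+1)))
    return xi, eta


def critical_momenta(mass_ratio):
    """Estimates of L_c and L_b from the growth laws quoted in the abstract
    (0.401 sqrt(m/m_1) and 0.563 sqrt(m/m_1))."""
    root = math.sqrt(mass_ratio)
    return 0.401 * root, 0.563 * root


def max_vibrational_number(L):
    """Approximate upper bound N_max ~ 1.1 sqrt(L(L+1)) + 1/2."""
    return 1.1 * math.sqrt(L * (L + 1)) + 0.5


if __name__ == "__main__":
    # Illustrative inputs only; these values are not taken from the paper.
    xi, eta = scaling_variables(L=3, N=1, mass_ratio=50.0)
    L_c, L_b = critical_momenta(50.0)
    print(xi, eta, L_c, L_b, max_vibrational_number(3))
```

The growth laws are stated in the abstract as the way the critical values increase with the mass ratio, so the outputs of `critical_momenta` are best read as rough estimates rather than exact thresholds.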
-
Low-energy three-body dynamics in binary quantum gases
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
The universal three-body dynamics in ultra-cold binary Fermi and Fermi-Bose mixtures is studied. Two identical fermions of mass $m$ and a particle of mass $m_1$ with the zero-range two-body interaction in the states of total angular momentum $L=1$ are considered. Using the boundary condition model for the s-wave interaction of different particles, both eigenvalue and scattering problems are treated by solving hyper-radial equations, whose terms are derived analytically. The dependencies of the three-body binding energies on the mass ratio $m/m_1$ for the positive two-body scattering length are calculated; it is shown that the ground and excited states arise at $m/m_1 \ge λ_1 \approx 8.17260$ and $m/m_1 \ge λ_2 \approx 12.91743$, respectively. For $m/m_1 \lesssim λ_1$ and $m/m_1 \lesssim λ_2$, the relevant bound states turn into narrow resonances, whose positions and widths are calculated. The 2 + 1 elastic scattering and the three-body recombination near the three-body threshold are studied and it is shown that a two-hump structure in the mass-ratio dependencies of the cross sections is connected with the emergence of the bound states.
Submitted 28 October, 2006;
originally announced October 2006.
-
Universal low-energy properties of three two-dimensional particles
Authors:
O. I. Kartavtsev,
A. V. Malykh
Abstract:
Universal low-energy properties are studied for three identical bosons confined in two dimensions. The short-range pair-wise interaction in the low-energy limit is described by means of the boundary condition model. The wave function is expanded in a set of eigenfunctions on the hypersphere and the system of hyper-radial equations is used to obtain analytical and numerical results. Within the framework of this method, exact analytical expressions are derived for the eigenpotentials and the coupling terms of hyper-radial equations. The derivation of the coupling terms is generally applicable to a variety of three-body problems provided the interaction is described by the boundary condition model. The asymptotic form of the total wave function at a small and a large hyper-radius $ρ$ is studied and the universal logarithmic dependence $\sim \ln^3 ρ$ in the vicinity of the triple-collision point is derived. Precise three-body binding energies and the $2 + 1$ scattering length are calculated.
Submitted 1 June, 2006;
originally announced June 2006.
-
Effective three-body interactions in the alpha-cluster model for the ^{12}C nucleus
Authors:
S. I. Fedotov,
O. I. Kartavtsev,
A. V. Malykh
Abstract:
Properties of the lowest $0^{+}$ states of $^{12}\mathrm{C}$ are calculated to study the role of three-body interactions in the $α$-cluster model. An additional short-range part of the local three-body potential is introduced to incorporate the effects beyond the $α$-cluster model. There is enough freedom in this potential to reproduce the experimental values of the ground-state and excited-state energies and the ground-state root-mean-square radius. The calculations reveal two principal choices of the two-body and three-body potentials. Firstly, one can adjust the potentials to obtain the width of the excited $0_2^+$ state and the monopole $0_2^+ \to 0_1^+ $ transition matrix element in good agreement with the experimental data. In this case, the three-body potential has strong short-range attraction supporting a narrow resonance above the $0_2^+$ state, the excited-state wave function contains a significant short-range component, and the excited-state root-mean-square radius is comparable to that of the ground state. Next, rejecting the solutions with an additional narrow resonance, one finds that the excited-state width and the monopole transition matrix element are insensitive to the choice of the potentials and both values exceed the experimental ones.
Submitted 9 September, 2005;
originally announced September 2005.
-
Three-alpha-cluster structure of the 0^+ states in ^{12}C and the effective alpha-alpha interactions
Authors:
S. I. Fedotov,
O. I. Kartavtsev,
V. I. Kochkin,
A. V. Malykh
Abstract:
The $0^{+}$ states of $^{12}\mathrm{C}$ are considered within the framework of the microscopic three-$α$-cluster model. The main attention is paid to accurate calculation of the width of the extremely narrow near-threshold $0^+_2$ state which plays a key role in stellar nucleosynthesis. It is shown that the $0^{+}_2$-state decays by means of the sequential mechanism ${^{12}\mathrm{C}} \to α+{^8\mathrm{Be}} \to 3α$. Calculations are performed for a number of effective $α- α$ potentials which are chosen to reproduce both energy and width of $^8\mathrm{Be}$. The parameters of the additional three-body potential are chosen to fix both the ground and excited state energies at the experimental values. The dependence of the width on the parameters of the effective $α- α$ potential is studied in order to impose restrictions on the potentials.
Submitted 2 June, 2004; v1 submitted 7 April, 2004;
originally announced April 2004.
-
Effect of dtμ quasi-nucleus structure on energy levels of the (dtμ)Xee exotic molecule
Authors:
O. I. Kartavtsev,
A. V. Malykh,
V. P. Permyakov
Abstract:
Precise energies of rovibrational states of the exotic hydrogen-like molecule $(dtμ)Xee$ are of importance for $dtμ$ resonant formation, which is a key process in the muon-catalyzed fusion cycle. The effect of the internal structure and motion of the $dtμ$ quasi-nucleus on energy levels is studied using the three-body description of the $(dtμ)Xee$ molecule based on the hierarchy of scales and corresponding energies of its constituent subsystems. For a number of rovibrational states of $(dtμ)dee$ and $(dtμ)tee$, the shifts and splittings of energy levels are calculated in the second order of the perturbation theory.
Submitted 31 March, 2004; v1 submitted 24 March, 2004;
originally announced March 2004.