Showing 1–10 of 10 results for author: Braslavski, P

  1. arXiv:2404.04487

    cs.CL

    KazQAD: Kazakh Open-Domain Question Answering Dataset

    Authors: Rustem Yeshpanov, Pavel Efimov, Leonid Boytsov, Ardak Shalkarbayuli, Pavel Braslavski

    Abstract: We introduce KazQAD -- a Kazakh open-domain question answering (ODQA) dataset -- that can be used in both reading comprehension and full ODQA settings, as well as for information retrieval experiments. KazQAD contains just under 6,000 unique questions with extracted short answers and nearly 12,000 passage-level relevance judgements. We use a combination of machine translation, Wikipedia search, an…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: To appear in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  2. arXiv:2310.07008

    cs.CL cs.AI cs.IR cs.LG

    Answer Candidate Type Selection: Text-to-Text Language Model for Closed Book Question Answering Meets Knowledge Graphs

    Authors: Mikhail Salnikov, Maria Lysyuk, Pavel Braslavski, Anton Razzhigaev, Valentin Malykh, Alexander Panchenko

    Abstract: Pre-trained Text-to-Text Language Models (LMs), such as T5 or BART yield promising results in the Knowledge Graph Question Answering (KGQA) task. However, the capacity of the models is limited and the quality decreases for questions with less popular entities. In this paper, we present a novel approach which works on top of the pre-trained Text-to-Text QA system to address this issue. Our simple y…

    Submitted 10 October, 2023; originally announced October 2023.

  3. arXiv:2310.02166

    cs.CL

    Large Language Models Meet Knowledge Graphs to Answer Factoid Questions

    Authors: Mikhail Salnikov, Hai Le, Prateek Rajput, Irina Nikishina, Pavel Braslavski, Valentin Malykh, Alexander Panchenko

    Abstract: Recently, it has been shown that the incorporation of structured knowledge into Large Language Models significantly improves the results for a variety of NLP tasks. In this paper, we propose a method for exploring pre-trained Text-to-Text Language Models enriched with additional information from Knowledge Graphs for answering factoid questions. More specifically, we propose an algorithm for subgra…

    Submitted 3 October, 2023; originally announced October 2023.

  4. NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities

    Authors: Natalia Loukachevitch, Suresh Manandhar, Elina Baral, Igor Rozhkov, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, Elena Tutubalina

    Abstract: This paper describes NEREL-BIO -- an annotation scheme and corpus of PubMed abstracts in Russian and smaller number of abstracts in English. NEREL-BIO extends the general domain dataset NEREL by introducing domain-specific entity types. NEREL-BIO annotation scheme covers both general and biomedical domains making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Submitted to Bioinformatics (Publisher: Oxford University Press)

    Journal ref: Bioinformatics, Volume 39, Issue 4, April 2023, btad161

  5. The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer

    Authors: Pavel Efimov, Leonid Boytsov, Elena Arslanova, Pavel Braslavski

    Abstract: Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks. Cao et al. (2020) proposed a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a small parallel corpus to make embeddings of related words across languages similar to each other. They showed it to be effective in NLI for five European langu…

    Submitted 31 October, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Presented at ECIR 2023

  6. arXiv:2108.13112

    cs.CL

    NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

    Authors: Natalia Loukachevitch, Ekaterina Artemova, Tatiana Batura, Pavel Braslavski, Ilia Denisov, Vladimir Ivanov, Suresh Manandhar, Alexander Pugachev, Elena Tutubalina

    Abstract: In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its important difference from previous datasets is annotation of nested named entities, as well as relations within nested entities and at the discourse le…

    Submitted 3 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: accepted to RANLP

  7. A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

    Authors: Iurii Mokrii, Leonid Boytsov, Pavel Braslavski

    Abstract: Due to high annotation costs making the best use of existing human-created training data is an important research direction. We, therefore, carry out a systematic evaluation of transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In co…

    Submitted 21 November, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Journal ref: SIGIR 2021 (44th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  8. RuBQ: A Russian Dataset for Question Answering over Wikidata

    Authors: Vladislav Korablinov, Pavel Braslavski

    Abstract: The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of que…

    Submitted 21 May, 2020; originally announced May 2020.

  9. SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis

    Authors: Pavel Efimov, Andrey Chertok, Leonid Boytsov, Pavel Braslavski

    Abstract: SberQuAD -- a large scale analog of Stanford SQuAD in the Russian language - is a valuable resource that has not been properly presented to the scientific community. We fill this gap by providing a description, a thorough analysis, and baseline experimental results.

    Submitted 2 May, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

  10. Personal Names Popularity Estimation and its Application to Record Linkage

    Authors: Ksenia Zhagorina, Pavel Braslavski, Vladimir Gusev

    Abstract: This study deals with a fairly simply formulated problem -- how to estimate the number of people bearing the same full name in a large population. Estimation of name popularity can leverage personal name matching in databases and be of interest for many other domains. A distinctive feature of large collections of names is that they contain a large number of unique items, which is challenging for s…

    Submitted 13 November, 2018; originally announced November 2018.

    Comments: This is an extended version of a short paper presented at ADBIS2018