RUSSE: The First Workshop on Russian Semantic Similarity
Authors:
Alexander Panchenko,
Natalia Loukachevitch,
Dmitry Ustalov,
Denis Paperno,
Christian Meyer,
Natalia Konstantinova
Abstract:
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has features…
▽ More
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has features, such as rich morphology and free word order, which make it significantly different from English, German, and other well-studied languages. We attempt to bridge this gap by proposing a shared task on the semantic similarity of Russian nouns. Our key contribution is an evaluation methodology based on four novel benchmark datasets for the Russian language. Our analysis of the 105 submissions from 19 teams reveals that successful approaches for English, such as distributional and skip-gram models, are directly applicable to Russian as well. On the one hand, the best results in the contest were obtained by sophisticated supervised models that combine evidence from different sources. On the other hand, completely unsupervised approaches, such as a skip-gram model estimated on a large-scale corpus, were able score among the top 5 systems.
△ Less
Submitted 15 March, 2018;
originally announced March 2018.
Human and Machine Judgements for Russian Semantic Relatedness
Authors:
Alexander Panchenko,
Dmitry Ustalov,
Nikolay Arefyev,
Denis Paperno,
Natalia Konstantinova,
Natalia Loukachevitch,
Chris Biemann
Abstract:
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgments about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgments, and data…
▽ More
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgments about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgments, and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (word_i, word_j, relatedness_ij). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organize a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
△ Less
Submitted 31 August, 2017;
originally announced August 2017.