Showing 1–20 of 20 results for author: Krizhanovsky, A

Searching in archive cs.
  1. The Open corpus of the Veps and Karelian languages: overview and applications

    Authors: Tatyana Boyko, Nina Zaitseva, Natalia Krizhanovskaya, Andrew Krizhanovsky, Irina Novak, Nataliya Pellinen, Aleksandra Rodionova

    Abstract: A growing priority in the study of Baltic-Finnic languages of the Republic of Karelia has been the methods and tools of corpus linguistics. Since 2016, linguists, mathematicians, and programmers at the Karelian Research Centre have been working with the Open Corpus of the Veps and Karelian Languages (VepKar), which is an extension of the Veps Corpus created in 2009. The VepKar corpus comprises tex…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: 9 pages, 9 figures, published in the journal

    MSC Class: 68T50 ACM Class: H.3.1; H.3.6

    Journal ref: KnE Social Sciences. 7 (3). 2022. P. 29-40

  2. arXiv:2205.03608  [pdf, other]

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  3. arXiv:2103.11859  [pdf, other]

    cs.CL cs.IR

    Part of speech and gramset tagging algorithms for unknown words based on morphological dictionaries of the Veps and Karelian languages

    Authors: Andrew Krizhanovsky, Natalia Krizhanovsky, Irina Novak

    Abstract: This research is devoted to the low-resource Veps and Karelian languages. Algorithms for assigning part of speech tags to words and grammatical properties to words are presented in the article. These algorithms use our morphological dictionaries, where the lemma, part of speech and a set of grammatical features (gramset) are known for each word form. The algorithms are based on the analogy hypothesis…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 17 pages, 4 tables, 7 figures, published in the conference proceeding

    MSC Class: 68T50 ACM Class: H.3.1; H.3.6

  4. arXiv:2006.11572  [pdf, other]

    cs.CL

    SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

    Authors: Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, et al. (3 additional authors not shown)

    Abstract: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource…

    Submitted 14 July, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 39 pages, SIGMORPHON

  5. arXiv:2002.00734  [pdf]

    cs.CL cs.IR

    Analysis of the quotation corpus of the Russian Wiktionary

    Authors: A. Smirnov, T. Levashova, A. Karpov, I. Kipyatkova, A. Ronzhin, A. Krizhanovsky, N. Krizhanovsky

    Abstract: The quantitative evaluation of quotations in the Russian Wiktionary was performed using the developed Wiktionary parser. It was found that the number of quotations in the dictionary is growing fast (51.5 thousands in 2011, 62 thousands in 2012). These quotations were extracted and saved in the relational database of a machine-readable dictionary. For this database, tables related to the quotations…

    Submitted 20 January, 2020; originally announced February 2020.

    Comments: 12 pages, 3 tables, 5 figures, published in the journal (preprint)

    MSC Class: 68T50 ACM Class: H.3.3

    Journal ref: Research in Computing Science, Vol. 56, pp. 101-112, 2012

  6. arXiv:2001.11285  [pdf, other]

    cs.CL

    LowResourceEval-2019: a shared task on morphological analysis for low-resource languages

    Authors: Elena Klyachko, Alexey Sorokin, Natalia Krizhanovskaya, Andrew Krizhanovsky, Galina Ryazanskaya

    Abstract: The paper describes the results of the first shared task on morphological analysis for the languages of Russia, namely, Evenki, Karelian, Selkup, and Veps. For the languages in question, only small-sized corpora are available. The tasks include morphological analysis, word form generation and morpheme segmentation. Four teams participated in the shared task. Most of them use machine-learning appro…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: 16 pages, 4 tables, 2 figures, published in the conference proceeding

    MSC Class: 68T50

    Journal ref: Dialog 2019, Issue 18, Supplementary volume, Pp. 45-62

  7. arXiv:2001.04719  [pdf]

    cs.IR

    Semi-automatic methods for adding words to the dictionary of VepKar corpus based on inflectional rules extracted from Wiktionary

    Authors: Natalia Krizhanovsky, Andrew Krizhanovsky

    Abstract: The article describes a technique for using English Wiktionary inflection tables for generating word forms for Veps verbs and nominals in the Open corpus of Veps and Karelian languages. The information concerning Karelian and Veps Wiktionary entries with inflection tables is given. The operating principle of the Wiktionary static and dynamic templates is explained with the use of the jogi (river)…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: 10 pages, 1 table, 2 figures, published in the conference proceeding https://events.spbu.ru/eventsContent/events/2019/corpora/corp_sborn.pdf#page=211

    Journal ref: Corpora 2019, 24-28 June, 2019. Saint-Petersburg. P. 211-217

  8. arXiv:1805.09559  [pdf, other]

    cs.IR cs.CL

    WSD algorithm based on a new method of vector-word contexts proximity calculation via epsilon-filtration

    Authors: Alexander Kirillov, Natalia Krizhanovsky, Andrew Krizhanovsky

    Abstract: The problem of word sense disambiguation (WSD) is considered in the article. Given a set of synonyms (synsets) and sentences with these synonyms. It is necessary to select the meaning of the word in the sentence automatically. 1285 sentences were tagged by experts, namely, one of the dictionary meanings was selected by experts for target words. To solve the WSD-problem, an algorithm based on a new…

    Submitted 18 June, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: 15 pages, 1 table, 15 figures, accepted in the journal Transactions of Karelian Research Centre of the Russian Academy of Sciences

    MSC Class: 68T50 ACM Class: I.5.3; H.3.1; H.3.3

    Journal ref: Transactions of Karelian Research Centre RAS. No. 7. 2018. P. 149-163

  9. arXiv:1803.01580  [pdf, ps, other]

    cs.CL cs.IR

    Calculated attributes of synonym sets

    Authors: Andrew Krizhanovsky, Alexander Kirillov

    Abstract: The goal of formalization, proposed in this paper, is to bring together, as near as possible, the theoretic linguistic problem of synonym conception and the computer linguistic methods based generally on empirical intuitive unjustified factors. Using the word vector representation we have proposed the geometric approach to mathematical modeling of synonym set (synset). The word embedding is based…

    Submitted 5 March, 2018; originally announced March 2018.

    Comments: 6 pages, 2 tables, 2 figures, preprint

    MSC Class: 68T50 ACM Class: I.5.3; H.3.1; H.3.3

  10. arXiv:1109.0732  [pdf]

    cs.IR

    Multilingual ontology matching based on Wiktionary data accessible via SPARQL endpoint

    Authors: Feiyu Lin, Andrew Krizhanovsky

    Abstract: Interoperability is a feature required by the Semantic Web. It is provided by the ontology matching methods and algorithms. But now ontologies are presented not only in English, but in other languages as well. It is important to use an automatic translation for obtaining correct matching pairs in multilingual ontology matching. The translation into many languages could be based on the Google Trans…

    Submitted 26 October, 2011; v1 submitted 4 September, 2011; originally announced September 2011.

    Comments: 8 pages, 3 tables, 4 figures, In: Proceedings of the 13th Russian Conference on Digital Libraries RCDL'2011. October 19-22, Voronezh, Russia. - pp. 19-26. (preprint)

    MSC Class: 68W25; 90C35 ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  11. arXiv:1011.1368  [pdf]

    cs.IR

    Transformation of Wiktionary entry structure into tables and relations in a relational database schema

    Authors: A. A. Krizhanovsky

    Abstract: This paper addresses the question of automatic data extraction from the Wiktionary, which is a multilingual and multifunctional dictionary. Wiktionary is a collaborative project working on the same principles as the Wikipedia. The Wiktionary entry is a plain text from the text processing point of view. Wiktionary guidelines prescribe the entry layout and rules, which should be followed by editors…

    Submitted 5 November, 2010; originally announced November 2010.

    Comments: 10 pages, 7 figures, preprint

    MSC Class: 68W25; 90C35 ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  12. arXiv:1006.5040  [pdf, other]

    cs.IR

    The comparison of Wiktionary thesauri transformed into the machine-readable format

    Authors: A. A. Krizhanovsky

    Abstract: Wiktionary is a unique, peculiar, valuable and original resource for natural language processing (NLP). The paper describes an open-source Wiktionary parser: its architecture and requirements followed by a description of Wiktionary features to be taken into account, some open problems of Wiktionary and the parser. The current implementation of the parser extracts the definitions, semantic relation…

    Submitted 25 June, 2010; originally announced June 2010.

    Comments: 23 pages, 3 tables, 6 figures, preprint

    MSC Class: 68W25; 90C35 ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  13. arXiv:0907.2209  [pdf]

    cs.IR

    Related terms search based on WordNet / Wiktionary and its application in Ontology Matching

    Authors: A. A. Krizhanovsky, Feiyu Lin

    Abstract: A set of ontology matching algorithms (for finding correspondences between concepts) is based on a thesaurus that provides the source data for the semantic distance calculations. In this wiki era, new resources may spring up and improve this kind of semantic search. In the paper a solution of this task based on Russian Wiktionary is compared to WordNet based algorithms. Metrics are estimated usi…

    Submitted 12 October, 2009; v1 submitted 13 July, 2009; originally announced July 2009.

    Comments: 7 pages, 2 tables, 3 figures; In: RCDL 2009. September 17-21, Petrozavodsk, Russia. - pp. 363-369

    ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  14. arXiv:0808.1753  [pdf]

    cs.IR cs.CL

    Index wiki database: design and experiments

    Authors: A. A. Krizhanovsky

    Abstract: With the fantastic growth of Internet usage, information search in documents of a special type called a "wiki page" that is written using a simple markup language, has become an important problem. This paper describes the software architectural model for indexing wiki texts in three languages (Russian, English, and German) and the interaction between the software components (GATE, Lemmatizer, an…

    Submitted 23 September, 2008; v1 submitted 12 August, 2008; originally announced August 2008.

    Comments: 18 pages, 4 tables, 4 figures; FLINS'08, Corpus Linguistics'08, AIS/CAD'08; v2: table 3 changed

    ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  15. arXiv:0804.2354  [pdf]

    cs.IR cs.CL

    Information filtering based on wiki index database

    Authors: A. V. Smirnov, A. A. Krizhanovsky

    Abstract: In this paper we present a profile-based approach to information filtering by an analysis of the content of text documents. The Wikipedia index database is created and used to automatically generate the user profile from the user document collection. The problem-oriented Wikipedia subcorpora are created (using knowledge extracted from the user profile) for each topic of user interests. The index…

    Submitted 8 May, 2008; v1 submitted 15 April, 2008; originally announced April 2008.

    Comments: 9 pages, 1 table, 2 figures, 8th International FLINS Conference on Computational Intelligence in Decision and Control, Madrid, Spain, September 21-24, 2008; v2: typo

    ACM Class: I.7.2; I.7.3; I.7.5; H.3.1; H.3.3

  16. arXiv:0710.0169  [pdf]

    cs.IR cs.CL

    Evaluation experiments on related terms search in Wikipedia: Information Content and Adapted HITS (In Russian)

    Authors: A. A. Krizhanovsky

    Abstract: The classification of metrics and algorithms search for related terms via WordNet, Roget's Thesaurus, and Wikipedia was extended to include adapted HITS algorithm. Evaluation experiments on Information Content and adapted HITS algorithm are described. The test collection of Russian word pairs with human-assigned similarity judgments is proposed. ----- Klassifikacija metrik i algoritmov poisk…

    Submitted 16 January, 2008; v1 submitted 1 October, 2007; originally announced October 2007.

    Comments: 10 pages, 1 figure, 3 tables, in Russian, short version of the paper to be published in Proceedings of the Wiki-Conference 2007, Russia, St. Petersburg, October 27-28. http://tinyurl.com/2czd6e ; v3: +figure; v4: typo in Table 3; v5: +desc (res_hypo formula); v6: typo

    ACM Class: H.3.1; H.3.3; H.4.3; G.2.2

  17. arXiv:cs/0610058  [pdf]

    cs.IR

    Context-sensitive access to e-document corpus

    Authors: A. V. Smirnov, T. V. Levashova, M. P. Pashkin, N. G. Shilov, A. A. Krizhanovsky, A. M. Kashevnik, A. S. Komarova

    Abstract: The methodology of context-sensitive access to e-documents considers context as a problem model based on the knowledge extracted from the application domain, and presented in the form of application ontology. Efficient access to an information in the text form is needed. Wiki resources as a modern text format provides huge number of text in a semi formalized structure. At the first stage of the…

    Submitted 11 October, 2006; originally announced October 2006.

    Comments: 9 pages, 1 figure, short version of this paper was presented at the International Conference Corpus Linguistics 2006. October 10-14, St. Petersburg, Russia

    ACM Class: H.3.1; H.3.3; H.4.3; G.2.2

  18. arXiv:cs/0606128  [pdf]

    cs.IR cs.DM

    Automatic forming lists of semantically related terms based on texts rating in the corpus with hyperlinks and categories (In Russian)

    Authors: A. Krizhanovsky

    Abstract: HITS adapted algorithm for synonym search, the program architecture, and the program work evaluation with test examples are presented in the paper. Synarcher program for synonym (and related terms) search in the text corpus of special structure (Wikipedia) was developed. The results of search are presented in the form of a graph. It is possible to explore the graph and search graph elements inte…

    Submitted 30 June, 2006; originally announced June 2006.

    Comments: 6 pages, 1 figure, in Russian, PDF, for other formats see http://whinger.narod.ru/paper/index.html

    ACM Class: H.3.1; H.3.3; H.4.3; G.2.2

  19. arXiv:cs/0606097  [pdf, ps, other]

    cs.IR cs.DM

    Synonym search in Wikipedia: Synarcher

    Authors: A. Krizhanovsky

    Abstract: The program Synarcher for synonym (and related terms) search in the text corpus of special structure (Wikipedia) was developed. The results of the search are presented in the form of graph. It is possible to explore the graph and search for graph elements interactively. Adapted HITS algorithm for synonym search, program architecture, and program work evaluation with test examples are presented i…

    Submitted 23 June, 2006; v1 submitted 22 June, 2006; originally announced June 2006.

    Comments: 4 pages, 2 figures, Synarcher program is available at http://synarcher.sourceforge.net

    ACM Class: H.3.1; H.3.3; H.4.3; G.2.2

  20. arXiv:cs/0501077  [pdf]

    cs.IR cs.CL

    Ontology-Based Users & Requests Clustering in Customer Service Management System

    Authors: Alexander Smirnov, Mikhail Pashkin, Nikolai Chilov, Tatiana Levashova, Andrew Krizhanovsky, Alexey Kashevnik

    Abstract: Customer Service Management is one of major business activities to better serve company customers through the introduction of reliable processes and procedures. Today this kind of activities is implemented through e-services to directly involve customers into business processes. Traditionally Customer Service Management involves application of data mining techniques to discover usage patterns fr…

    Submitted 27 May, 2005; v1 submitted 26 January, 2005; originally announced January 2005.

    Comments: 15 pages, 4 figures, published in Lecture Notes in Computer Science

    ACM Class: H.3.3

    Journal ref: Smirnov A., Pashkin M., Chilov N., Levashova T., Krizhanovsky A., Kashevnik A. 2005. Ontology-Based Users and Requests Clustering in Customer Service Management System. Springer-Verlag GmbH, Lecture Notes in Computer Science, 3505: 231-246