Showing 1–23 of 23 results for author: Vylomova, E

  1. arXiv:2406.06052  [pdf, other]

    cs.CL

    A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications

    Authors: Naomi Baes, Nick Haslam, Ekaterina Vylomova

    Abstract: Historical linguists have identified multiple forms of lexical semantic change. We present a three-dimensional framework for integrating these forms and a unified computational methodology for evaluating them concurrently. The dimensions represent increases or decreases in semantic 1) sentiment, 2) breadth, and 3) intensity. These dimensions can be complemented by the evaluation of shifts in the f…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to the Proceedings of the Association for Computational Linguistics (ACL), 2024. Copyright © 2020 Association for Computational Linguistics (ACL). All Rights Reserved

  2. arXiv:2404.13292  [pdf, other]

    cs.CL cs.AI

    Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge

    Authors: Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella

    Abstract: The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been proposed, their evaluation and cross-comparison is still an open problem. As a solution, we propose a combined intrinsic-extrinsic evaluation framework…

    Submitted 20 April, 2024; originally announced April 2024.

  3. arXiv:2404.04809  [pdf, other]

    cs.CL

    Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language

    Authors: Raphaël Merx, Aso Mahmudi, Katrina Langford, Leo Alberto de Araujo, Ekaterina Vylomova

    Abstract: This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT)…

    Submitted 7 April, 2024; originally announced April 2024.

  4. arXiv:2402.12690  [pdf, other]

    cs.CL

    Simpson's Paradox and the Accuracy-Fluency Tradeoff in Translation

    Authors: Zheng Wei Lim, Ekaterina Vylomova, Trevor Cohn, Charles Kemp

    Abstract: A good translation should be faithful to the source and should respect the norms of the target language. We address a theoretical puzzle about the relationship between these objectives. On one hand, intuition and some prior work suggest that accuracy and fluency should trade off against each other, and that capturing every detail of the source can only be achieved at the cost of fluency. On the ot…

    Submitted 10 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  5. arXiv:2312.11852  [pdf, other]

    cs.CL

    Predicting Human Translation Difficulty with Neural Machine Translation

    Authors: Zheng Wei Lim, Ekaterina Vylomova, Charles Kemp, Trevor Cohn

    Abstract: Human translators linger on some words and phrases more than others, and predicting this variation is a step towards explaining the underlying cognitive processes. Using data from the CRITT Translation Process Research Database, we evaluate the extent to which surprisal and attentional features derived from a Neural Machine Translation (NMT) model account for reading and production times of human…

    Submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2206.07615  [pdf, other]

    cs.CL

    The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

    Authors: Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova

    Abstract: The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissi…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: The 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

  7. arXiv:2205.03608  [pdf, other]

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  8. arXiv:2106.03895  [pdf, other]

    cs.CL cs.SD eess.AS

    SIGTYP 2021 Shared Task: Robust Spoken Language Identification

    Authors: Elizabeth Salesky, Badr M. Abdullah, Sabrina J. Mielke, Elena Klyachko, Oleg Serikov, Edoardo Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova

    Abstract: While language identification is a fundamental speech and language processing task, for many languages and language families it remains a challenging task. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or have different domains than desired application scenarios, demanding a need for domain and s…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: The first three authors contributed equally

  9. arXiv:2011.14489  [pdf, ps, other]

    cs.CL

    Modelling Verbal Morphology in Nen

    Authors: Saliha Muradoğlu, Nicholas Evans, Ekaterina Vylomova

    Abstract: Nen verbal morphology is remarkably complex; a transitive verb can take up to 1,740 unique forms. The combined effect of having a large combinatoric space and a low-resource setting amplifies the need for NLP tools. Nen morphology utilises distributed exponence - a non-trivial means of mapping form to meaning. In this paper, we attempt to model Nen verbal morphology using state-of-the-art machine…

    Submitted 6 December, 2020; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: ALTA 2020

  10. arXiv:2010.08246  [pdf, other]

    cs.CL

    SIGTYP 2020 Shared Task: Prediction of Typological Features

    Authors: Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein

    Abstract: Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that mos…

    Submitted 26 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: SigTyp 2020 Shared Task Description Paper @ EMNLP 2020

  11. arXiv:2006.11572  [pdf, other]

    cs.CL

    SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

    Authors: Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, et al. (3 additional authors not shown)

    Abstract: A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource…

    Submitted 14 July, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 39 pages, SIGMORPHON

  12. The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

    Authors: Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

    Abstract: The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low…

    Submitted 25 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Presented at SIGMORPHON 2019

    Journal ref: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244

  13. arXiv:1905.01420  [pdf, ps, other]

    cs.CL

    Contextualization of Morphological Inflection

    Authors: Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn, Jason Eisner

    Abstract: Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological inflection or surface realization, our task input does not provide "gold" tags that specify what morphological features to realize on each lemmatized word; rather,…

    Submitted 3 May, 2019; originally announced May 2019.

    Comments: NAACL 2019

  14. arXiv:1810.11101  [pdf, other]

    cs.CL

    UniMorph 2.0: Universal Morphology

    Authors: Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema.…

    Submitted 25 February, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: LREC 2018

  15. arXiv:1810.07125  [pdf, other]

    cs.CL

    The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL–SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a…

    Submitted 25 February, 2020; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: CoNLL 2018. arXiv admin note: text overlap with arXiv:1706.09031

  16. arXiv:1802.09961  [pdf, ps, other]

    cs.CL

    Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

    Authors: Jing Peng, Anna Feldman, Ekaterina Vylomova

    Abstract: We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur, typically, are more affective and the…

    Submitted 27 February, 2018; originally announced February 2018.

    Comments: EMNLP 2014

  17. arXiv:1708.09151  [pdf, ps, other]

    cs.CL

    Paradigm Completion for Derivational Morphology

    Authors: Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

    Abstract: The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models, a…

    Submitted 30 August, 2017; originally announced August 2017.

    Comments: EMNLP 2017

  18. arXiv:1707.08458  [pdf, other]

    cs.CL

    Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal Associations

    Authors: Ekaterina Vylomova, Andrei Shcherbakov, Yuriy Philippovich, Galina Cherkasova

    Abstract: We present a quantitative analysis of human word association pairs and study the types of relations presented in the associations. We put our main focus on the correlation between response types and respondent characteristics such as occupation and gender by contrasting syntagmatic and paradigmatic associations. Finally, we propose a personalised distributed word association model and show the imp…

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: AIST 2017 camera-ready

  19. arXiv:1706.09031  [pdf, other]

    cs.CL

    CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL-SIGMORPHON 2017 shared task on supervised morphological generation required systems to be trained and tested in each of 52 typologically diverse languages. In sub-task 1, submitted systems were asked to predict a specific inflected form of a given lemma. In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by…

    Submitted 4 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

    Comments: CoNLL 2017

  20. arXiv:1702.06675  [pdf, other]

    cs.CL

    Context-Aware Prediction of Derivational Word-forms

    Authors: Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn

    Abstract: Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose the new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder–decoder style neural network to produce a derived form character-by-character, based on its corresponding character-level representation of the base form…

    Submitted 21 February, 2017; originally announced February 2017.

  21. arXiv:1606.04217  [pdf, other]

    cs.NE cs.CL

    Word Representation Models for Morphologically Rich Languages in Neural Machine Translation

    Authors: Ekaterina Vylomova, Trevor Cohn, Xuanli He, Gholamreza Haffari

    Abstract: Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation. In contrast to most modern neural systems of translation, which discard the identity for rare words, in this paper we propose several architectures for learning word representations from character and morpheme level word decompositions. We i…

    Submitted 14 June, 2016; originally announced June 2016.

  22. arXiv:1604.04873  [pdf, other]

    cs.CL

    From Incremental Meaning to Semantic Unit (phrase by phrase)

    Authors: Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin

    Abstract: This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval 2016 Task 10. The approach is primarily based on a combination of word embeddings and parser-based features, and employs unidirectional incremental computation of compositional embeddings for multiword expressions.

    Submitted 17 April, 2016; originally announced April 2016.

    Comments: 7 pages, 1 figure, International Workshop on Semantic Evaluation (SemEval-2016)

    MSC Class: 68T50; ACM Class: I.2.7

  23. arXiv:1509.01692  [pdf, other]

    cs.CL

    Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning

    Authors: Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin

    Abstract: Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relatio…

    Submitted 13 August, 2016; v1 submitted 5 September, 2015; originally announced September 2015.