Showing 1–6 of 6 results for author: Wolf-Sonkin, L

  1. arXiv:2007.01176  [pdf]

    cs.CL

    Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset

    Authors: Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall

    Abstract: This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3) full sentence parallel data in both a native script of the language and the basic Latin alphabet. We document the methods used for preparation and s…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: Published at LREC 2020

  2. arXiv:2005.01204  [pdf, other]

    cs.CL

    On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

    Authors: Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

    Abstract: We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. For all six languages, we find that there is a statistically significant relationship. We also find that there are statistically significant relat…

    Submitted 3 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 figures, 4 tables, TACL(a) final submission

  3. arXiv:1910.13497  [pdf, other]

    cs.CL

    Quantifying the Semantic Core of Gender Systems

    Authors: Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

    Abstract: Many of the world's languages employ grammatical gender on the lexeme. For example, in Spanish, the word for 'house' (casa) is feminine, whereas the word for 'paper' (papel) is masculine. To a speaker of a genderless language, this assignment seems to exist with neither rhyme nor reason. But is the assignment of inanimate nouns to grammatical genders truly arbitrary? We present the first large-sca…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: 6 pages, 2 figures, accepted to EMNLP 2019

  4. The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

    Authors: Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

    Abstract: The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low…

    Submitted 25 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Presented at SIGMORPHON 2019

    Journal ref: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244

  5. arXiv:1904.02839  [pdf, other]

    cs.CL cs.LG

    Combining Sentiment Lexica with a Multi-View Variational Autoencoder

    Authors: Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Ryan Cotterell, Isabelle Augenstein

    Abstract: When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally, it is of interest to unify these labels from disparate scales to both achieve maximal coverage over words and to create a single, more robust sentimen…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: To appear in NAACL-HLT 2019

  6. arXiv:1806.03746  [pdf, other]

    cs.CL

    A Structured Variational Autoencoder for Contextual Morphological Inflection

    Authors: Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan Cotterell

    Abstract: Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the…

    Submitted 25 February, 2020; v1 submitted 10 June, 2018; originally announced June 2018.

    Comments: Published at ACL 2018