Search | arXiv e-print repository

On the Automatic Generation and Simplification of Children's Stories

Authors: Maria Valentini, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, Katharina Kann

Abstract: With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic. Working toward the goal of age-appropriate simplicity in generated educational texts, we first examine the ability of several popular LLMs to generate stories with properly adjusted lexical and readability levels. We find that, in spite o… ▽ More With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic. Working toward the goal of age-appropriate simplicity in generated educational texts, we first examine the ability of several popular LLMs to generate stories with properly adjusted lexical and readability levels. We find that, in spite of the growing capabilities of LLMs, they do not yet possess the ability to limit their vocabulary to levels appropriate for younger age groups. As a second experiment, we explore the ability of state-of-the-art lexical simplification models to generalize to the domain of children's stories and, thus, create an efficient pipeline for their automatic generation. In order to test these models, we develop a dataset of child-directed lexical simplification instances, with examples taken from the LLM-generated stories in our first experiment. We find that, while the strongest-performing current lexical simplification models do not perform as well on material designed for children due to their reliance on large language models behind the scenes, some models that still achieve fairly strong results on general data can mimic or even improve their performance on children-directed data with proper fine-tuning, which we conduct using our newly created child-directed simplification dataset. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (main conference)

arXiv:2306.06804 [pdf, other]

Neural Machine Translation for the Indigenous Languages of the Americas: An Introduction

Authors: Manuel Mager, Rajat Bhatnagar, Graham Neubig, Ngoc Thang Vu, Katharina Kann

Abstract: Neural models have drastically advanced state of the art for machine translation (MT) between high-resource languages. Traditionally, these models rely on large amounts of training data, but many language pairs lack these resources. However, an important part of the languages in the world do not have this amount of data. Most languages from the Americas are among them, having a limited amount of p… ▽ More Neural models have drastically advanced state of the art for machine translation (MT) between high-resource languages. Traditionally, these models rely on large amounts of training data, but many language pairs lack these resources. However, an important part of the languages in the world do not have this amount of data. Most languages from the Americas are among them, having a limited amount of parallel and monolingual data, if any. Here, we present an introduction to the interested reader to the basic challenges, concepts, and techniques that involve the creation of MT systems for these languages. Finally, we discuss the recent advances and findings and open questions, product of an increased interest of the NLP community in these languages. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted to AmericasNLP 2023

arXiv:2305.19474 [pdf, other]

Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers

Authors: Manuel Mager, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Abstract: In recent years machine translation has become very successful for high-resource language pairs. This has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, the latter are deeply related to the ethnic and cultural groups that speak (or used to speak) them. The data collection, modeling and deploying machine transla… ▽ More In recent years machine translation has become very successful for high-resource language pairs. This has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, the latter are deeply related to the ethnic and cultural groups that speak (or used to speak) them. The data collection, modeling and deploying machine translation systems thus result in new ethical questions that must be addressed. Motivated by this, we first survey the existing literature on ethical considerations for the documentation, translation, and general natural language processing for Indigenous languages. Afterward, we conduct and analyze an interview study to shed light on the positions of community leaders, teachers, and language activists regarding ethical concerns for the automatic translation of their languages. Our results show that the inclusion, at different degrees, of native speakers and community members is vital to performing better and more ethical research on Indigenous languages. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted to ACL2023 Main Conference

arXiv:2305.16581 [pdf, other]

An Investigation of Noise in Morphological Inflection

Authors: Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann

Abstract: With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an er… ▽ More With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data. Then, we compare the effect of different types of noise on multiple state-of-the-art inflection models. Finally, we propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise. Our experiments show that various architectures are impacted differently by separate types of noise, but encoder-decoders tend to be more robust to noise than models trained with a copy bias. CMLM pretraining helps transformers, but has lower impact on LSTMs. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: ACL 2023 Findings

arXiv:2302.07912 [pdf, other]

Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

Authors: Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, Luis Chiruzzo, John E. Ortega, Gustavo A. Giménez-Lugo, Rolando Coto-Solano, Katharina Kann

Abstract: Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribu… ▽ More Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri--Spanish, Guarani--Spanish, Quechua--Spanish, and Shipibo-Konibo--Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other. △ Less

Submitted 15 February, 2023; originally announced February 2023.

Comments: EACL 2023

arXiv:2212.09252 [pdf, other]

Mind the Knowledge Gap: A Survey of Knowledge-enhanced Dialogue Systems

Authors: Sagi Shaier, Lawrence Hunter, Katharina Kann

Abstract: Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness. Enhancing DSs with knowledge alleviates this problem, but, as many ways of doing so exist, kee** track of all proposed methods is difficult. Here, we present the first survey of knowledge-enhanced DSs. We define three categories of systems - internal, external, and hybrid -… ▽ More Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness. Enhancing DSs with knowledge alleviates this problem, but, as many ways of doing so exist, kee** track of all proposed methods is difficult. Here, we present the first survey of knowledge-enhanced DSs. We define three categories of systems - internal, external, and hybrid - based on the knowledge they use. We survey the motivation for enhancing DSs with knowledge, used datasets, and methods for knowledge search, knowledge encoding, and knowledge incorporation. Finally, we propose how to improve existing systems based on theories from linguistics and cognitive science. △ Less

Submitted 20 December, 2022; v1 submitted 19 December, 2022; originally announced December 2022.

arXiv:2211.16858 [pdf, other]

A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

Authors: Katharina Kann, Shiran Dudy, Arya D. McCarthy

Abstract: The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible amount of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products. However, this paper argues that we have been less successful than we should have been and reflects on where and how the field fails to ta… ▽ More The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible amount of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products. However, this paper argues that we have been less successful than we should have been and reflects on where and how the field fails to tap its full potential. Specifically, we demonstrate that, in recent years, subpar time allocation has been a major obstacle for NLP research. We outline multiple concrete problems together with their negative consequences and, importantly, suggest remedies to improve the status quo. We hope that this paper will be a starting point for discussions around which common practices are -- or are not -- beneficial for NLP research. △ Less

Submitted 30 November, 2022; originally announced November 2022.

Comments: To appear at EMNLP 2022

arXiv:2210.12321 [pdf, other]

A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection

Authors: Adam Wiemerslage, Shiran Dudy, Katharina Kann

Abstract: Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology. This debate has gravitated into NLP by way of the question: Are neural networks a feasible account for human behavior in morphological inflection? We address that question by measuring the correlation between human judgments and neural network probabilities for u… ▽ More Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology. This debate has gravitated into NLP by way of the question: Are neural networks a feasible account for human behavior in morphological inflection? We address that question by measuring the correlation between human judgments and neural network probabilities for unknown word inflections. We test a larger range of architectures than previously studied on two important tasks for the cognitive processing debate: English past tense, and German number inflection. We find evidence that the Transformer may be a better account of human behavior than LSTMs on these datasets, and that LSTM features known to increase inflection accuracy do not always result in more human-like behavior. △ Less

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2203.10753 [pdf, other]

Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability

Authors: Yoshinari Fu**uma, Jordan Boyd-Graber, Katharina Kann

Abstract: Pretrained multilingual models enable zero-shot learning even for unseen languages, and that performance can be further improved via adaptation prior to finetuning. However, it is unclear how the number of pretraining languages influences a model's zero-shot learning for languages unseen during pretraining. To fill this gap, we ask the following research questions: (1) How does the number of pretr… ▽ More Pretrained multilingual models enable zero-shot learning even for unseen languages, and that performance can be further improved via adaptation prior to finetuning. However, it is unclear how the number of pretraining languages influences a model's zero-shot learning for languages unseen during pretraining. To fill this gap, we ask the following research questions: (1) How does the number of pretraining languages influence zero-shot performance on unseen target languages? (2) Does the answer to that question change with model adaptation? (3) Do the findings for our first question change if the languages used for pretraining are all related? Our experiments on pretraining with related languages indicate that choosing a diverse set of languages is crucial. Without model adaptation, surprisingly, increasing the number of pretraining languages yields better results up to adding related languages, after which performance plateaus. In contrast, with model adaptation via continued pretraining, pretraining on a larger number of languages often gives further improvement, suggesting that model adaptation is crucial to exploit additional pretraining languages. △ Less

Submitted 21 March, 2022; originally announced March 2022.

Comments: ACL 2022 camera ready

arXiv:2203.08954 [pdf, other]

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

Authors: Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Abstract: Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. Then, we compare the morphologically insp… ▽ More Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and Wixarika. Then, we compare the morphologically inspired segmentation methods against Byte-Pair Encodings (BPEs) as inputs for machine translation (MT) when translating to and from Spanish. We show that for all language pairs except for Nahuatl, an unsupervised morphological segmentation algorithm outperforms BPEs consistently and that, although supervised methods achieve better segmentation scores, they under-perform in MT challenges. Finally, we contribute two new morphological segmentation datasets for Raramuri and Shipibo-Konibo, and a parallel corpus for Raramuri--Spanish. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: Accepted to Findings of ACL 2022

arXiv:2203.08909 [pdf, other]

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

Authors: Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Abstract: Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent… ▽ More Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language's morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by a number of magnitudes. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2110.08182 [pdf, other]

The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

Authors: Cory Paik, Stéphane Aroca-Ouellette, Alessandro Roncone, Katharina Kann

Abstract: Recent work has raised concerns about the inherent limitations of text-only pretraining. In this paper, we first demonstrate that reporting bias, the tendency of people to not state the obvious, is one of the causes of this limitation, and then investigate to what extent multimodal training can mitigate this issue. To accomplish this, we 1) generate the Color Dataset (CoDa), a dataset of human-per… ▽ More Recent work has raised concerns about the inherent limitations of text-only pretraining. In this paper, we first demonstrate that reporting bias, the tendency of people to not state the obvious, is one of the causes of this limitation, and then investigate to what extent multimodal training can mitigate this issue. To accomplish this, we 1) generate the Color Dataset (CoDa), a dataset of human-perceived color distributions for 521 common objects; 2) use CoDa to analyze and compare the color distribution found in text, the distribution captured by language models, and a human's perception of color; and 3) investigate the performance differences between text-only and multimodal models on CoDa. Our results show that the distribution of colors that a language model recovers correlates more strongly with the inaccurate distribution found in text than with the ground-truth, supporting the claim that reporting bias negatively impacts and inherently limits text-only training. We then demonstrate that multimodal models can leverage their visual training to mitigate these effects, providing a promising avenue for future research. △ Less

Submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted to EMNLP 2021, 9 Pages

arXiv:2108.06598 [pdf, other]

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

Authors: Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen

Abstract: We present the findings of the LoResMT 2021 shared task which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The organization of this task was conducted as part of the fourth workshop on technologies for machine translation of low resource languages (LoResMT). Parallel corpora is presented and publicly available which includes the following di… ▽ More We present the findings of the LoResMT 2021 shared task which focuses on machine translation (MT) of COVID-19 data for both low-resource spoken and sign languages. The organization of this task was conducted as part of the fourth workshop on technologies for machine translation of low resource languages (LoResMT). Parallel corpora is presented and publicly available which includes the following directions: English$\leftrightarrow$Irish, English$\leftrightarrow$Marathi, and Taiwanese Sign language$\leftrightarrow$Traditional Chinese. Training data consists of 8112, 20933 and 128608 segments, respectively. There are additional monolingual data sets for Marathi and English that consist of 21901 segments. The results presented here are based on entries from a total of eight teams. Three teams submitted systems for English$\leftrightarrow$Irish while five teams submitted systems for English$\leftrightarrow$Marathi. Unfortunately, there were no systems submissions for the Taiwanese Sign language$\leftrightarrow$Traditional Chinese task. Maximum system performance was computed using BLEU and follow as 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English. △ Less

Submitted 18 August, 2021; v1 submitted 14 August, 2021; originally announced August 2021.

Comments: 10 pages

arXiv:2107.03690 [pdf, other]

Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)

Authors: Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow

Abstract: Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021. In this workshop, we want to advance theory, methods and tools for allowing experts to express prior coded knowledge for automatic data annotations that can be used to train arbitrary deep neural networks for prediction. The ICLR 2021 Workshop on Weak Supervision aims at advancing methods that help… ▽ More Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021. In this workshop, we want to advance theory, methods and tools for allowing experts to express prior coded knowledge for automatic data annotations that can be used to train arbitrary deep neural networks for prediction. The ICLR 2021 Workshop on Weak Supervision aims at advancing methods that help modern machine-learning methods to generalize from knowledge provided by experts, in interaction with observable (unlabeled) data. In total, 15 papers were accepted. All the accepted contributions are listed in these Proceedings. △ Less

Submitted 8 July, 2021; originally announced July 2021.

arXiv:2106.06875 [pdf, other]

Don't Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data

Authors: Rajat Bhatnagar, Ananya Ganesh, Katharina Kann

Abstract: High-performing machine translation (MT) systems can help overcome language barriers while making it possible for everyone to communicate and use language technologies in the language of their choice. However, such systems require large amounts of parallel sentences for training, and translators can be difficult to find and expensive. Here, we present a data collection strategy for MT which, in co… ▽ More High-performing machine translation (MT) systems can help overcome language barriers while making it possible for everyone to communicate and use language technologies in the language of their choice. However, such systems require large amounts of parallel sentences for training, and translators can be difficult to find and expensive. Here, we present a data collection strategy for MT which, in contrast, is cheap and simple, as it does not require bilingual speakers. Based on the insight that humans pay specific attention to movements, we use graphics interchange formats (GIFs) as a pivot to collect parallel sentences from monolingual annotators. We use our strategy to collect data in Hindi, Tamil and English. As a baseline, we also collect data using images as a pivot. We perform an intrinsic evaluation by manually evaluating a subset of the sentence pairs and an extrinsic evaluation by finetuning mBART on the collected data. We find that sentences collected via GIFs are indeed of higher quality. △ Less

Submitted 12 June, 2021; originally announced June 2021.

Comments: 5 pages, 1 figure, ACL-IJCNLP 2021 submission, Natural Language Processing, Data Collection, Monolingual Speakers, Machine Translation, GIFs, Images

ACM Class: I.2.7

arXiv:2106.05249 [pdf, other]

What Would a Teacher Do? Predicting Future Talk Moves

Authors: Ananya Ganesh, Martha Palmer, Katharina Kann

Abstract: Recent advances in natural language processing (NLP) have the ability to transform how classroom learning takes place. Combined with the increasing integration of technology in today's classrooms, NLP systems leveraging question answering and dialog processing techniques can serve as private tutors or participants in classroom discussions to increase student engagement and learning. To progress to… ▽ More Recent advances in natural language processing (NLP) have the ability to transform how classroom learning takes place. Combined with the increasing integration of technology in today's classrooms, NLP systems leveraging question answering and dialog processing techniques can serve as private tutors or participants in classroom discussions to increase student engagement and learning. To progress towards this goal, we use the classroom discourse framework of academically productive talk (APT) to learn strategies that make for the best learning experience. In this paper, we introduce a new task, called future talk move prediction (FTMP): it consists of predicting the next talk move -- an utterance strategy from APT -- given a conversation history with its corresponding talk moves. We further introduce a neural network model for this task, which outperforms multiple baselines by a large margin. Finally, we compare our model's performance on FTMP to human performance and show several similarities between the two. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: 13 pages, 3 figures; To appear in Findings of ACL 2021

arXiv:2106.03634 [pdf, other]

PROST: Physical Reasoning of Objects through Space and Time

Authors: Stéphane Aroca-Ouellette, Cory Paik, Alessandro Roncone, Katharina Kann

Abstract: We present a new probing dataset named PROST: Physical Reasoning about Objects Through Space and Time. This dataset contains 18,736 multiple-choice questions made from 14 manually curated templates, covering 10 physical reasoning concepts. All questions are designed to probe both causal and masked language models in a zero-shot setting. We conduct an extensive analysis which demonstrates that stat… ▽ More We present a new probing dataset named PROST: Physical Reasoning about Objects Through Space and Time. This dataset contains 18,736 multiple-choice questions made from 14 manually curated templates, covering 10 physical reasoning concepts. All questions are designed to probe both causal and masked language models in a zero-shot setting. We conduct an extensive analysis which demonstrates that state-of-the-art pretrained models are inadequate at physical reasoning: they are influenced by the order in which answer options are presented to them, they struggle when the superlative in a question is inverted (e.g., most <-> least), and increasing the amount of pretraining data and parameters only yields minimal improvements. These results provide support for the hypothesis that current pretrained models' ability to reason about physical interactions is inherently limited by a lack of real world experience. By highlighting these limitations, we hope to motivate the development of models with a human-like understanding of the physical world. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted to ACL-Findings 2021, 9 Pages

arXiv:2106.02124 [pdf, other]

How to Adapt Your Pretrained Multilingual Model to 1600 Languages

Authors: Abteen Ebrahimi, Katharina Kann

Abstract: Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining. While methods exist to improve performance for unseen languages, they have almost exclusively been evaluated using amounts of raw text only available for a small fraction of the world's languages. In this paper, we evaluate the performance of existing m… ▽ More Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining. While methods exist to improve performance for unseen languages, they have almost exclusively been evaluated using amounts of raw text only available for a small fraction of the world's languages. In this paper, we evaluate the performance of existing methods to adapt PMMs to new languages using a resource available for over 1600 languages: the New Testament. This is challenging for two reasons: (1) the small corpus size, and (2) the narrow domain. While performance drops for all approaches, we surprisingly still see gains of up to $17.69\%$ accuracy for part-of-speech tagging and $6.29$ F1 for NER on average over all languages as compared to XLM-R. Another unexpected finding is that continued pretraining, the simplest approach, performs best. Finally, we perform a case study to disentangle the effects of domain and size and to shed light on the influence of the finetuning source language. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: Accepted to ACL 2021

arXiv:2104.08726 [pdf, other]

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

Authors: Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Meza-Ruiz, Gustavo A. Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Ngoc Thang Vu, Katharina Kann

Abstract: Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we… ▽ More Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, an extension of XNLI (Conneau et al., 2018) to 10 indigenous languages of the Americas. We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches. Additionally, we explore model adaptation via continued pretraining and provide an analysis of the dataset by considering hypothesis-only models. We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%. Continued pretraining offers improvements, with an average accuracy of 44.05%. Surprisingly, training on poorly translated data by far outperforms all other methods with an accuracy of 48.72%. △ Less

Submitted 16 March, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

Comments: Accepted to ACL 2022

arXiv:2101.11131 [pdf, other]

CLiMP: A Benchmark for Chinese Language Model Evaluation

Authors: Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

Abstract: Linguistically informed analyses of language models (LMs) contribute to the understanding and improvement of these models. Here, we introduce the corpus of Chinese linguistic minimal pairs (CLiMP), which can be used to investigate what knowledge Chinese LMs acquire. CLiMP consists of sets of 1,000 minimal pairs (MPs) for 16 syntactic contrasts in Mandarin, covering 9 major Mandarin linguistic phen… ▽ More Linguistically informed analyses of language models (LMs) contribute to the understanding and improvement of these models. Here, we introduce the corpus of Chinese linguistic minimal pairs (CLiMP), which can be used to investigate what knowledge Chinese LMs acquire. CLiMP consists of sets of 1,000 minimal pairs (MPs) for 16 syntactic contrasts in Mandarin, covering 9 major Mandarin linguistic phenomena. The MPs are semi-automatically generated, and human agreement with the labels in CLiMP is 95.8%. We evaluated 11 different LMs on CLiMP, covering n-grams, LSTMs, and Chinese BERT. We find that classifier-noun agreement and verb complement selection are the phenomena that models generally perform best at. However, models struggle the most with the ba construction, binding, and filler-gap dependencies. Overall, Chinese BERT achieves an 81.8% average accuracy, while the performances of LSTMs and 5-grams are only moderately above chance level. △ Less

Submitted 26 January, 2021; originally announced January 2021.

arXiv:2101.10565 [pdf, other]

Coloring the Black Box: What Synesthesia Tells Us about Character Embeddings

Authors: Katharina Kann, Mauro M. Monsalve-Mercado

Abstract: In contrast to their word- or sentence-level counterparts, character embeddings are still poorly understood. We aim at closing this gap with an in-depth study of English character embeddings. For this, we use resources from research on grapheme-color synesthesia -- a neuropsychological phenomenon where letters are associated with colors, which give us insight into which characters are similar for… ▽ More In contrast to their word- or sentence-level counterparts, character embeddings are still poorly understood. We aim at closing this gap with an in-depth study of English character embeddings. For this, we use resources from research on grapheme-color synesthesia -- a neuropsychological phenomenon where letters are associated with colors, which give us insight into which characters are similar for synesthetes and how characters are organized in color space. Comparing 10 different character embeddings, we ask: How similar are character embeddings to a synesthete's perception of characters? And how similar are character embeddings extracted from different models? We find that LSTMs agree with humans more than transformers. Comparing across tasks, grapheme-to-phoneme conversion results in the most human-like character embeddings. Finally, ELMo embeddings differ from both humans and other models. △ Less

Submitted 26 January, 2021; originally announced January 2021.

Comments: EACL 2021

arXiv:2010.02804 [pdf, other]

Tackling the Low-resource Challenge for Canonical Segmentation

Authors: Manuel Mager, Özlem Çetinoğlu, Katharina Kann

Abstract: Canonical morphological segmentation consists of dividing words into their standardized morphemes. Here, we are interested in approaches for the task when training data is limited. We compare model performance in a simulated low-resource setting for the high-resource languages German, English, and Indonesian to experiments on new datasets for the truly low-resource languages Popoluca and Tepehua.… ▽ More Canonical morphological segmentation consists of dividing words into their standardized morphemes. Here, we are interested in approaches for the task when training data is limited. We compare model performance in a simulated low-resource setting for the high-resource languages German, English, and Indonesian to experiments on new datasets for the truly low-resource languages Popoluca and Tepehua. We explore two new models for the task, borrowing from the closely related area of morphological generation: an LSTM pointer-generator and a sequence-to-sequence model with hard monotonic attention trained with imitation learning. We find that, in the low-resource setting, the novel approaches outperform existing ones on all languages by up to 11.4% accuracy. However, while accuracy in emulated low-resource scenarios is over 50% for all languages, for the truly low-resource languages Popoluca and Tepehua, our best model only obtains 37.4% and 28.4% accuracy, respectively. Thus, we conclude that canonical segmentation is still a challenging task for low-resource languages. △ Less

Submitted 6 October, 2020; originally announced October 2020.

Comments: Accepted to EMNLP 2020

arXiv:2010.02239 [pdf, other]

Acrostic Poem Generation

Authors: Rajat Agarwal, Katharina Kann

Abstract: We propose a new task in the area of computational creativity: acrostic poem generation in English. Acrostic poems are poems that contain a hidden message; typically, the first letter of each line spells out a word or short phrase. We define the task as a generation task with multiple constraints: given an input word, 1) the initial letters of each line should spell out the provided word, 2) the p… ▽ More We propose a new task in the area of computational creativity: acrostic poem generation in English. Acrostic poems are poems that contain a hidden message; typically, the first letter of each line spells out a word or short phrase. We define the task as a generation task with multiple constraints: given an input word, 1) the initial letters of each line should spell out the provided word, 2) the poem's semantics should also relate to it, and 3) the poem should conform to a rhyming scheme. We further provide a baseline model for the task, which consists of a conditional neural language model in combination with a neural rhyming model. Since no dedicated datasets for acrostic poem generation exist, we create training data for our task by first training a separate topic prediction model on a small set of topic-annotated poems and then predicting topics for additional poems. Our experiments show that the acrostic poems generated by our baseline are received well by humans and do not lose much quality due to the additional constraints. Last, we confirm that poems generated by our model are indeed closely related to the provided prompts, and that pretraining on Wikipedia can boost performance. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2006.11830 [pdf, ps, other]

The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2

Authors: Assaf Singer, Katharina Kann

Abstract: We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion. The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form. The latter requires generating entire paradigms for a set of given lemmas from r… ▽ More We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion. The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form. The latter requires generating entire paradigms for a set of given lemmas from raw text alone. We model morphological inflection as a sequence-to-sequence problem, where the input is the sequence of the lemma's characters with morphological tags, and the output is the sequence of the inflected form's characters. First, we apply a transformer model to the task. Second, as inflected forms share most characters with the lemma, we further propose a pointer-generator transformer model to allow easy copying of input characters. Our best performing system for Task 0 is placed 6th out of 23 systems. We further use our inflection systems as subcomponents of approaches for Task 2. Our best performing system for Task 2 is the 2nd best out of 7 submissions. △ Less

Submitted 21 June, 2020; originally announced June 2020.

Comments: 8 pages, 2 figures

ACM Class: I.2.7; I.2.6

arXiv:2005.13756 [pdf, other]

The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

Authors: Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden

Abstract: In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to si… ▽ More In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to simulate a realistic use case, we first released data for 5 development languages. However, systems were officially evaluated on 9 surprise languages, which were only revealed a few days before the submission deadline. We provided a modular baseline system, which is a pipeline of 4 components. 3 teams submitted a total of 7 systems, but, surprisingly, none of the submitted systems was able to improve over the baseline on average over all 9 test languages. Only on 3 languages did a submitted system obtain the best results. This shows that unsupervised morphological paradigm completion is still largely unsolved. We present an analysis here, so that this shared task will ground further research on the topic. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Comments: SIGMORPHON 2020

arXiv:2005.13455 [pdf, other]

Self-Training for Unsupervised Parsing with PRPN

Authors: Anhad Mohananey, Katharina Kann, Samuel R. Bowman

Abstract: Neural unsupervised parsing (UP) models learn to parse without access to syntactic annotations, while being optimized for another task like language modeling. In this work, we propose self-training for neural UP models: we leverage aggregated annotations predicted by copies of our model as supervision for future copies. To be able to use our model's predictions during training, we extend a recent… ▽ More Neural unsupervised parsing (UP) models learn to parse without access to syntactic annotations, while being optimized for another task like language modeling. In this work, we propose self-training for neural UP models: we leverage aggregated annotations predicted by copies of our model as supervision for future copies. To be able to use our model's predictions during training, we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a) such that it can be trained in a semi-supervised fashion. We then add examples with parses predicted by our model to our unlabeled UP training data. Our self-trained model outperforms the PRPN by 8.1% F1 and the previous state of the art by 1.6% F1. In addition, we show that our architecture can also be helpful for semi-supervised parsing in ultra-low-resource settings. △ Less

Submitted 27 May, 2020; originally announced May 2020.

Comments: Accepted for publication at the 16th International Conference on Parsing Technologies (IWPT), 2020

arXiv:2005.13013 [pdf, other]

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

Authors: Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

Abstract: Intermediate-task training---fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task---often improves model performance substantially on language understanding tasks in monolingual English settings. We investigate whether English intermediate-task training is still helpful on non-English target tasks. Using nine intermediate language-understanding tasks,… ▽ More Intermediate-task training---fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task---often improves model performance substantially on language understanding tasks in monolingual English settings. We investigate whether English intermediate-task training is still helpful on non-English target tasks. Using nine intermediate language-understanding tasks, we evaluate intermediate-task transfer in a zero-shot cross-lingual setting on the XTREME benchmark. We see large improvements from intermediate training on the BUCC and Tatoeba sentence retrieval tasks and moderate improvements on question-answering target tasks. MNLI, SQuAD and HellaSwag achieve the best overall results as intermediate tasks, while multi-task intermediate offers small additional improvements. Using our best intermediate-task models for each target task, we obtain a 5.4 point improvement over XLM-R Large on the XTREME benchmark, setting the state of the art as of June 2020. We also investigate continuing multilingual MLM during intermediate-task training and using machine-translated intermediate-task data, but neither consistently outperforms simply performing English intermediate-task training. △ Less

Submitted 30 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

arXiv:2005.12411 [pdf, other]

The IMS-CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

Authors: Manuel Mager, Katharina Kann

Abstract: In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020). The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text. Our proposed system is a modified version… ▽ More In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020). The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text. Our proposed system is a modified version of the baseline introduced together with the task. In particular, we experiment with substituting the inflection generation component with an LSTM sequence-to-sequence model and an LSTM pointer-generator network. Our pointer-generator system obtains the best score of all seven submitted systems on average over all languages, and outperforms the official baseline, which was best overall, on Bulgarian and Kannada. △ Less

Submitted 25 May, 2020; originally announced May 2020.

arXiv:2005.00970 [pdf, other]

Unsupervised Morphological Paradigm Completion

Authors: Huiming **, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann

Abstract: We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or… ▽ More We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or to assist linguistic annotators. From a cognitive science perspective, this can shed light on how children acquire morphological knowledge. We further introduce a system for the task, which generates morphological paradigms via the following steps: (i) EDIT TREE retrieval, (ii) additional lemma retrieval, (iii) paradigm size discovery, and (iv) inflection generation. We perform an evaluation on 14 typologically diverse languages. Our system outperforms trivial baselines with ease and, for some languages, even obtains a higher accuracy than minimally supervised systems. △ Less

Submitted 20 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

Comments: Accepted by ACL 2020

arXiv:2005.00628 [pdf, other]

Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?

Authors: Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large… ▽ More While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings. △ Less

Submitted 9 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: ACL 2020

arXiv:2004.13305 [pdf, ps, other]

Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Authors: Katharina Kann, Ophélie Lacroix, Anders Søgaard

Abstract: Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision - e.g., cross-lingual transfer, type-level supervision, or a combination thereof - have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource lan… ▽ More Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision - e.g., cross-lingual transfer, type-level supervision, or a combination thereof - have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: AAAI 2020

arXiv:2004.13304 [pdf, other]

Learning to Learn Morphological Inflection for Resource-Poor Languages

Authors: Katharina Kann, Samuel R. Bowman, Kyunghyun Cho

Abstract: We propose to cast the task of morphological inflection - map** a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments… ▽ More We propose to cast the task of morphological inflection - map** a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines. In particular, it obtains a 31.7% higher absolute accuracy than a previously proposed cross-lingual transfer model and outperforms the previous state of the art by 1.7% absolute accuracy on average over languages. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: AAAI 2020

arXiv:1910.09729 [pdf, other]

Grammatical Gender, Neo-Whorfianism, and Word Embeddings: A Data-Driven Approach to Linguistic Relativity

Authors: Katharina Kann

Abstract: The relation between language and thought has occupied linguists for at least a century. Neo-Whorfianism, a weak version of the controversial Sapir-Whorf hypothesis, holds that our thoughts are subtly influenced by the grammatical structures of our native language. One area of investigation in this vein focuses on how the grammatical gender of nouns affects the way we perceive the corresponding ob… ▽ More The relation between language and thought has occupied linguists for at least a century. Neo-Whorfianism, a weak version of the controversial Sapir-Whorf hypothesis, holds that our thoughts are subtly influenced by the grammatical structures of our native language. One area of investigation in this vein focuses on how the grammatical gender of nouns affects the way we perceive the corresponding objects. For instance, does the fact that key is masculine in German (der Schlüssel), but feminine in Spanish (la llave) change the speakers' views of those objects? Psycholinguistic evidence presented by Boroditsky et al. (2003, §4) suggested the answer might be yes: When asked to produce adjectives that best described a key, German and Spanish speakers named more stereotypically masculine and feminine ones, respectively. However, recent attempts to replicate those experiments have failed (Mickan et al., 2014). In this work, we offer a computational analogue of Boroditsky et al. (2003, §4)'s experimental design on 9 languages, finding evidence against neo-Whorfianism. △ Less

Submitted 21 October, 2019; originally announced October 2019.

arXiv:1910.05456 [pdf, ps, other]

Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge

Authors: Katharina Kann

Abstract: How does knowledge of one language's morphology influence learning of inflection rules in a second one? In order to investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclus… ▽ More How does knowledge of one language's morphology influence learning of inflection rules in a second one? In order to investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclusions: (i) if source and target language are closely related, acquisition of the target language's inflectional morphology constitutes an easier task for the model; (ii) knowledge of a prefixing (resp. suffixing) language makes acquisition of a suffixing (resp. prefixing) language's morphology more challenging; and (iii) surprisingly, a source language which exhibits an agglutinative morphology simplifies learning of a second language's inflectional morphology, independent of their relatedness. △ Less

Submitted 11 October, 2019; originally announced October 2019.

Comments: SCiL 2020

arXiv:1909.01522 [pdf, other]

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

Authors: Katharina Kann, Kyunghyun Cho, Samuel R. Bowman

Abstract: Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stop** in the low-resource sett… ▽ More Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stop** in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results. △ Less

Submitted 14 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: EMNLP 2019

arXiv:1908.06136 [pdf, other]

Transductive Auxiliary Task Self-Training for Neural Multi-Task Models

Authors: Johannes Bjerva, Katharina Kann, Isabelle Augenstein

Abstract: Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data. Drawing heavily on ideas from those two approaches, we suggest transductive auxiliary task self-training: training a multi-task model on (i) a combination of main and auxiliary task training data, and (ii) test instances with auxiliary task labels whic… ▽ More Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data. Drawing heavily on ideas from those two approaches, we suggest transductive auxiliary task self-training: training a multi-task model on (i) a combination of main and auxiliary task training data, and (ii) test instances with auxiliary task labels which a single-task version of the model has previously generated. We perform extensive experiments on 86 combinations of languages and tasks. Our results are that, on average, transductive auxiliary task self-training improves absolute accuracy by up to 9.56% over the pure multi-task model for dependency relation tagging and by up to 13.03% for semantic tagging. △ Less

Submitted 22 September, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: Camera ready version, to appear at DeepLo 2019 (EMNLP workshop)

arXiv:1906.03608 [pdf, other]

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings

Authors: Yadollah Yaghoobzadeh, Katharina Kann, Timothy J. Hazen, Eneko Agirre, Hinrich Schütze

Abstract: Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses fro… ▽ More Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes. This is the basis for novel diagnostic tests for an embedding's content: we probe word embeddings for semantic classes and analyze the embedding space by classifying embeddings into semantic classes. Our main findings are: (i) Information about a sense is generally represented well in a single-vector embedding - if the sense is frequent. (ii) A classifier can accurately predict whether a word is single-sense or multi-sense, based only on its embedding. (iii) Although rare senses are not well represented in single-vector embeddings, this does not have negative impact on an NLP application whose performance depends on frequent senses. △ Less

Submitted 9 June, 2019; originally announced June 2019.

Comments: 14 pages, Accepted at ACL 2019

arXiv:1904.01989 [pdf, other]

Subword-Level Language Identification for Intra-Word Code-Switching

Authors: Manuel Mager, Özlem Çetinoğlu, Katharina Kann

Abstract: Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token. However, if at least one language is morphologically rich, a large number of words can be composed of morphemes from more than one language (intra-word CS). In this paper, we extend the… ▽ More Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token. However, if at least one language is morphologically rich, a large number of words can be composed of morphemes from more than one language (intra-word CS). In this paper, we extend the language identification task to the subword-level, such that it includes splitting mixed words while tagging each part with a language ID. We further propose a model for this task, which is based on a segmental recurrent neural network. In experiments on a new Spanish--Wixarika dataset and on an adapted German--Turkish dataset, our proposed model performs slightly better than or roughly on par with our best baseline, respectively. Considering only mixed words, however, it strongly outperforms all baselines. △ Less

Submitted 3 April, 2019; originally announced April 2019.

Comments: NAACL-HLT 2019

arXiv:1811.10773 [pdf, other]

Verb Argument Structure Alternations in Word and Sentence Embeddings

Authors: Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman

Abstract: Verbs occur in different syntactic environments, or frames. We investigate whether artificial neural networks encode grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. We introduce five datasets, collectively called FAVA, containing in aggregate nearly 10k sentences labeled for grammatical acceptability, illustrating different verbal argument… ▽ More Verbs occur in different syntactic environments, or frames. We investigate whether artificial neural networks encode grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. We introduce five datasets, collectively called FAVA, containing in aggregate nearly 10k sentences labeled for grammatical acceptability, illustrating different verbal argument structure alternations. We then test whether models can distinguish acceptable English verb-frame combinations from unacceptable ones using a sentence embedding alone. For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings. Our models perform reliable classifications for some verbal alternations but not others, suggesting that while these representations do encode fine-grained lexical information, it is incomplete or can be hard to extract. Further, differences between the word- and sentence-level models show that some information present in word embeddings is not passed on to the down-stream sentence embeddings. △ Less

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted to SCiL 2019

arXiv:1810.07125 [pdf, other]

The CoNLL--SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

Abstract: The CoNLL--SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a… ▽ More The CoNLL--SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task. This second task featured seven languages. Task 1 received 27 submissions and task 2 received 6 submissions. Both tasks featured a low, medium, and high data condition. Nearly all submissions featured a neural component and built on highly-ranked systems from the earlier 2017 shared task. In the inflection task (task 1), 41 of the 52 languages present in last year's inflection task showed improvement by the best systems in the low-resource setting. The cloze task (task 2) proved to be difficult, and few submissions managed to consistently improve upon both a simple neural baseline system and a lemma-repeating baseline. △ Less

Submitted 25 February, 2020; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: CoNLL 2018. arXiv admin note: text overlap with arXiv:1706.09031

arXiv:1809.08733 [pdf, other]

Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting

Authors: Katharina Kann, Hinrich Schütze

Abstract: Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only few paradigms available for training, neural seq2seq models are… ▽ More Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only few paradigms available for training, neural seq2seq models are able to capture relationships between paradigm cells, but are tied to the idiosyncracies of the training set. Paradigm transduction mitigates this problem by exploiting the input subset of inflected forms at test time. (ii) Source selection with high precision (SHIP): Multi-source models which learn to automatically select one or multiple sources to predict a target inflection do not perform well in the minimal-resource setting. SHIP is an alternative to identify a reliable source if training data is limited. On a 52-language benchmark dataset, we outperform the previous state of the art by up to 9.71% absolute accuracy. △ Less

Submitted 9 May, 2019; v1 submitted 23 September, 2018; originally announced September 2018.

Comments: EMNLP 2018

arXiv:1809.08731 [pdf, ps, other]

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!

Authors: Katharina Kann, Sascha Rothe, Katja Filippova

Abstract: Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though wor… ▽ More Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own. △ Less

Submitted 23 September, 2018; originally announced September 2018.

Comments: Accepted to CoNLL 2018

arXiv:1807.07186 [pdf, other]

Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Ty**

Authors: Yadollah Yaghoobzadeh, Katharina Kann, Hinrich Schütze

Abstract: Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embed… ▽ More Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification given a word embedding. The task we use is fine-grained name ty**: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that are complementary to the current embedding evaluation datasets in: they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context △ Less

Submitted 18 July, 2018; originally announced July 2018.

Comments: 6 pages, The 3rd Workshop on Representation Learning for NLP (RepL4NLP @ ACL2018)

arXiv:1807.00286 [pdf, ps, other]

Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

Authors: Manuel Mager, Elisabeth Mager, Alfonso Medina-Urrea, Ivan Meza, Katharina Kann

Abstract: Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available. Thus, translation performance is far from the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena which hamper automatic translation to and from polysynthetic languages, we stu… ▽ More Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available. Thus, translation performance is far from the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena which hamper automatic translation to and from polysynthetic languages, we study translations from three low-resource, polysynthetic languages (Nahuatl, Wixarika and Yorem Nokki) into Spanish and vice versa. Doing so, we find that in a morpheme-to-morpheme alignment an important amount of information contained in polysynthetic morphemes has no Spanish counterpart, and its translation is often omitted. We further conduct a qualitative analysis and, thus, identify morpheme types that are commonly hard to align or ignored in the translation process. △ Less

Submitted 1 July, 2018; originally announced July 2018.

Comments: To appear in "All Together Now? Computational Modeling of Polysynthetic Languages" Workshop, at COLING 2018

arXiv:1804.06024 [pdf, ps, other]

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages

Authors: Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze

Abstract: Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performan… ▽ More Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches -one with, one without need for external unlabeled resources-, and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the amount of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research. △ Less

Submitted 16 April, 2018; originally announced April 2018.

Comments: Long Paper, 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

arXiv:1705.06106 [pdf, other]

Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models

Authors: Katharina Kann, Hinrich Schütze

Abstract: We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use… ▽ More We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.9% improvement over state-of-the-art baselines for 8 different languages. △ Less

Submitted 21 July, 2017; v1 submitted 17 May, 2017; originally announced May 2017.

Comments: Accepted at SCLeM 2017

arXiv:1704.00052 [pdf, other]

One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

Authors: Katharina Kann, Ryan Cotterell, Hinrich Schütze

Abstract: We present a novel cross-lingual transfer method for paradigm completion, the task of map** a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up… ▽ More We present a novel cross-lingual transfer method for paradigm completion, the task of map** a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge. △ Less

Submitted 31 March, 2017; originally announced April 2017.

Comments: Accepted at ACL 2017

arXiv:1702.01923 [pdf, other]

Comparative Study of CNN and RNN for Natural Language Processing

Authors: Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze

Abstract: Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tas… ▽ More Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection. △ Less

Submitted 7 February, 2017; originally announced February 2017.

Comments: 7 pages, 11 figures

arXiv:1612.06027 [pdf, other]

Neural Multi-Source Morphological Reinflection

Authors: Katharina Kann, Ryan Cotterell, Hinrich Schütze

Abstract: We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version. The input consists of (i) a target tag and (ii) multiple pairs of source form and source tag for a lemma. The motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e.g., different stems.… ▽ More We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version. The input consists of (i) a target tag and (ii) multiple pairs of source form and source tag for a lemma. The motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e.g., different stems. We further present a novel extension to the encoder- decoder recurrent neural architecture, consisting of multiple encoders, to better solve the task. We show that our new architecture outperforms single-source reinflection models and publish our dataset for multi-source morphological reinflection to facilitate future research. △ Less

Submitted 22 January, 2017; v1 submitted 18 December, 2016; originally announced December 2016.

Comments: Accepted at EACL 2017. Camera Ready Version

arXiv:1606.00589 [pdf, other]

Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection

Authors: Katharina Kann, Hinrich Schütze

Abstract: Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag. We propose a new way of modeling this task with neural encoder-decoder models. Our approach reduces the amount of required training data for this architecture and achieves state-of-the-art results, making encoder-decoder models applicable to morphological reinflection even for low… ▽ More Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag. We propose a new way of modeling this task with neural encoder-decoder models. Our approach reduces the amount of required training data for this architecture and achieves state-of-the-art results, making encoder-decoder models applicable to morphological reinflection even for low-resource languages. We further present a new automatic correction method for the outputs based on edit trees. △ Less

Submitted 2 June, 2016; originally announced June 2016.

Comments: Accepted at ACL 2016

Showing 1–50 of 50 results for author: Kann, K