JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

Armstrong, Ruth-Ann; Hewitt, John; Manning, Christopher

arXiv:2212.03419 (cs)

[Submitted on 7 Dec 2022]

Abstract: JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.

Comments:	14 pages, 3 figures, Findings of EMNLP 2022
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.7
Cite as:	arXiv:2212.03419 [cs.CL]
	(or arXiv:2212.03419v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.03419

Submission history

From: Ruth-Ann Armstrong
[v1] Wed, 7 Dec 2022 03:07:02 UTC (226 KB)
