Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Nie, Ercong; Liang, Sheng; Schmid, Helmut; Schütze, Hinrich

arXiv:2212.09651 (cs)

[Submitted on 19 Dec 2022 (v1), last revised 10 Jul 2023 (this version, v4)]

Abstract: Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the similarity between the high- and low-resource languages as well as the amount of low-resource pretraining data on the other side. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
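
The retrieve-then-prompt idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the retriever nor the prompt template, so the hand-made 3-d vectors (standing in for multilingual sentence embeddings), the cloze template "In summary, it was ...", the helper names `retrieve_top_k` and `build_prompt`, and the choice to prepend the raw retrieved sentence without a label in the unlabeled case are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, hrl_pool, k=1):
    """Return the k high-resource-language entries closest to the query."""
    ranked = sorted(hrl_pool, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(lrl_sentence, retrieved, labeled, mask_token="<mask>"):
    """Prepend retrieved HRL sentences to the LRL input as context."""
    # Hypothetical cloze template; the abstract does not state the real one.
    template = "{text} In summary, it was {slot}."
    parts = []
    for entry in retrieved:
        if labeled:
            # Labeled setting: the retrieved sentence carries its gold label.
            parts.append(template.format(text=entry["text"], slot=entry["label"]))
        else:
            # Assumption: without a label, only the raw sentence is prepended.
            parts.append(entry["text"])
    # The LRL input keeps a masked slot for the model to fill.
    parts.append(template.format(text=lrl_sentence, slot=mask_token))
    return " ".join(parts)


# Toy HRL pool (English) with made-up vectors and labels.
hrl_pool = [
    {"text": "The movie was wonderful.", "label": "good", "vec": [0.9, 0.1, 0.0]},
    {"text": "The food was terrible.", "label": "bad", "vec": [0.0, 0.2, 0.9]},
]
# Toy LRL input (Swahili, chosen only for illustration) and its made-up vector.
query_text = "Filamu ilikuwa nzuri sana."
query_vec = [0.8, 0.2, 0.1]

retrieved = retrieve_top_k(query_vec, hrl_pool, k=1)
labeled_prompt = build_prompt(query_text, retrieved, labeled=True)
unlabeled_prompt = build_prompt(query_text, retrieved, labeled=False)
print(labeled_prompt)
print(unlabeled_prompt)
```

In a real setup the prompt would be passed to an MPLM that scores candidate label words at the masked position; that step is omitted here because it depends on the chosen model.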

Comments:	Accepted to Findings of ACL 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.09651 [cs.CL]
	(or arXiv:2212.09651v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.09651

Submission history

From: Ercong Nie
[v1] Mon, 19 Dec 2022 17:29:37 UTC (11,285 KB)
[v2] Tue, 2 May 2023 19:32:20 UTC (11,956 KB)
[v3] Wed, 31 May 2023 08:14:32 UTC (11,955 KB)
[v4] Mon, 10 Jul 2023 22:27:15 UTC (11,957 KB)
