Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot Performance via Probability Calibration

Nie, Ercong; Schmid, Helmut; Schütze, Hinrich


arXiv:2310.05069 (cs)

[Submitted on 8 Oct 2023 (v1), last revised 19 Oct 2023 (this version, v2)]



Abstract: Pretrained multilingual encoder models can directly perform zero-shot multilingual tasks or linguistic probing by reformulating the input examples into cloze-style prompts. This is accomplished by predicting the probabilities of the label words at the masked token position, without requiring any updates to the model parameters. However, the performance of this method is limited by the model's bias toward predicting label words that occurred frequently during pretraining; such words typically receive high probabilities. To address this issue, we combine the models with calibration techniques that modify the probabilities of label words predicted by the models. We first validate the effectiveness of a proposed simple calibration method, together with other existing techniques, on monolingual encoders in both zero- and few-shot scenarios. We subsequently apply these calibration techniques to multilingual encoders, resulting in substantial performance improvements across a wide range of tasks.
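
The abstract does not spell out the calibration method, so the following is only a minimal sketch of the general idea, not the authors' technique. It assumes one simple option: estimate a prior for each label word by averaging the masked-position probabilities over a batch of unlabeled inputs, then divide each prediction by that prior and renormalize. The label words ("great", "terrible"), the probability values, and all function names are hypothetical and chosen purely for illustration; in practice the probabilities would come from an encoder's masked-token prediction.

```python
from typing import Dict, List

# Hypothetical label-word probabilities at the masked position for three
# unlabeled inputs. These toy numbers are made up, not results from the paper.
mask_probs: List[Dict[str, float]] = [
    {"great": 0.30, "terrible": 0.10},
    {"great": 0.20, "terrible": 0.15},
    {"great": 0.40, "terrible": 0.05},
]


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {word: value / total for word, value in scores.items()}


def estimate_prior(batch: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each label word's probability over a batch of inputs.

    Assumption: this average approximates the model's input-independent
    bias toward each label word.
    """
    if not batch:
        raise ValueError("batch must not be empty")
    return {
        word: sum(probs[word] for probs in batch) / len(batch)
        for word in batch[0]
    }


def calibrate(probs: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Divide each label-word probability by its prior, then renormalize."""
    if any(prior[word] <= 0 for word in probs):
        raise ValueError("prior values must be positive")
    return normalize({word: probs[word] / prior[word] for word in probs})


def predict(probs: Dict[str, float]) -> str:
    """Return the label word with the highest score."""
    return max(probs, key=lambda word: probs[word])


prior = estimate_prior(mask_probs)
raw_prediction = predict(normalize(mask_probs[1]))
calibrated_prediction = predict(calibrate(mask_probs[1], prior))
```

With these toy numbers, the raw prediction for the second input is "great", because that word receives more probability mass on every input. After dividing by the estimated prior, the prediction flips to "terrible", which illustrates how calibration can counteract a bias toward one label word.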

Comments: Accepted to Findings of EMNLP 2023
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2310.05069 [cs.CL] (or arXiv:2310.05069v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2310.05069

Submission history

From: Ercong Nie
[v1] Sun, 8 Oct 2023 08:31:05 UTC (10,321 KB)
[v2] Thu, 19 Oct 2023 15:58:05 UTC (9,310 KB)
