MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

Gladkoff, Serge; Han, Lifeng; Erofeev, Gleb; Sorokina, Irina; Nenadic, Goran

Computer Science > Computation and Language

arXiv:2308.00158 (cs)

[Submitted on 31 Jul 2023 (v1), last revised 21 Jun 2024 (this version, v6)]

Title:MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

Authors:Serge Gladkoff, Lifeng Han, Gleb Erofeev, Irina Sorokina, Goran Nenadic

View PDF HTML (experimental)

Abstract:Translation Quality Evaluation (TQE) is an essential step of the modern translation production process. TQE is critical in assessing both machine translation (MT) and human translation (HT) quality without reference translations. The ability to evaluate or even simply estimate the quality of translation automatically may open significant efficiency gains through process optimisation. This work examines whether the state-of-the-art large language models (LLMs) can be used for this purpose. We take OpenAI models as the best state-of-the-art technology and approach TQE as a binary classification task. On eight language pairs including English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese, our experimental results show that fine-tuned gpt3.5 can demonstrate good performance on translation quality prediction tasks, i.e. whether the translation needs to be edited. Another finding is that simply increasing the sizes of LLMs does not lead to apparent better performances on this task by comparing the performance of three different versions of OpenAI models: curie, davinci, and gpt3.5 with 13B, 175B, and 175B parameters, respectively.

Comments:	Accepted by EAMT2024: The 25th Annual Conference of The European Association for Machine Translation
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2308.00158 [cs.CL]
	(or arXiv:2308.00158v6 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.00158

Submission history

From: Lifeng Han Dr [view email]
[v1] Mon, 31 Jul 2023 21:13:30 UTC (248 KB)
[v2] Thu, 10 Aug 2023 23:20:03 UTC (828 KB)
[v3] Mon, 21 Aug 2023 14:23:14 UTC (830 KB)
[v4] Mon, 6 Nov 2023 15:52:58 UTC (1,333 KB)
[v5] Wed, 8 Nov 2023 17:25:34 UTC (1,333 KB)
[v6] Fri, 21 Jun 2024 17:34:47 UTC (8,995 KB)

Computer Science > Computation and Language

Title:MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators