Search | arXiv e-print repository

arXiv:2405.20089 [pdf, other]

The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities

Authors: David Stap, Eva Hasler, Bill Byrne, Christof Monz, Ke Tran

Abstract: Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what is the impact of fine-tuning on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an… ▽ More Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what is the impact of fine-tuning on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an extensive translation evaluation on the LLaMA and Falcon family of models with model size ranging from 7 billion up to 65 billion parameters. Our results show that while fine-tuning improves the general translation quality of LLMs, several abilities degrade. In particular, we observe a decline in the ability to perform formality steering, to produce technical translations through few-shot examples, and to perform document-level translation. On the other hand, we observe that the model produces less literal translations after fine-tuning on parallel data. We show that by including monolingual data as part of the fine-tuning data we can maintain the abilities while simultaneously enhancing overall translation quality. Our findings emphasize the need for fine-tuning strategies that preserve the benefits of LLMs for machine translation. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted to ACL 2024 (long, main)

arXiv:2404.11288 [pdf, other]

A Preference-driven Paradigm for Enhanced Translation with Large Language Models

Authors: Dawei Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler

Abstract: Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine-tuning (SFT) using only a small amount of parallel data. However, SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. Hence, the assistance from SFT often reaches a platea… ▽ More Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine-tuning (SFT) using only a small amount of parallel data. However, SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. Hence, the assistance from SFT often reaches a plateau once the LLMs have achieved a certain level of translation capability, and further increasing the size of parallel data does not provide additional benefits. To overcome this plateau associated with imitation-based SFT, we propose a preference-based approach built upon the Plackett-Luce model. The objective is to steer LLMs towards a more nuanced understanding of translation preferences from a holistic view, while also being more resilient in the absence of gold translations. We further build a dataset named MAPLE to verify the effectiveness of our approach, which includes multiple translations of varying quality for each source sentence. Extensive experiments demonstrate the superiority of our approach in "breaking the plateau" across diverse LLMs and test settings. Our in-depth analysis underscores the pivotal role of diverse translations and accurate preference scores in the success of our approach. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 (long, main)

arXiv:2312.00536 [pdf, other]

Trained MT Metrics Learn to Cope with Machine-translated References

Authors: Jannis Vamvas, Tobias Domhan, Sony Trenous, Rico Sennrich, Eva Hasler

Abstract: Neural metrics trained on human evaluations of MT tend to correlate well with human judgments, but their behavior is not fully understood. In this paper, we perform a controlled experiment and compare a baseline metric that has not been trained on human evaluations (Prism) to a trained version of the same metric (Prism+FT). Surprisingly, we find that Prism+FT becomes more robust to machine-transla… ▽ More Neural metrics trained on human evaluations of MT tend to correlate well with human judgments, but their behavior is not fully understood. In this paper, we perform a controlled experiment and compare a baseline metric that has not been trained on human evaluations (Prism) to a trained version of the same metric (Prism+FT). Surprisingly, we find that Prism+FT becomes more robust to machine-translated references, which are a notorious problem in MT evaluation. This suggests that the effects of metric training go beyond the intended effect of improving overall correlation with human judgments. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: WMT 2023

arXiv:2302.08920 [pdf, other]

A tale of two tails: 130 years of growth-at-risk

Authors: Martin Gächter, Elias Hasler, Florian Huber

Abstract: We extend the existing growth-at-risk (GaR) literature by examining a long time period of 130 years in a time-varying parameter regression model. We identify several important insights for policymakers. First, both the level as well as the determinants of GaR vary significantly over time. Second, the stability of upside risks to GDP growth reported in earlier research is specific to the period kno… ▽ More We extend the existing growth-at-risk (GaR) literature by examining a long time period of 130 years in a time-varying parameter regression model. We identify several important insights for policymakers. First, both the level as well as the determinants of GaR vary significantly over time. Second, the stability of upside risks to GDP growth reported in earlier research is specific to the period known as the Great Moderation, with the distribution of risks being more balanced before the 1970s. Third, the distribution of GDP growth has significantly narrowed since the end of the Bretton Woods system. Fourth, financial stress is always linked to higher downside risks, but it does not affect upside risks. Finally, other risk indicators, such as credit growth and house prices, not only drive downside risks, but also contribute to increased upside risks during boom periods. In this context, the paper also adds to the financial cycle literature by completing the picture of drivers (and risks) for both booms and recessions over time. △ Less

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2210.13281 [pdf, other]

Analyzing the Use of Influence Functions for Instance-Specific Data Filtering in Neural Machine Translation

Authors: Tsz Kin Lam, Eva Hasler, Felix Hieber

Abstract: Customer feedback can be an important signal for improving commercial machine translation systems. One solution for fixing specific translation errors is to remove the related erroneous training instances followed by re-training of the machine translation system, which we refer to as instance-specific data filtering. Influence functions (IF) have been shown to be effective in finding such relevant… ▽ More Customer feedback can be an important signal for improving commercial machine translation systems. One solution for fixing specific translation errors is to remove the related erroneous training instances followed by re-training of the machine translation system, which we refer to as instance-specific data filtering. Influence functions (IF) have been shown to be effective in finding such relevant training examples for classification tasks such as image classification, toxic speech detection and entailment task. Given a probing instance, IF find influential training examples by measuring the similarity of the probing instance with a set of training examples in gradient space. In this work, we examine the use of influence functions for Neural Machine Translation (NMT). We propose two effective extensions to a state of the art influence function and demonstrate on the sub-problem of copied training examples that IF can be applied more generally than handcrafted regular expressions. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted at WMT 2022

arXiv:2210.04545 [pdf, other]

Automatic Evaluation and Analysis of Idioms in Neural Machine Translation

Authors: Christos Baziotis, Prashant Mathur, Eva Hasler

Abstract: A major open problem in neural machine translation (NMT) is the translation of idiomatic expressions, such as "under the weather". The meaning of these expressions is not composed by the meaning of their constituent words, and NMT models tend to translate them literally (i.e., word-by-word), which leads to confusing and nonsensical translations. Research on idioms in NMT is limited and obstructed… ▽ More A major open problem in neural machine translation (NMT) is the translation of idiomatic expressions, such as "under the weather". The meaning of these expressions is not composed by the meaning of their constituent words, and NMT models tend to translate them literally (i.e., word-by-word), which leads to confusing and nonsensical translations. Research on idioms in NMT is limited and obstructed by the absence of automatic methods for quantifying these errors. In this work, first, we propose a novel metric for automatically measuring the frequency of literal translation errors without human involvement. Equipped with this metric, we present controlled translation experiments with models trained in different conditions (with/without the test-set idioms) and across a wide range of (global and targeted) metrics and test sets. We explore the role of monolingual pretraining and find that it yields substantial targeted improvements, even without observing any translation examples of the test-set idioms. In our analysis, we probe the role of idiom context. We find that the randomly initialized models are more local or "myopic" as they are relatively unaffected by variations of the idiom context, unlike the pretrained ones. △ Less

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2205.06618 [pdf, other]

The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation

Authors: Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne, Felix Hieber

Abstract: Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with re… ▽ More Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with respect to automatic quality metrics in prior work, we show that it can fail to select the right set of output words, particularly for semantically non-compositional linguistic phenomena such as idiomatic expressions, leading to reduced translation quality as perceived by humans. Trading off latency for quality by increasing the size of the allowed set is often not an option in real-world scenarios. We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations. This restores translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions, at an inference latency competitive with alignment-based selection using aggressive thresholds, thereby removing the dependency on separately trained alignment models. △ Less

Submitted 13 May, 2022; originally announced May 2022.

Comments: NAACL 2022

arXiv:1805.03750 [pdf, ps, other]

Neural Machine Translation Decoding with Terminology Constraints

Authors: Eva Hasler, Adrià De Gispert, Gonzalo Iglesias, Bill Byrne

Abstract: Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding which supports target-side constraints as well as constraints with correspondi… ▽ More Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding which supports target-side constraints as well as constraints with corresponding aligned input text spans. We demonstrate the performance of our framework on multiple translation tasks and motivate the need for constrained decoding with attentions as a means of reducing misplacement and duplication when translating user constraints. △ Less

Submitted 9 May, 2018; originally announced May 2018.

Comments: Proceedings of NAACL-HLT 2018

arXiv:1804.11324 [pdf, other]

Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment

Authors: Gonzalo Iglesias, William Tambellini, Adrià De Gispert, Eva Hasler, Bill Byrne

Abstract: We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed. We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed. △ Less

Submitted 30 April, 2018; originally announced April 2018.

Comments: Proceedings of NAACL-HLT 2018

arXiv:1708.01809 [pdf, other]

A Comparison of Neural Models for Word Ordering

Authors: Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adri`a de Gispert, Bill Byrne

Abstract: We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model… ▽ More We compare several language models for the word-ordering task and propose a new bag-to-sequence neural model based on attention-based sequence-to-sequence models. We evaluate the model on a large German WMT data set where it significantly outperforms existing models. We also describe a novel search strategy for LM-based word ordering and report results on the English Penn Treebank. Our best model setup outperforms prior work both in terms of speed and quality. △ Less

Submitted 5 August, 2017; originally announced August 2017.

Comments: Accepted for publication at INLG 2017

arXiv:1707.06885 [pdf, other]

SGNMT -- A Flexible NMT Decoding Platform for Quick Prototy** of New Models and Search Strategies

Authors: Felix Stahlberg, Eva Hasler, Danielle Saunders, Bill Byrne

Abstract: This paper introduces SGNMT, our experimental platform for machine translation research. SGNMT provides a generic interface to neural and symbolic scoring modules (predictors) with left-to-right semantic such as translation models like NMT, language models, translation lattices, $n$-best lists or other kinds of scores and constraints. Predictors can be combined with other predictors to form comple… ▽ More This paper introduces SGNMT, our experimental platform for machine translation research. SGNMT provides a generic interface to neural and symbolic scoring modules (predictors) with left-to-right semantic such as translation models like NMT, language models, translation lattices, $n$-best lists or other kinds of scores and constraints. Predictors can be combined with other predictors to form complex decoding tasks. SGNMT implements a number of search strategies for traversing the space spanned by the predictors which are appropriate for different predictor constellations. Adding new predictors or decoding strategies is particularly easy, making it a very efficient tool for prototy** new research ideas. SGNMT is actively being used by students in the MPhil program in Machine Learning, Speech and Language Technology at the University of Cambridge for course work and theses, as well as for most of the research work in our group. △ Less

Submitted 21 July, 2017; originally announced July 2017.

Comments: Accepted as EMNLP 2017 demo paper

arXiv:1612.03791 [pdf, other]

Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

Authors: Felix Stahlberg, Adrià de Gispert, Eva Hasler, Bill Byrne

Abstract: We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according the SMT lattice. This makes our approach much more flexible than $n$-best list or lattice rescoring as the neur… ▽ More We present a novel scheme to combine neural machine translation (NMT) with traditional statistical machine translation (SMT). Our approach borrows ideas from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is combined with the Bayes-risk of the translation according the SMT lattice. This makes our approach much more flexible than $n$-best list or lattice rescoring as the neural decoder is not restricted to the SMT search space. We show an efficient and simple way to integrate risk estimation into the NMT decoder which is suitable for word-level as well as subword-unit-level NMT. We test our method on English-German and Japanese-English and report significant gains over lattice rescoring on several data sets for both single and ensembled NMT. The MBR decoder produces entirely new hypotheses far beyond simply rescoring the SMT search space or fixing UNKs in the NMT output. △ Less

Submitted 13 February, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: EACL2017 short paper

arXiv:1606.04963 [pdf, other]

The Edit Distance Transducer in Action: The University of Cambridge English-German System at WMT16

Authors: Felix Stahlberg, Eva Hasler, Bill Byrne

Abstract: This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore,… ▽ More This paper presents the University of Cambridge submission to WMT16. Motivated by the complementary nature of syntactical machine translation and neural machine translation (NMT), we exploit the synergies of Hiero and NMT in different combination schemes. Starting out with a simple neural lattice rescoring approach, we show that the Hiero lattices are often too narrow for NMT ensembles. Therefore, instead of a hard restriction of the NMT search space to the lattice, we propose to loosely couple NMT and Hiero by composition with a modified version of the edit distance transducer. The loose combination outperforms lattice rescoring, especially when using multiple NMT systems in an ensemble. △ Less

Submitted 15 June, 2016; originally announced June 2016.

arXiv:1605.04569 [pdf, other]

doi 10.18653/v1/P16-2049

Syntactically Guided Neural Machine Translation

Authors: Felix Stahlberg, Eva Hasler, Aurelien Waite, Bill Byrne

Abstract: We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over… ▽ More We investigate the use of hierarchical phrase-based SMT lattices in end-to-end neural machine translation (NMT). Weight pushing transforms the Hiero scores for complete translation hypotheses, with the full translation grammar score and full n-gram language model score, into posteriors compatible with NMT predictive probabilities. With a slightly modified NMT beam-search decoder we find gains over both Hiero and NMT decoding alone, with practical advantages in extending NMT to very large input and output vocabularies. △ Less

Submitted 19 May, 2016; v1 submitted 15 May, 2016; originally announced May 2016.

Comments: ACL 2016

arXiv:1510.04709 [pdf, ps, other]

Multilingual Image Description with Neural Sequence Models

Authors: Desmond Elliott, Stella Frank, Eva Hasler

Abstract: In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and… ▽ More In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline. △ Less

Submitted 18 November, 2015; v1 submitted 15 October, 2015; originally announced October 2015.

Comments: Under review as a conference paper at ICLR 2016

Showing 1–15 of 15 results for author: Hasler, E