Showing 1–13 of 13 results for author: Marchisio, K

  1. arXiv:2407.03211  [pdf, other]

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2407.02552  [pdf, other]

    cs.CL cs.AI cs.LG

    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    Authors: John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker

    Abstract: Preference optimization techniques have become a standard final stage for training state-of-art large language models (LLMs). However, despite widespread adoption, the vast majority of work to-date has focused on first-class citizen languages like English and Chinese. This captures a small fraction of the languages in the world, but also makes it unclear which aspects of current state-of-the-art r…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2406.20052  [pdf, other]

    cs.CL

    Understanding and Mitigating Language Confusion in LLMs

    Authors: Kelly Marchisio, Wei-Yin Ko, Alexandre Bérard, Théo Dehaze, Sebastian Ruder

    Abstract: We investigate a surprising limitation of LLMs: their inability to consistently generate text in a user's desired language. We create the Language Confusion Benchmark (LCB) to evaluate such failures, covering 15 typologically diverse languages with existing and newly-created English and multilingual prompts. We evaluate a range of LLMs on monolingual and cross-lingual generation reflecting practic…

    Submitted 28 June, 2024; originally announced June 2024.

  4. arXiv:2405.15032  [pdf, other]

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2307.01163  [pdf, other]

    cs.CL cs.LG cs.NE

    Improving Language Plasticity via Pretraining with Active Forgetting

    Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

    Abstract: Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data an…

    Submitted 12 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 Final Version

  6. arXiv:2301.07209  [pdf, other]

    cs.CL cs.LG

    Learning a Formality-Aware Japanese Sentence Representation

    Authors: Henry Li Xinyuan, Ray Lee, Jerry Chen, Kelly Marchisio

    Abstract: While the way intermediate representations are generated in encoder-decoder sequence-to-sequence models typically allow them to preserve the semantics of the input sentence, input features such as formality might be left out. On the other hand, downstream tasks such as translation would benefit from working with a sentence representation that preserves formality in addition to semantics, so as to…

    Submitted 17 January, 2023; originally announced January 2023.

  7. arXiv:2212.10503  [pdf, other]

    cs.CL cs.LG

    Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

    Authors: Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

    Abstract: Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. We propose mini-model adaptation…

    Submitted 4 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Findings of ACL 2023 Camera Ready

  8. arXiv:2210.14378  [pdf, other]

    cs.CL cs.LG

    Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

    Authors: Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Camera-Ready

  9. arXiv:2210.05098  [pdf, other]

    cs.CL cs.LG

    IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

    Authors: Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn

    Abstract: The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism." We address the root-cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into t…

    Submitted 4 July, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Updated EMNLP2022 Camera Ready (citation correction, removed references to dimensionality reduction [was not used here].)

  10. arXiv:2109.12640  [pdf, other]

    cs.CL

    An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces

    Authors: Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exp…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: EMNLP Findings 2021 Camera-Ready

  11. arXiv:2106.15818  [pdf, other]

    cs.CL

    On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation

    Authors: Kelly Marchisio, Markus Freitag, David Grangier

    Abstract: Modern unsupervised machine translation (MT) systems reach reasonable translation quality under clean and controlled data conditions. As the performance gap between supervised and unsupervised MT narrows, it is interesting to ask whether the different training methods result in systematically different output beyond what is visible via quality metrics like adequacy or BLEU. We compare translations…

    Submitted 13 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: NAACL 2022 Camera-Ready. Tiny text changes to deal with compiler differences between arxiv and Overleaf

  12. arXiv:2104.08721  [pdf, other]

    cs.CL

    Embedding-Enhanced Giza++: Improving Alignment in Low- and High- Resource Scenarios Using Embedding Space Geometry

    Authors: Kelly Marchisio, Conghao Xiong, Philipp Koehn

    Abstract: A popular natural language processing task decades ago, word alignment has been dominated until recently by GIZA++, a statistical method based on the 30-year-old IBM models. New methods that outperform GIZA++ primarily rely on large machine translation models, massively multilingual language models, or supervision from GIZA++ alignments itself. We introduce Embedding-Enhanced GIZA++, and outperfor…

    Submitted 10 October, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: AMTA2022 Camera Ready

  13. arXiv:2004.05516  [pdf, other]

    cs.CL

    When Does Unsupervised Machine Translation Work?

    Authors: Kelly Marchisio, Kevin Duh, Philipp Koehn

    Abstract: Despite the reported success of unsupervised machine translation (MT), the field has yet to examine the conditions under which these methods succeed, and where they fail. We conduct an extensive empirical evaluation of unsupervised MT using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages. We find that performance rapidly deteriorates when sourc…

    Submitted 18 November, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: WMT20 Camera Ready