Showing 1–19 of 19 results for author: Voita, E

Searching in archive cs.

  1. arXiv:2404.07004  [pdf, other]

    cs.CL

    LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models

    Authors: Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Elena Voita

    Abstract: We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Unlike previously existing tools that focus on isolated parts of the decision-making process, our framework is designed to make the entire prediction process transparent and allows tracing model behavior back from the top-layer represe…

    Submitted 10 April, 2024; originally announced April 2024.

  2. arXiv:2404.05411  [pdf, other]

    cs.CL

    Know When To Stop: A Study of Semantic Drift in Text Generation

    Authors: Ava Spataru, Eric Hambro, Elena Voita, Nicola Cancedda

    Abstract: In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later; this had occasionally been observed but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2403.00824  [pdf, other]

    cs.CL cs.AI

    Information Flow Routes: Automatically Interpreting Language Models at Scale

    Authors: Javier Ferrando, Elena Voita

    Abstract: Information flows by routes inside the network via mechanisms implemented in the model. These routes can be represented as graphs where nodes correspond to token representations and edges to operations inside the network. We automatically build these graphs in a top-down manner, for each prediction leaving only the most important nodes and edges. In contrast to existing workflows relying on ac…

    Submitted 16 April, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

  4. arXiv:2309.04827  [pdf, other]

    cs.CL

    Neurons in Large Language Models: Dead, N-gram, Positional

    Authors: Elena Voita, Javier Ferrando, Christoforos Nalmpantis

    Abstract: We analyze a family of large language models in a lightweight manner that can be carried out on a single GPU. Specifically, we focus on the OPT family of models, ranging from 125M to 66B parameters, and rely only on whether an FFN neuron is activated or not. First, we find that the early part of the network is sparse and represents many discrete features. Here, many neurons (more than 70% in some laye…

    Submitted 9 September, 2023; originally announced September 2023.

  5. arXiv:2305.11746  [pdf, other]

    cs.CL

    HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation

    Authors: David Dale, Elena Voita, Janice Lam, Prangthip Hansanti, Christophe Ropers, Elahe Kalbassi, Cynthia Gao, Loïc Barrault, Marta R. Costa-jussà

    Abstract: Hallucinations in machine translation are translations that contain information completely unrelated to the input. Omissions are translations that do not include some of the input information. While both cases tend to be catastrophic errors undermining user trust, annotated data with these types of pathologies is extremely scarce and is limited to a few high-resource languages. In this work, we re…

    Submitted 5 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    ACM Class: I.2.7

    Journal ref: EMNLP 2023

  6. arXiv:2212.08597  [pdf, other]

    cs.CL

    Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

    Authors: David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà

    Abstract: While the problem of hallucinations in neural machine translation has long been recognized, so far little progress has been made in alleviating it. Indeed, it recently turned out that, without artificially encouraging models to hallucinate, previously existing methods fall short, and even the standard sequence log-probability is more informative. This means that characteristics internal to the model ca…

    Submitted 20 December, 2022; v1 submitted 16 December, 2022; originally announced December 2022.

    ACM Class: I.2.7

  7. arXiv:2208.05309  [pdf, other]

    cs.CL cs.LG

    Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

    Authors: Nuno M. Guerreiro, Elena Voita, André F. T. Martins

    Abstract: Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate the adequacy of detection heuristi…

    Submitted 5 March, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: Accepted at EACL23 (main)

  8. arXiv:2109.01396  [pdf, other]

    cs.CL

    Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process. Although neural machine translation is the de facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirr…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  9. arXiv:2010.10907  [pdf, other]

    cs.CL

    Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates the relative source and target contributions to a generation decision. We argu…

    Submitted 25 June, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: ACL 2021 (more accurate results with the improved LRP code)

  10. arXiv:2010.02598  [pdf, other]

    cs.CL cs.LG

    Embedding Words in Non-Vector Space with Unsupervised Graph Learning

    Authors: Max Ryabinin, Sergei Popov, Liudmila Prokhorenkova, Elena Voita

    Abstract: It has become the de facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our sett…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted as a long paper for EMNLP 2020. 15 pages, 6 figures

  11. arXiv:2003.12298  [pdf, other]

    cs.CL

    Information-Theoretic Probing with Minimum Description Length

    Authors: Elena Voita, Ivan Titov

    Abstract: To measure how well pretrained representations encode some linguistic property, it is common to use the accuracy of a probe, i.e., a classifier trained to predict the property from the representations. Despite the widespread adoption of probes, differences in their accuracy fail to adequately reflect differences in representations. For example, they do not substantially favour pretrained representations ov…

    Submitted 27 March, 2020; originally announced March 2020.

  12. arXiv:1911.00176  [pdf, other]

    cs.CL

    Sequence Modeling with Unconstrained Generation Order

    Authors: Dmitrii Emelianenko, Elena Voita, Pavel Serdyukov

    Abstract: The dominant approach to sequence generation is to produce a sequence in some predefined order, e.g. left to right. In contrast, we propose a more general model that can generate the output sequence by inserting tokens in an arbitrary order. Our model learns decoding order as a result of its training procedure. Our experiments show that this model is superior to fixed order models on a number of…

    Submitted 31 October, 2019; originally announced November 2019.

    Comments: Camera-ready version for NeurIPS2019

  13. arXiv:1910.13267  [pdf, other]

    cs.CL

    BPE-Dropout: Simple and Effective Subword Regularization

    Authors: Ivan Provilkov, Dmitrii Emelianenko, Elena Voita

    Abstract: Subword segmentation is widely used to address the open vocabulary problem in machine translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens. While multiple segmentations are possible even with the same vocabulary, BPE splits words into unique sequences; this may prevent a…

    Submitted 1 May, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: ACL 2020 (camera-ready)

  14. arXiv:1909.01383  [pdf, other]

    cs.CL

    Context-Aware Monolingual Repair for Neural Machine Translation

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: Modern sentence-level NMT systems often produce plausible translations of isolated sentences. However, when put in context, these translations may end up being inconsistent with each other. We propose a monolingual DocRepair model to correct inconsistencies between sentence-level translations. DocRepair performs automatic post-editing on a sequence of sentence-level translations, refining translat…

    Submitted 15 October, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019 (camera-ready)

  15. arXiv:1909.01380  [pdf, other]

    cs.CL

    The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We focus on Transformers for our analysis, as they have been shown to be effective on various tasks, including machine translation (MT), standard left-to-right language models (LM) and masked language model…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019 (camera-ready)

  16. arXiv:1905.09418  [pdf, other]

    cs.CL

    Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

    Authors: Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov

    Abstract: Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation. In this work we evaluate the contribution made by individual attention heads in the encoder to the overall performance of the model and analyze the roles played by them. We find that the most important and confident heads play consistent and often linguistically-interpre…

    Submitted 7 June, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: ACL 2019 (camera-ready)

  17. arXiv:1905.05979  [pdf, other]

    cs.CL

    When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion

    Authors: Elena Voita, Rico Sennrich, Ivan Titov

    Abstract: Though machine translation errors caused by the lack of context beyond one sentence have long been acknowledged, the development of context-aware NMT systems is hampered by several problems. Firstly, standard metrics are not sensitive to improvements in consistency in document-level translations. Secondly, previous work on context-aware NMT assumed that the sentence-aligned parallel data consisted…

    Submitted 7 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: ACL 2019 (camera-ready)

  18. arXiv:1810.02268  [pdf, ps, other]

    cs.CL

    A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation

    Authors: Mathias Müller, Annette Rios, Elena Voita, Rico Sennrich

    Abstract: The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-eq…

    Submitted 6 March, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: Accepted at WMT 2018

  19. arXiv:1805.10163  [pdf, other]

    cs.CL

    Context-Aware Neural Machine Translation Learns Anaphora Resolution

    Authors: Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov

    Abstract: Standard machine translation systems process sentences in isolation and hence ignore extra-sentential information, even though extended context can both prevent mistakes in ambiguous cases and improve translation coherence. We introduce a context-aware neural machine translation model designed in such a way that the flow of information from the extended context to the translation model can be contro…

    Submitted 25 May, 2018; originally announced May 2018.

    Comments: ACL 2018