Showing 1–9 of 9 results for author: Chemla, E

Searching in archive cs.
  1. arXiv:2406.12620  [pdf, other]

    cs.CL

    What Makes Two Language Models Think Alike?

    Authors: Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

    Abstract: Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous…

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  2. arXiv:2403.18031  [pdf, other]

    cs.CL

    The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

    Authors: Nicolas Guerin, Shane Steinert-Threlkeld, Emmanuel Chemla

    Abstract: Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical,…

    Submitted 26 March, 2024; originally announced March 2024.

  3. arXiv:2402.11608  [pdf, other]

    cs.CL

    Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

    Authors: Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz

    Abstract: We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 17 pages, 13 figures

  4. arXiv:2402.10013  [pdf, other]

    cs.CL cs.FL

    Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length

    Authors: Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: Neural networks offer good approximation to many tasks but consistently fail to reach perfect generalization, even when theoretical work shows that such perfect solutions can be expressed by certain architectures. Using the task of formal language learning, we focus on one simple formal language and show that the theoretically correct solution is in fact not an optimum of commonly used objectives…

    Submitted 6 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures, 3 appendix pages

  5. arXiv:2311.06518  [pdf, other]

    cs.LG cs.CL

    Minimum Description Length Hopfield Networks

    Authors: Matan Abudy, Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this trade…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 4 pages, Associative Memory & Hopfield Networks Workshop at NeurIPS2023

  6. arXiv:2308.08253  [pdf, other]

    cs.CL

    Benchmarking Neural Network Generalization for Grammar Induction

    Authors: Nur Lan, Emmanuel Chemla, Roni Katzir

    Abstract: How well do neural networks generalize? Even for grammar induction tasks, where the target generalization is fully known, previous works have left the question open, testing very limited ranges beyond the training set and using different success criteria. We provide a measure of neural network generalization based on fully specified formal languages. Given a model and a formal grammar, the method…

    Submitted 25 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 10 pages, 4 figures, 2 tables. Conference: Learning with Small Data 2023

  7. arXiv:2111.00600  [pdf, other]

    cs.CL

    Minimum Description Length Recurrent Neural Networks

    Authors: Nur Lan, Michal Geyer, Emmanuel Chemla, Roni Katzir

    Abstract: We train neural networks to optimize a Minimum Description Length score, i.e., to balance between the complexity of the network and its accuracy at a task. We show that networks optimizing this objective function master tasks involving memory challenges and go beyond context-free languages. These learners master languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^{2n}$, $a^nb^mc^{n+m}$, and they perfor…

    Submitted 31 March, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: 15 pages

  8. arXiv:2005.00110  [pdf, other]

    cs.CL cs.AI cs.MA

    On the Spontaneous Emergence of Discrete and Compositional Signals

    Authors: Nur Geffen Lan, Emmanuel Chemla, Shane Steinert-Threlkeld

    Abstract: We propose a general framework to study language emergence through signaling games with neural agents. Using a continuous latent space, we are able to (i) train using backpropagation, (ii) show that discrete messages nonetheless naturally emerge. We explore whether categorical perception effects follow and show that the messages are not compositional.

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: ACL 2020

  9. arXiv:1707.08017  [pdf, other]

    math.LO cs.LO

    Suszko's Problem: Mixed Consequence and Compositionality

    Authors: Emmanuel Chemla, Paul Egré

    Abstract: Suszko's problem is the problem of finding the minimal number of truth values needed to semantically characterize a syntactic consequence relation. Suszko proved that every Tarskian consequence relation can be characterized using only two truth values. Malinowski showed that this number can equal three if some of Tarski's structural constraints are relaxed. By so doing, Malinowski introduced a cas…

    Submitted 9 February, 2019; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: Keywords: Suszko's thesis; truth value; logical consequence; mixed consequence; compositionality; truth-functionality; many-valued logic; algebraic logic; substructural logics; regular connectives

    MSC Class: 03B47; 03B50; 03G27