Showing 1–25 of 25 results for author: Charton, F

Searching in archive cs.
  1. arXiv:2406.07515  [pdf, other]

    cs.LG cs.AI stat.ML

    Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

    Authors: Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe

    Abstract: Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. Considering that it is easier for both humans and machines to tell between good and bad examples than to generate high-quality samples, we investig…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2406.02128  [pdf, other]

    cs.LG cs.AI cs.CL

    Iteration Head: A Mechanistic Study of Chain-of-Thought

    Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

    Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particul…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.19787  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO cs.PL

    From Symbolic Tasks to Code Generation: Diversification Yields Better Task Performers

    Authors: Dylan Zhang, Justin Wang, Francois Charton

    Abstract: Instruction tuning -- tuning large language models on instruction-output pairs -- is a promising technique for making models better adapted to the real world. Yet, the key factors driving the model's capability to understand and follow instructions not seen during training remain under-explored. Our investigation begins with a series of synthetic experiments within the theoretical framework of a T…

    Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.06107  [pdf, other]

    cs.LG cs.SC hep-ph hep-th stat.ML

    Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory

    Authors: Tianji Cai, Garrett W. Merz, François Charton, Niklas Nolte, Matthias Wilhelm, Kyle Cranmer, Lance J. Dixon

    Abstract: We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar N = 4 Super Yang-Mills theory is a close cousin to the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical expressions containing integer coefficients. In this paper, we apply Transformers to predict t…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 26+10 pages, 9 figures, 7 tables, application of machine learning aimed at physics and machine learning audience

    Report number: SLAC-PUB-17774

  5. arXiv:2403.10328  [pdf, other]

    cs.CR

    The cool and the cruel: separating hard parts of LWE secrets

    Authors: Niklas Nolte, Mohamed Malhou, Emily Wenger, Samuel Stevens, Cathy Li, François Charton, Kristin Lauter

    Abstract: Sparse binary LWE secrets are under consideration for standardization for Homomorphic Encryption and its applications to private computation. Known attacks on sparse binary LWE secrets include the sparse dual attack and the hybrid sparse dual-meet in the middle attack which requires significant memory. In this paper, we provide a new statistical attack with low memory requirement. The attack relie…

    Submitted 15 March, 2024; originally announced March 2024.

  6. arXiv:2402.10891  [pdf, other]

    cs.CL cs.AI cs.LG

    Instruction Diversity Drives Generalization To Unseen Tasks

    Authors: Dylan Zhang, Justin Wang, Francois Charton

    Abstract: Instruction tuning -- fine-tuning a large language model (LLM) on pairs of instructions and desired outcomes -- is an approach that enables pre-trained language models to perform real-world tasks and follow human instructions. Its practical success depends on the model learning a broader set of instructions than those it was trained on. Yet the factors that determine model generalization to such \…

    Submitted 16 February, 2024; originally announced February 2024.

  7. arXiv:2402.07043  [pdf, other]

    cs.LG cs.AI cs.CL

    A Tale of Tails: Model Collapse as a Change of Scaling Laws

    Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe

    Abstract: As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will…

    Submitted 31 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Journal ref: ICML 2024

  8. arXiv:2402.01082  [pdf, other]

    cs.CR cs.LG

    Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors

    Authors: Samuel Stevens, Emily Wenger, Cathy Li, Niklas Nolte, Eshika Saxena, François Charton, Kristin Lauter

    Abstract: Learning with Errors (LWE) is a hard math problem underlying recently standardized post-quantum cryptography (PQC) systems for key exchange and digital signatures. Prior work proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods -- bett…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 8 pages (main text)

  9. arXiv:2308.15594  [pdf, ps, other]

    cs.LG cs.AI

    Learning the greatest common divisor: explaining transformer predictions

    Authors: François Charton

    Abstract: The predictions of small transformers, trained to calculate the greatest common divisor (GCD) of two positive integers, can be fully characterized by looking at model inputs and outputs. As training proceeds, the model learns a list $\mathcal D$ of integers, products of divisors of the base used to represent integers and small primes, and predicts the largest element of $\mathcal D$ that divides b…

    Submitted 14 March, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  10. arXiv:2306.15400  [pdf, other]

    cs.LG

    Length Generalization in Arithmetic Transformers

    Authors: Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

    Abstract: We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to longer sequences than seen during training. We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums. However, this method fails for multiplication, and we propose train set pri…

    Submitted 27 June, 2023; originally announced June 2023.

  11. arXiv:2306.11641  [pdf, ps, other]

    cs.CR

    SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets

    Authors: Cathy Yuanchen Li, Emily Wenger, Zeyuan Allen-Zhu, Francois Charton, Kristin Lauter

    Abstract: Learning with Errors (LWE) is a hard math problem used in post-quantum cryptography. Homomorphic Encryption (HE) schemes rely on the hardness of the LWE problem for their security, and two LWE-based cryptosystems were recently standardized by NIST for digital signatures and key exchange (KEM). Thus, it is critical to continue assessing the security of LWE and specific parameter choices. For exampl…

    Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 18 pages, accepted to NeurIPS 2023

  12. arXiv:2305.04985  [pdf]

    cs.SI cs.HC

    A mimetic approach to social influence on Instagram

    Authors: Hubert Etienne, François Charton

    Abstract: We combine philosophical theories with quantitative analyses of online data to propose a sophisticated approach to social media influencers. Identifying influencers as communication systems emerging from a dialectic interactional process between content creators and in-development audiences, we define them mainly using the composition of their audience and the type of publications they use to comm…

    Submitted 8 May, 2023; originally announced May 2023.

  13. arXiv:2303.04178  [pdf, ps, other]

    cs.CR cs.LG

    SALSA PICANTE: a machine learning attack on LWE with binary secrets

    Authors: Cathy Li, Jana Sotáková, Emily Wenger, Mohamed Malhou, Evrard Garcelon, Francois Charton, Kristin Lauter

    Abstract: Learning with Errors (LWE) is a hard math problem underpinning many proposed post-quantum cryptographic (PQC) systems. The only PQC Key Exchange Mechanism (KEM) standardized by NIST is based on module LWE, and current publicly available PQ Homomorphic Encryption (HE) libraries are based on ring LWE. The security of LWE-based PQ cryptosystems is critical, but certain implementation choices could we…

    Submitted 31 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 15 pages, 6 figures, 17 tables; accepted to CCS 2023

  14. arXiv:2303.02226  [pdf, other]

    cs.CR math.NA math.NT math.OC

    An efficient algorithm for integer lattice reduction

    Authors: François Charton, Kristin Lauter, Cathy Li, Mark Tygert

    Abstract: A lattice of integers is the collection of all linear combinations of a set of vectors for which all entries of the vectors are integers and all coefficients in the linear combinations are also integers. Lattice reduction refers to the problem of finding a set of vectors in a given lattice such that the collection of all integer linear combinations of this subset is still the entire original latti…

    Submitted 3 August, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: 29 pages, 20 figures

    Journal ref: SIAM Journal on Matrix Analysis and Applications, 45 (1): 353-367, 2024

  15. arXiv:2212.11353  [pdf, other]

    cs.CL cs.LG

    Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

    Authors: Chris Lengerich, Gabriel Synnaeve, Amy Zhang, Hugh Leather, Kurt Shuster, François Charton, Charysse Redwood

    Abstract: Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional represen…

    Submitted 21 December, 2022; originally announced December 2022.

  16. arXiv:2211.00170  [pdf, ps, other]

    cs.LG cs.AI

    What is my math transformer doing? -- Three results on interpretability and generalization

    Authors: François Charton

    Abstract: This paper investigates the failure cases and out-of-distribution behavior of transformers trained on matrix inversion and eigenvalue decomposition. I show that incorrect model predictions still retain deep mathematical properties of the solution (e.g. correct eigenvalues, unit norm of eigenvectors), and that almost all model failures can be attributed to, and predicted from, properties of the pro…

    Submitted 31 October, 2022; originally announced November 2022.

  17. arXiv:2207.04785  [pdf, ps, other]

    cs.CR cs.LG

    SALSA: Attacking Lattice Cryptography with Transformers

    Authors: Emily Wenger, Mingjie Chen, François Charton, Kristin Lauter

    Abstract: Currently deployed public-key cryptosystems will be vulnerable to attacks by full-scale quantum computers. Consequently, "quantum resistant" cryptosystems are in high demand, and lattice-based cryptosystems, based on a hard problem known as Learning With Errors (LWE), have emerged as strong contenders for standardization. In this work, we train transformers to perform modular arithmetic and combin…

    Submitted 21 April, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: Extended version of work published at NeurIPS 2022

  18. arXiv:2207.03578  [pdf, other]

    cs.PL cs.CL cs.LG

    Code Translation with Compiler Representations

    Authors: Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, Gabriel Synnaeve

    Abstract: In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code. Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a natural-looki…

    Submitted 24 April, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 9 pages

  19. arXiv:2204.10532  [pdf, other]

    cs.LG

    End-to-end symbolic regression with transformers

    Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton

    Abstract: Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult task which usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function. The dominant approach is genetic programming, which evolves…

    Submitted 22 April, 2022; originally announced April 2022.

  20. arXiv:2201.04600  [pdf, other]

    cs.LG

    Deep Symbolic Regression for Recurrent Sequences

    Authors: Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton

    Abstract: Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS s…

    Submitted 28 June, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

  21. arXiv:2112.03588  [pdf, other]

    cs.LG cs.CL

    A deep language model to predict metabolic network equilibria

    Authors: François Charton, Amaury Hayat, Sean T. McQuade, Nathaniel J. Merrill, Benedetto Piccoli

    Abstract: We show that deep learning models, and especially architectures like the Transformer, originally intended for natural language, can be trained on randomly generated datasets to predict to very high accuracy both the qualitative and quantitative features of metabolic networks. Using standard mathematical techniques, we create large sets (40 million elements) of random networks that can be used to t…

    Submitted 7 December, 2021; originally announced December 2021.

  22. arXiv:2112.01898  [pdf, other]

    cs.LG cs.CL

    Linear algebra with transformers

    Authors: François Charton

    Abstract: Transformers can learn to perform numerical computations from examples only. I study nine problems of linear algebra, from basic matrix operations to eigenvalue decomposition and inversion, and introduce and discuss four encoding schemes to represent real numbers. On all problems, transformers trained on sets of random matrices achieve high accuracies (over 90%). The models are robust to noise, an…

    Submitted 8 November, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: Transactions in Machine Learning Research (TMLR), October 2022

  23. arXiv:2110.06773  [pdf, other]

    cs.SE cs.CL cs.LG

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    Authors: Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample

    Abstract: With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive…

    Submitted 16 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  24. arXiv:2006.06462  [pdf, other]

    cs.LG cs.CL

    Learning advanced mathematical computations from examples

    Authors: François Charton, Amaury Hayat, Guillaume Lample

    Abstract: Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect prediction of qualitative characteristics, and good approximations of numerical features of the system. This demonstrates that neural networks can learn to perform complex computations,…

    Submitted 19 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  25. arXiv:1912.01412  [pdf, other]

    cs.SC cs.LG

    Deep Learning for Symbolic Mathematics

    Authors: Guillaume Lample, François Charton

    Abstract: Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborated tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for…

    Submitted 2 December, 2019; originally announced December 2019.