Showing 1–13 of 13 results for author: Grinsztajn, N

  1. arXiv:2406.19188  [pdf, other]

    cs.LG

    Averaging log-likelihoods in direct alignment

    Authors: Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

    Abstract: To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involvin…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19185  [pdf, other]

    cs.LG

    Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

    Authors: Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist

    Abstract: Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.16424  [pdf, other]

    cs.AI cs.LG

    Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

    Authors: Felix Chalumeau, Refiloe Shabe, Noah de Nicola, Arnu Pretorius, Thomas D. Barrett, Nathan Grinsztajn

    Abstract: Combinatorial Optimization is crucial to numerous real-world applications, yet still presents challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heu…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2311.17371  [pdf, other]

    cs.CL cs.AI

    Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

    Authors: Andries Smit, Paul Duckworth, Nathan Grinsztajn, Thomas D. Barrett, Arnu Pretorius

    Abstract: Recent advancements in large language models (LLMs) underscore their potential for responding to inquiries in various domains. However, ensuring that generative agents provide accurate and reliable answers remains an ongoing challenge. In this context, multi-agent debate (MAD) has emerged as a promising strategy for enhancing the truthfulness of LLMs. We benchmark a range of debating and prompting…

    Submitted 14 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 2 pages, 13 figures

  5. arXiv:2311.13569  [pdf, other]

    cs.LG cs.AI

    Combinatorial Optimization with Policy Adaptation using Latent Space Search

    Authors: Felix Chalumeau, Shikha Surana, Clement Bonnet, Nathan Grinsztajn, Arnu Pretorius, Alexandre Laterre, Thomas D. Barrett

    Abstract: Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial…

    Submitted 28 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Fix typo in formula and add a reference

  6. arXiv:2306.09884  [pdf, other]

    cs.LG cs.AI

    Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

    Authors: Clément Bonnet, Daniel Luo, Donal Byrne, Shikha Surana, Sasha Abramowitz, Paul Duckworth, Vincent Coyette, Laurence I. Midgley, Elshadai Tegegn, Tristan Kalloniatis, Omayma Mahjoub, Matthew Macfarlane, Andries P. Smit, Nathan Grinsztajn, Raphael Boige, Cemlyn N. Waters, Mohamed A. Mimouni, Ulrich A. Mbou Sob, Ruan de Kock, Siddarth Singh, Daniel Furelos-Blanco, Victor Le, Arnu Pretorius, Alexandre Laterre

    Abstract: Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. In modern RL research, there is a need for simulated environments that are performant, scalable, and modular to enable their utilization in a wider range of potential real-world applications. Therefore, we present Jumanji, a suite of diverse RL environments speci…

    Submitted 15 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: 9 pages + 21 pages of appendices and references. Published at ICLR 2024

  7. arXiv:2210.03475  [pdf, other]

    cs.AI cs.LG

    Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

    Authors: Nathan Grinsztajn, Daniel Furelos-Blanco, Shikha Surana, Clément Bonnet, Thomas D. Barrett

    Abstract: Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic samp…

    Submitted 13 November, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

  8. arXiv:2208.02821  [pdf, other]

    cs.LG cs.AI

    Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round

    Authors: Manh Hung Nguyen, Lisheng Sun, Nathan Grinsztajn, Isabelle Guyon

    Abstract: Meta-learning from learning curves is an important yet often neglected research area in the Machine Learning community. We introduce a series of Reinforcement Learning-based meta-learning challenges, in which an agent searches for the best suited algorithm for a given dataset, based on feedback of learning curves from the environment. The first round attracted participants both from academia and i…

    Submitted 4 August, 2022; originally announced August 2022.

  9. arXiv:2110.10632  [pdf, other]

    cs.LG cs.AI

    More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

    Authors: Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux

    Abstract: Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different…

    Submitted 7 November, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

  10. arXiv:2106.07360  [pdf, other]

    cs.SI cs.LG

    Low-Rank Projections of GCNs Laplacian

    Authors: Nathan Grinsztajn, Philippe Preux, Edouard Oyallon

    Abstract: In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the performance of a GCN: we empirically show that most of the necessary and used information for nodes classification is contained in the low-frequency domain, and thus contrary to images, high frequencie…

    Submitted 4 June, 2021; originally announced June 2021.

    Journal ref: ICLR 2021 Workshop GTRL, 2021, Online, France

  11. arXiv:2106.05875  [pdf, other]

    cs.LG

    Interferometric Graph Transform for Community Labeling

    Authors: Nathan Grinsztajn, Louis Leconte, Philippe Preux, Edouard Oyallon

    Abstract: We present a new approach for learning unsupervised node representations in community graphs. We significantly extend the Interferometric Graph Transform (IGT) to community labeling: this non-linear operator iteratively extracts features that take advantage of the graph topology through demodulation operations. An unsupervised feature extraction step cascades modulus non-linearity with linear oper…

    Submitted 4 June, 2021; originally announced June 2021.

  12. arXiv:2106.04480  [pdf, other]

    cs.LG cs.AI

    There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

    Authors: Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

    Abstract: We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order a…

    Submitted 29 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  13. arXiv:2011.04333  [pdf, other]

    cs.AI

    Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling

    Authors: Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux

    Abstract: In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) is dealing with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learni…

    Submitted 9 November, 2020; originally announced November 2020.