Showing 1–15 of 15 results for author: Shani, L

  1. arXiv:2406.00024  [pdf, other]

    cs.CL cs.AI cs.ET cs.LG

    Embedding-Aligned Language Models

    Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier

    Abstract: We propose a novel approach for training large language models (LLMs) to adhere to objectives defined within a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w.r.t. so…

    Submitted 24 May, 2024; originally announced June 2024.

  2. arXiv:2405.19107  [pdf, ps, other]

    cs.LG cs.AI

    Offline Regularised Reinforcement Learning for Large Language Models Alignment

    Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

    Abstract: The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.14655  [pdf, other]

    cs.LG

    Multi-turn Reinforcement Learning from Preference Human Feedback

    Authors: Lior Shani, Aviv Rosenberg, Asaf Cassel, Oran Lang, Daniele Calandriello, Avital Zipori, Hila Noga, Orgad Keller, Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks. Existing methods work by emulating the preferences at the single decision (turn) level, limiting their capabilities in settings that require planning or multi-turn interactions to ach…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2312.17703  [pdf, other]

    cond-mat.mes-hall cond-mat.supr-con

    Evidence for $π$-shifted Cooper quartets and few-mode transport in PbTe nanowire three-terminal Josephson junctions

    Authors: Mohit Gupta, Vipin Khade, Colin Riggert, Lior Shani, Gavin Menning, Pim Lueb, Jason Jung, Régis Mélin, Erik P. A. M. Bakkers, Vlad S. Pribiag

    Abstract: Josephson junctions are typically characterized by a single phase difference across two superconductors. This conventional two-terminal Josephson junction can be generalized to a multi-terminal device where the Josephson energy contains terms with contributions from multiple independent phase variables. Such multi-terminal Josephson junctions (MTJJs) are being considered as platforms for engineeri…

    Submitted 22 April, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

  5. arXiv:2310.04475  [pdf, other]

    cs.CL cs.AI cs.LG

    Demystifying Embedding Spaces using Large Language Models

    Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

    Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machin…

    Submitted 13 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  6. arXiv:2306.00186  [pdf, other]

    cs.CL

    Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

    Authors: Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor

    Abstract: Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this p…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: ACL 2023

  7. arXiv:2306.00117  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Diffusive and Ballistic Transport in Ultra-thin InSb Nanowire Devices Using a Few-layer-Graphene-AlOx Gate

    Authors: Lior Shani, Pim Lueb, Gavin Menning, Mohit Gupta, Colin Riggert, Tyler Littman, Frey Hackbarth, Marco Rossi, Jason Jung, Ghada Badawy, Marcel A. Verheijen, Paul Crowell, Erik P. A. M. Bakkers, Vlad S. Pribiag

    Abstract: Quantum devices based on InSb nanowires (NWs) are a prime candidate system for realizing and exploring topologically-protected quantum states and for electrically-controlled spin-based qubits. The influence of disorder on achieving reliable topological regimes has been studied theoretically, highlighting the importance of optimizing both growth and nanofabrication. In this work we investigate both…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 14 pages, 5 figures

  8. arXiv:2302.02061  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Reinforcement Learning with History-Dependent Dynamic Contexts

    Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier

    Abstract: We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveragin…

    Submitted 17 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Published in ICML 2023

  9. arXiv:2205.15376  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning with a Terminator

    Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

    Abstract: We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We lea…

    Submitted 5 October, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  10. arXiv:2102.06924  [pdf, other]

    cs.LG stat.ML

    Online Apprenticeship Learning

    Authors: Lior Shani, Tom Zahavy, Shie Mannor

    Abstract: In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the…

    Submitted 29 December, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: AAAI 2022

  11. arXiv:2005.09814  [pdf, other]

    cs.LG cs.AI stat.ML

    Mirror Descent Policy Optimization

    Authors: Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

    Abstract: Mirror descent (MD), a well-known first-order method in constrained convex optimization, has recently been shown as an important tool to analyze trust-region algorithms in reinforcement learning (RL). However, there remains a considerable gap between such theoretically analyzed algorithms and the ones used in practice. Inspired by this, we propose an efficient RL algorithm, called {\em mirror desc…

    Submitted 7 June, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

  12. arXiv:2002.08243  [pdf, ps, other]

    cs.LG stat.ML

    Optimistic Policy Optimization with Bandit Feedback

    Authors: Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

    Abstract: Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of exploration, or by making strong assumptions on the interaction with the environment. In this paper we consider model-based RL in the tabular finite-horizon MDP setting…

    Submitted 18 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020

  13. arXiv:1909.02769  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

    Authors: Lior Shani, Yonathan Efroni, Shie Mannor

    Abstract: Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts consecutive policies to be 'close' to one another, is iteratively solved. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling me…

    Submitted 12 December, 2019; v1 submitted 6 September, 2019; originally announced September 2019.

    Comments: Published at AAAI-2020; 58 pages

  14. arXiv:1812.07010  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi Instance Learning For Unbalanced Data

    Authors: Mark Kozdoba, Edward Moroshko, Lior Shani, Takuya Takagi, Takashi Katoh, Shie Mannor, Koby Crammer

    Abstract: In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective. We show that when the data is unbalanced and the family of classifiers is sufficiently rich, the SI method is a useful learning algorithm. In particular, we show that larger data imbalance, a quality that is typically perceived as negative, in fact implies a better resilience of the algorithm to the…

    Submitted 17 December, 2018; originally announced December 2018.

  15. arXiv:1812.05551  [pdf, other]

    cs.LG stat.ML

    Exploration Conscious Reinforcement Learning Revisited

    Authors: Lior Shani, Yonathan Efroni, Shie Mannor

    Abstract: The Exploration-Exploitation tradeoff arises in Reinforcement Learning when one cannot tell if a policy is optimal. Then, there is a constant need to explore new actions instead of exploiting past experience. In practice, it is common to resolve the tradeoff by using a fixed exploration mechanism, such as $ε$-greedy exploration or by adding Gaussian noise, while still trying to learn an optimal po…

    Submitted 13 May, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

    Comments: Published at ICML 2019 (36th International Conference on Machine Learning)

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5680-5689, 2019