Skip to main content

Showing 1–19 of 19 results for author: Brantley, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2404.16767  [pdf, other

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clip**), and is notorious for its sensitivity to the precise impleme… ▽ More

    Submitted 29 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: New experimental results on general chat

  2. arXiv:2404.08513  [pdf, other

    cs.LG cs.AI

    Adversarial Imitation Learning via Boosting

    Authors: Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

    Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al.,, 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024

  3. arXiv:2404.08495  [pdf, other

    cs.LG cs.AI cs.CL

    Dataset Reset Policy Optimization for RLHF

    Authors: Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

    Abstract: Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of r… ▽ More

    Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 28 pages, 6 tables, 3 Figures, 3 Algorithms

  4. arXiv:2404.03673  [pdf, other

    cs.CV cs.AI cs.LG

    RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

    Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

    Abstract: Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning… ▽ More

    Submitted 22 June, 2024; v1 submitted 25 March, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures, 1 table

  5. arXiv:2402.17793  [pdf, other

    cs.AI cs.CL cs.LG

    A Surprising Failure? Multimodal LLMs and the NLVR Challenge

    Authors: Anne Wu, Kianté Brantley, Yoav Artzi

    Abstract: This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR. Given a human-written sentence paired with a synthetic image, this task requires the model to determine the truth value of the sentence with respect to the image. Despite the strong performance demonstrated by these models,… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  6. arXiv:2402.10886  [pdf, other

    cs.CL

    Reviewer2: Optimizing Review Generation Through Prompt Generation

    Authors: Zhaolin Gao, Kianté Brantley, Thorsten Joachims

    Abstract: Recent developments in LLMs offer new opportunities for assisting authors in improving their work. In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft. While initial methods for automated review generation already exist, these methods tend to produce reviews that lack detail, and they do not cover the range of opinions… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

  7. arXiv:2310.04407  [pdf, other

    cs.CL cs.AI cs.IR cs.LG

    Policy-Gradient Training of Language Models for Ranking

    Authors: Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachim

    Abstract: Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires… ▽ More

    Submitted 6 October, 2023; originally announced October 2023.

  8. arXiv:2307.04923  [pdf, other

    cs.IR

    Ranking with Long-Term Constraints

    Authors: Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

    Abstract: The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may only improve short-term engagement, but not the long-term sustainability of the platform and the long-term benefits to its users, content providers,… ▽ More

    Submitted 7 January, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

  9. arXiv:2306.11816  [pdf, other

    cs.LG cs.AI cs.CL

    Learning to Generate Better Than Your LLM

    Authors: Jonathan D. Chang, Kiante Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun

    Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models (LLMs) for text generation. In particular, recent LLMs such as ChatGPT and GPT-4 can engage in fluent conversations with users after finetuning with RL. Capitalizing on key properties of text generation, we seek to investigate RL algorithms beyond general purpose algorithms like Proximal Policy Opt… ▽ More

    Submitted 13 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 23 pages, 5 figures, 7 tables, 4 algorithms

  10. arXiv:2303.00908  [pdf, other

    cs.CL

    Interactive Text Generation

    Authors: Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan

    Abstract: Users interact with text, image, code, or other editors on a daily basis. However, machine learning models are rarely trained in the settings that reflect the interactivity between users and their editor. This is understandable as training AI models with real users is not only slow and costly, but what these models learn may be specific to user interface design choices. Unfortunately, this means m… ▽ More

    Submitted 11 November, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023

  11. arXiv:2211.01994  [pdf, other

    cs.LG cs.AI cs.CL

    lilGym: Natural Language Visual Reasoning with Reinforcement Learning

    Authors: Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi

    Abstract: We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We introduce a new approach for exact reward computation in every possible world state by annotating all statements with executable Python programs. Each stat… ▽ More

    Submitted 29 May, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: ACL 2023 Long Paper

  12. arXiv:2210.01241  [pdf, other

    cs.CL cs.LG

    Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

    Authors: Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Ye** Choi

    Abstract: We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space, as well as a lack of… ▽ More

    Submitted 28 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: In Proceedings of ICLR 2023. Code found at https://github.com/allenai/rl4lms and Project website at https://rl4lms.apps.allenai.org/

  13. arXiv:2103.02650  [pdf, other

    cs.LG

    Successor Feature Sets: Generalizing Successor Representations Across Policies

    Authors: Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon

    Abstract: Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future e… ▽ More

    Submitted 15 March, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

  14. arXiv:2006.05051  [pdf, other

    cs.LG cs.AI cs.DS stat.ML

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Authors: Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either… ▽ More

    Submitted 5 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: The NeurIPS 2020 version of this paper includes a small bug, leading to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4 and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and the statement of Lemma F.3

  15. arXiv:2005.12801  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Active Imitation Learning with Noisy Guidance

    Authors: Kianté Brantley, Amr Sharaf, Hal Daumé III

    Abstract: Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexi… ▽ More

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  16. arXiv:1906.09323  [pdf, other

    cs.LG cs.AI cs.GT stat.ML

    Reinforcement Learning with Convex Constraints

    Authors: Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

    Abstract: In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we… ▽ More

    Submitted 11 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems 32 (2019), 14093-14102

  17. arXiv:1902.02192  [pdf, other

    cs.CL cs.LG stat.ML

    Non-Monotonic Sequential Text Generation

    Authors: Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

    Abstract: Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary… ▽ More

    Submitted 23 October, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: ICML 2019

  18. arXiv:1708.01318  [pdf, other

    cs.CL cs.AI cs.HC

    The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task

    Authors: Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III

    Abstract: We describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task. The task is to adapt a translation system to a new domain, using only bandit feedback: the system receives a German sentence to translate, produces an English sentence, and only gets a scalar score as feedback. Targeting these two challenges (adaptation and bandit learning… ▽ More

    Submitted 7 August, 2017; v1 submitted 3 August, 2017; originally announced August 2017.

    Comments: 7 pages, 1 figure, WMT 2017 Bandit Learning Task

  19. arXiv:1507.06593  [pdf, other

    cs.IR cs.HC

    LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation

    Authors: Ashwinkumar Ganesan, Kiante Brantley, Shimei Pan, Jian Chen

    Abstract: We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics that are generated. Also, users may find it difficu… ▽ More

    Submitted 23 July, 2015; originally announced July 2015.