Showing 1–9 of 9 results for author: Chang, J D

Searching in archive cs.
  1. arXiv:2404.16767

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise impleme…

    Submitted 29 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: New experimental results on general chat

  2. arXiv:2404.08513

    cs.LG cs.AI

    Adversarial Imitation Learning via Boosting

    Authors: Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

    Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024

  3. arXiv:2404.08495

    cs.LG cs.AI cs.CL

    Dataset Reset Policy Optimization for RLHF

    Authors: Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

    Abstract: Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of r…

    Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 28 pages, 6 tables, 3 Figures, 3 Algorithms

  4. arXiv:2404.03673

    cs.CV cs.AI cs.LG

    RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

    Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

    Abstract: Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning…

    Submitted 22 June, 2024; v1 submitted 25 March, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures, 1 table

  5. arXiv:2310.04407

    cs.CL cs.AI cs.IR cs.LG

    Policy-Gradient Training of Language Models for Ranking

    Authors: Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims

    Abstract: Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires…

    Submitted 6 October, 2023; originally announced October 2023.

  6. arXiv:2306.11816

    cs.LG cs.AI cs.CL

    Learning to Generate Better Than Your LLM

    Authors: Jonathan D. Chang, Kianté Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun

    Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models (LLMs) for text generation. In particular, recent LLMs such as ChatGPT and GPT-4 can engage in fluent conversations with users after finetuning with RL. Capitalizing on key properties of text generation, we seek to investigate RL algorithms beyond general purpose algorithms like Proximal Policy Opt…

    Submitted 13 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 23 pages, 5 figures, 7 tables, 4 algorithms

  7. arXiv:2207.05837

    cs.LG math.OC math.ST stat.ML

    Learning Bellman Complete Representations for Offline Policy Evaluation

    Authors: Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted for Long Talk at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2938-2971, 2022

  8. SHOP: A Deep Learning Based Pipeline for near Real-Time Detection of Small Handheld Objects Present in Blurry Video

    Authors: Abhinav Ganguly, Amar C Gandhi, Sylvia E, Jeffrey D Chang, Ian M Hudson

    Abstract: While prior works have investigated and developed computational models capable of object detection, models still struggle to reliably interpret images with motion blur and small objects. Moreover, none of these models are specifically designed for handheld object detection. In this work, we present SHOP (Small Handheld Object Pipeline), a pipeline that reliably and efficiently interprets blurry im…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: 8 pages, 5 figures. Accepted to IEEE SoutheastCon 2022

  9. arXiv:2106.03207

    cs.LG stat.ML

    Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

    Authors: Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

    Abstract: This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy. We introduce Model-based IL from Offline data (MILO): an algorithmic framework…

    Submitted 31 January, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: 42 pages, 5 figures, 7 tables