Showing 1–50 of 98 results for author: Schuurmans, D

Searching in archive cs.
  1. arXiv:2406.13094  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring and Benchmarking the Planning Capabilities of Large Language Models

    Authors: Bernd Bohnet, Azade Nova, Aaron T Parisi, Kevin Swersky, Katayoon Goshvadi, Hanjun Dai, Dale Schuurmans, Noah Fiedel, Hanie Sedghi

    Abstract: We seek to elevate the planning capabilities of Large Language Models (LLMs) by investigating four main directions. First, we construct a comprehensive benchmark suite encompassing both classical planning domains and natural language scenarios. This suite includes algorithms to generate instances with varying levels of difficulty, allowing for rigorous and systematic evaluation of LLM performance. Sec…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.06811  [pdf, other]

    cs.LG

    Learning Continually by Spectral Regularization

    Authors: Alex Lewandowski, Saurabh Kumar, Dale Schuurmans, András György, Marlos C. Machado

    Abstract: Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early ph…

    Submitted 10 June, 2024; originally announced June 2024.

  3. arXiv:2405.21043  [pdf, other]

    cs.LG cs.AI

    Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

    Authors: Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

    Abstract: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, 2024

  4. arXiv:2405.19320  [pdf, other]

    cs.LG cs.AI stat.ML

    Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

    Authors: Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF,…

    Submitted 4 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2405.00747  [pdf, other]

    cs.LG cs.AI

    Soft Preference Optimization: Aligning Language Models to Expert Distributions

    Authors: Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans

    Abstract: We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than l…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

  6. arXiv:2402.17235  [pdf, other]

    cs.LG

    Stochastic Gradient Succeeds for Bandits

    Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

    Abstract: We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an $O(1/t)$ rate, even with a \emph{constant} step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is an old algorithm known to be applicable to bandits. The new result is achieved by establishing two nove…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 39 pages; Correction for a previous version published at ICML 2023 conference

  7. arXiv:2402.17139  [pdf, other]

    cs.CV cs.AI

    Video as the New Language for Real-World Decision Making

    Authors: Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

    Abstract: Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that…

    Submitted 26 February, 2024; originally announced February 2024.

  8. arXiv:2402.02698  [pdf, other]

    cs.LG cs.AI math.OC

    Beyond Expectations: Learning with Stochastic Dominance Made Practical

    Authors: Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$, the original…

    Submitted 4 February, 2024; originally announced February 2024.

  9. arXiv:2312.00246  [pdf, other]

    cs.LG

    Directions of Curvature as an Explanation for Loss of Plasticity

    Authors: Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

    Abstract: Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for loss of plasticity: Neural networks lose directions of curvature during training and that loss of p…

    Submitted 27 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  10. arXiv:2311.12244  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

    Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

    Abstract: In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounte…

    Submitted 10 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: The first two authors contribute equally

  11. arXiv:2311.09235  [pdf, other]

    cs.LG cs.AI

    Scalable Diffusion for Materials Generation

    Authors: Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

    Abstract: Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in…

    Submitted 3 June, 2024; v1 submitted 18 October, 2023; originally announced November 2023.

    Comments: https://unified-materials.github.io/

  12. arXiv:2310.07064  [pdf, other]

    cs.AI cs.CL

    Large Language Models can Learn Rules

    Authors: Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, Dale Schuurmans, Hanjun Dai

    Abstract: When prompted with a few examples and intermediate steps, large language models (LLMs) have demonstrated impressive performance in various reasoning tasks. However, prompting methods that rely on implicit knowledge in an LLM often generate incorrect answers when the implicit knowledge is wrong or inconsistent with the task. To tackle this problem, we present Hypotheses-to-Theories (HtT), a framewo…

    Submitted 24 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  13. arXiv:2310.06114  [pdf, other]

    cs.AI

    Learning Interactive Real-World Simulators

    Authors: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

    Abstract: Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied a…

    Submitted 12 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: https://universal-simulator.github.io

  14. arXiv:2306.01872  [pdf, other]

    cs.AI

    Probabilistic Adaptation of Text-to-Video Models

    Authors: Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

    Abstract: Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expens…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Project website https://video-adapter.github.io/. First two authors contributed equally

  15. arXiv:2303.04185  [pdf, other]

    cs.LG cs.AI cs.CL

    Gradient-Free Structured Pruning with Unlabeled Data

    Authors: Azade Nova, Hanjun Dai, Dale Schuurmans

    Abstract: Large Language Models (LLMs) have achieved great success in solving difficult tasks across many domains, but such success comes with a high computation cost, and inference latency. As developers and third parties customize these models, the need to provide efficient inference has increased. Many efforts have attempted to reduce inference cost through model compression techniques such as pruning an…

    Submitted 15 July, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Presented in ICML 2023

  16. arXiv:2303.04129  [pdf, other]

    cs.AI cs.LG

    Foundation Models for Decision Making: Problems, Methods, and Opportunities

    Authors: Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

    Abstract: Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. When such models are deployed in real world environments, they inevitably interface with other entities and agents. For example, language models are often used to interact with human beings through dialogue, and visual perception models are used to autono…

    Submitted 7 March, 2023; originally announced March 2023.

  17. arXiv:2302.00111  [pdf, other]

    cs.AI

    Learning Universal Policies via Text-Guided Video Generation

    Authors: Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel

    Abstract: A goal of artificial intelligence is to construct an agent that can solve a wide variety of tasks. Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents. Spe…

    Submitted 20 November, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: NeurIPS 2023, Project Website: https://universal-policy.github.io/

  18. arXiv:2301.06276  [pdf, other]

    cs.LG cs.AI

    The Role of Baselines in Policy Gradient Optimization

    Authors: Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, Dale Schuurmans

    Abstract: We study the effect of baselines in on-policy stochastic policy gradient optimization, and close the gap between the theory and practice of policy optimization methods. Our first contribution is to show that the \emph{state value} baseline allows on-policy stochastic \emph{natural} policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate, which was not previously known. T…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: 55 pages; published at NeurIPS 2022

  19. arXiv:2301.04589  [pdf, ps, other]

    cs.CL cs.FL

    Memory Augmented Large Language Models are Computationally Universal

    Authors: Dale Schuurmans

    Abstract: We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton, hence computationally limited. However, augmenting such models with a read-write memory creates the possibility of processing arbitrarily large inputs and, potent…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: 23 pages, 0 figures

  20. arXiv:2212.08949  [pdf, other]

    cs.LG eess.SY stat.ML

    Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

    Authors: Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans

    Abstract: A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its…

    Submitted 16 January, 2024; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023

  21. arXiv:2212.08765  [pdf, other]

    cs.LG stat.ML

    Latent Variable Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

    Abstract: Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a…

    Submitted 7 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: ICLR 2023. The first two authors contribute equally. Project Website: https://rlrep.github.io/lvrep/

  22. arXiv:2212.08235  [pdf, other]

    cs.LG cs.RO

    A Simple Decentralized Cross-Entropy Method

    Authors: Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

    Abstract: Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL) where a centralized approach is typically utilized to update the sampling distribution based on only the top-$k$ operation's results on samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue,…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2022. The last two authors advised equally

  23. arXiv:2211.16750  [pdf, other]

    cs.LG

    Score-based Continuous-time Discrete Diffusion Models

    Authors: Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai

    Abstract: Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, i.e., the score function, is not properly defined for discrete spaces. This makes it non-trivial to adapt the score-based modeling to categorical…

    Submitted 6 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  24. arXiv:2211.15661  [pdf, other]

    cs.LG cs.CL

    What learning algorithm is in-context learning? Investigations with linear models

    Authors: Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou

    Abstract: Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in the…

    Submitted 17 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: ICLR2023 Camera Ready

  25. arXiv:2211.11890  [pdf, other]

    cs.CL cs.AI

    TEMPERA: Test-Time Prompting via Reinforcement Learning

    Authors: Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez

    Abstract: Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is a growing interest in automated methods to design optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive t…

    Submitted 21 November, 2022; originally announced November 2022.

  26. arXiv:2211.07767  [pdf, other]

    stat.ML cs.LG math.OC

    Learning to Optimize with Stochastic Dominance Constraints

    Authors: Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai

    Abstract: In real-world decision-making, uncertainty is important yet difficult to handle. Stochastic dominance provides a theoretically sound approach for comparing uncertain quantities, but optimization with stochastic dominance constraints is often computationally expensive, which limits practical applicability. In this paper, we develop a simple yet efficient approach for the problem, the Light Stochast…

    Submitted 24 February, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted to the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

  27. arXiv:2210.13435  [pdf, other]

    cs.LG

    Dichotomy of Control: Separating What You Can Control from What You Cannot

    Authors: Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

    Abstract: Future- or return-conditioned supervised learning is an emerging paradigm for offline reinforcement learning (RL), where the future outcome (i.e., return) associated with an observed action sequence is used as input to a policy trained to imitate those same actions. While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poor…

    Submitted 24 October, 2022; originally announced October 2022.

  28. arXiv:2209.08183  [pdf, other]

    cs.LG

    Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

    Authors: Haoran Sun, Hanjun Dai, Dale Schuurmans

    Abstract: Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) for discrete spaces has been proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the…

    Submitted 14 October, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

  29. arXiv:2208.09515  [pdf, other]

    cs.LG stat.ML

    Spectral Decomposition Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because t…

    Submitted 7 March, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: ICLR 2023. The first two authors contribute equally

  30. arXiv:2207.07150  [pdf, other]

    cs.LG stat.ML

    Making Linear MDPs Practical via Contrastive Representation Learning

    Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we…

    Submitted 7 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: ICML 2022. The first two authors contribute equally

  31. arXiv:2207.00747  [pdf, other]

    cs.CL

    Rationale-Augmented Ensembles in Language Models

    Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou

    Abstract: Recent research has shown that rationales, or step-by-step chains of thought, can be used to improve performance in multi-step reasoning tasks. We reconsider rationale-augmented prompting for few-shot in-context learning, where (input -> output) prompts are expanded to (input, rationale -> output) prompts. For rationale-augmented prompting we demonstrate how existing approaches, which rely on manu…

    Submitted 2 July, 2022; originally announced July 2022.

  32. arXiv:2206.14897  [pdf, other]

    cs.LG

    Discrete Langevin Sampler via Wasserstein Gradient Flow

    Authors: Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans

    Abstract: It is known that gradient-based MCMC samplers for continuous spaces, such as Langevin Monte Carlo (LMC), can be derived as particle versions of a gradient flow that minimizes KL divergence on a Wasserstein manifold. The superior efficiency of such samplers has motivated several recent attempts to generalize LMC to discrete spaces. However, a fully principled extension of Langevin dynamics to discr…

    Submitted 22 February, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

  33. arXiv:2206.08499  [pdf, other]

    cs.LG cs.AI

    A Parametric Class of Approximate Gradient Updates for Policy Optimization

    Authors: Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans

    Abstract: Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g. value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return. To better capture the commonalities and identify key differences between policy optimization methods, we develop a unified pe…

    Submitted 16 June, 2022; originally announced June 2022.

    Journal ref: ICML 2022

  34. arXiv:2205.14204  [pdf, other]

    cs.CV

    Multimodal Masked Autoencoders Learn Transferable Representations

    Authors: Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

    Abstract: Building scalable models to learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approaches are based on contrastive learning objectives that train a separate encoder for each modality. While effective, contrastive learning approaches introduce sampling bias depending on the data augmentations used, which can degrade performance on downstream tasks.…

    Submitted 21 October, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  35. arXiv:2205.10816  [pdf, other]

    cs.LG cs.AI

    Chain of Thought Imitation with Procedure Cloning

    Authors: Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

    Abstract: Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior. It is common to frame imitation learning as a supervised learning problem in which one fits a function approximator to the input-output mapping exhibited by the logged demonstrations (input observations to output actions). While the framing of imitation learning as a supervised input-output…

    Submitted 22 May, 2022; originally announced May 2022.

  36. arXiv:2205.10625  [pdf, other]

    cs.AI cs.CL

    Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

    Authors: Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

    Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to brea…

    Submitted 16 April, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: ICLR 2023

  37. arXiv:2204.11897  [pdf, other]

    cs.LG

    Reinforcement Teaching

    Authors: Alex Lewandowski, Calarina Muslimani, Dale Schuurmans, Matthew E. Taylor, Jun Luo

    Abstract: Meta-learning strives to learn about and improve a student's machine learning algorithm. However, existing meta-learning methods either only work with differentiable algorithms or are hand-crafted to improve one specific component of an algorithm. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of any algorithm. Under Reinforcement Teac…

    Submitted 22 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: First two authors contributed equally

  38. arXiv:2203.11171  [pdf, other]

    cs.CL cs.AI

    Self-Consistency Improves Chain of Thought Reasoning in Language Models

    Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

    Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consist…

    Submitted 7 March, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Published at ICLR 2023. V2: added PaLM results; V3: added UL2 results; V4: camera ready version at ICLR 2023

  39. arXiv:2202.00872  [pdf, other]

    math.OC cs.MA

    On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

    Authors: Runyu Zhang, Jincheng Mei, Bo Dai, Dale Schuurmans, Na Li

    Abstract: Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interest, there can be multiple Nas…

    Submitted 29 October, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

  40. arXiv:2201.11903  [pdf, other]

    cs.CL cs.AI

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

    Abstract: We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided…

    Submitted 10 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

  41. arXiv:2112.00874  [pdf, other]

    cs.LG stat.ML

    Neural Stochastic Dual Dynamic Programming

    Authors: Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

    Abstract: Stochastic dual dynamic programming (SDDP) is a state-of-the-art method for solving multi-stage stochastic optimization, widely used for modeling real-world process optimization tasks. Unfortunately, SDDP has a worst-case complexity that scales exponentially in the number of decision variables, which severely limits applicability to only low dimensional problems. To overcome this limitation, we ex…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 24 pages

  42. arXiv:2110.15572  [pdf, other]

    cs.LG

    Understanding the Effect of Stochasticity in Policy Optimization

    Authors: Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

    Abstract: We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic versus exact gradients are used. In particular, unlike the true gradient setting, geometric information cannot be easily exploited in the stochastic case for accelerating policy optim…

    Submitted 29 October, 2021; originally announced October 2021.

    Comments: 68 pages; Accepted at NeurIPS 2021

  43. arXiv:2110.14890  [pdf, other]

    cs.LG cs.AI cs.DB cs.DC

    SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

    Authors: Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans

    Abstract: Knowledge graphs (KGs) capture knowledge in the form of head--relation--tail triples and are a crucial component in many AI systems. There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2) multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query. Embedding-base…

    Submitted 1 November, 2021; v1 submitted 28 October, 2021; originally announced October 2021.

  44. arXiv:2107.05768  [pdf, other]

    cs.LG cs.CL cs.CV

    Combiner: Full Attention Transformer with Sparse Computation Cost

    Authors: Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

    Abstract: Transformers provide a class of expressive architectures that are extremely effective for sequence modeling. However, the key limitation of transformers is their quadratic memory and time complexity $\mathcal{O}(L^2)$ with respect to the sequence length in attention layers, which restricts application in extremely long sequences. Most existing approaches leverage sparsity or low-rank assumptions i…

    Submitted 28 October, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021 spotlight

  45. arXiv:2106.09973  [pdf, other]

    cs.LG cs.AI

    The Curse of Passive Data Collection in Batch Reinforcement Learning

    Authors: Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvari

    Abstract: In high-stakes applications, active experimentation may be considered too risky and thus data are often collected passively. While in simple cases, such as in bandits, passive and active data collection are similarly effective, the price of passive sampling can be much higher when collecting data from a system with controlled states. The main focus of the current paper is the characterization of th…

    Submitted 5 July, 2023; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: 27 pages, 2 figures. AISTATS 2022. In this revision, we fix an error in the previous upper bound results

  46. arXiv:2106.06932  [pdf, other]

    cs.AI cs.LG

    Characterizing the Gap Between Actor-Critic and Policy Gradient

    Authors: Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans

    Abstract: Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper, we explain the gap between AC and PG methods by identifying the exact adjustment to the AC objective/gradient that recovers the true policy gradient of the cumu…

    Submitted 13 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  47. arXiv:2105.06072  [pdf, other]

    cs.LG

    Leveraging Non-uniformity in First-order Non-convex Optimization

    Authors: Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

    Abstract: Classical global convergence results for first-order methods rely on uniform smoothness and the Łojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to \emph{Non-uniform Smoothness} (NS) and \emph{Non-uniform Łojasiewicz inequality} (NŁ). The new definitions inspire new geometry-aware fi…

    Submitted 2 June, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: 48 pages, 10 figures. Accepted at ICML 2021

  48. arXiv:2104.07750  [pdf, other]

    cs.AI cs.MA

    Joint Attention for Multi-Agent Coordination and Social Learning

    Authors: Dennis Lee, Natasha Jaques, Chase Kew, Jiaxing Wu, Douglas Eck, Dale Schuurmans, Aleksandra Faust

    Abstract: Joint attention -- the ability to purposefully coordinate attention with another agent, and mutually attend to the same thing -- is a critical component of human social cognition. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual atten…

    Submitted 7 August, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  49. arXiv:2104.02293  [pdf, other]

    cs.LG

    On the Optimality of Batch Policy Optimization Algorithms

    Authors: Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

    Abstract: Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment. Although interest in this problem has grown significantly in recent years, its theoretical foundations remain under-developed. To advance the understanding of this problem, we provide three results that characterize the limits and possibilities of batch policy optimization i…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 29 pages, 8 figures

  50. arXiv:2102.06234  [pdf, other]

    cs.LG stat.ML

    Optimization Issues in KL-Constrained Approximate Policy Iteration

    Authors: Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

    Abstract: Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy. Popular practical algorithms such as TRPO, MPO, and VMPO replace regularization by a constraint on KL-divergence of consecutiv…

    Submitted 11 February, 2021; originally announced February 2021.