Showing 1–8 of 8 results for author: Miryoosefi, S

  1. arXiv:2406.02469  [pdf, other]

    cs.LG cs.CL

    Landscape-Aware Growing: The Power of a Little LAG

    Authors: Stefani Karp, Nikunj Saunshi, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion). In this work, we study the fundamental question of how to select the best growing strategy from a given pool of growing strategies. Prior works have e…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2402.05913  [pdf, other]

    cs.CL cs.LG

    Efficient Stagewise Pretraining via Progressive Subnetworks

    Authors: Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

    Abstract: Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. gradual stacking (Reddi et al., 2023)). While the resource and wall-time savings are appealing, it has limitations, particularly the inability to evalu…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2312.10003  [pdf, other]

    cs.CL

    ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

    Authors: Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, Sanjiv Kumar

    Abstract: Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 19 pages, 4 figures, 4 tables, 8 listings

  4. arXiv:2202.03983  [pdf, other]

    cs.LG cs.AI

    Provable Reinforcement Learning with a Short-Term Memory

    Authors: Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

    Abstract: Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Dec…

    Submitted 8 February, 2022; originally announced February 2022.

  5. arXiv:2107.05216  [pdf, ps, other]

    cs.LG cs.AI

    A Simple Reward-free Approach to Constrained Reinforcement Learning

    Authors: Sobhan Miryoosefi, Chi Jin

    Abstract: In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the overall reward but also satisfy the additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that are notably different from standard RL. On the other hand, reward-free RL is independently developed in the unconstrained…

    Submitted 12 July, 2021; originally announced July 2021.

  6. arXiv:2102.00815  [pdf, other]

    cs.LG cs.AI stat.ML

    Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

    Authors: Chi Jin, Qinghua Liu, Sobhan Miryoosefi

    Abstract: Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL). This paper advances our understanding of this fundamental question by introducing a new complexity measure -- Bellman Eluder (BE) dimension. We show that the family of RL problems of low BE dimension is remarkably rich, which subsumes a…

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

  7. arXiv:2006.05051  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Authors: Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either…

    Submitted 5 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: The NeurIPS 2020 version of this paper includes a small bug, leading to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4, and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and the statement of Lemma F.3.

  8. arXiv:1906.09323  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Reinforcement Learning with Convex Constraints

    Authors: Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

    Abstract: In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we…

    Submitted 11 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems 32 (2019), 14093-14102