Showing 1–16 of 16 results for author: Zanette, A

  1. arXiv:2402.19446  [pdf, other]

    cs.LG cs.AI cs.CL

    ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

    Authors: Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

    Abstract: A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a genera…

    Submitted 29 February, 2024; originally announced February 2024.

  2. arXiv:2402.15703  [pdf, other]

    cs.LG cs.AI stat.ML

    Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

    Authors: Ruiqi Zhang, Yuexiang Zhai, Andrea Zanette

    Abstract: What can an agent learn in a stochastic Multi-Armed Bandit (MAB) problem from a dataset that contains just a single sample for each arm? Surprisingly, in this work, we demonstrate that even in such a data-starved setting it may still be possible to find a policy competitive with the optimal one. This paves the way to reliable decision-making in settings where critical decisions must be made by rel…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 22 pages

  3. arXiv:2307.04354  [pdf, ps, other]

    cs.LG stat.ML

    Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

    Authors: Ruiqi Zhang, Andrea Zanette

    Abstract: In some applications of reinforcement learning, a dataset of pre-collected experience is already available but it is also possible to acquire some additional online data to help improve the quality of the policy. However, it may be preferable to gather additional data with a single, non-reactive exploration policy and avoid the engineering costs associated with switching policies. In this paper…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 43 pages

  4. arXiv:2211.05311  [pdf, ps, other]

    cs.LG

    When is Realizability Sufficient for Off-Policy Reinforcement Learning?

    Authors: Andrea Zanette

    Abstract: Model-free algorithms for reinforcement learning typically require a condition called Bellman completeness in order to successfully operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structura…

    Submitted 5 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Appears in ICML 2023 - any feedback is welcome

  5. arXiv:2206.00796  [pdf, ps, other]

    cs.LG

    Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

    Authors: Andrea Zanette, Martin J. Wainwright

    Abstract: The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Appears in ICML 2022

  6. arXiv:2203.12786  [pdf, ps, other]

    cs.LG

    Bellman Residual Orthogonalization for Offline Reinforcement Learning

    Authors: Andrea Zanette, Martin J. Wainwright

    Abstract: We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed po…

    Submitted 11 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Appears in NeurIPS 2022

  7. arXiv:2108.08812  [pdf, ps, other]

    cs.LG

    Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

    Authors: Andrea Zanette, Martin J. Wainwright, Emma Brunskill

    Abstract: Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action valu…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: Initial submission; appeared as spotlight talk in ICML 2021 Workshop on Theory of RL

  8. arXiv:2107.09912  [pdf, other]

    cs.LG stat.ML

    Design of Experiments for Stochastic Contextual Linear Bandits

    Authors: Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

    Abstract: In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Explorin…

    Submitted 22 July, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Initial submission

  9. arXiv:2103.12923  [pdf, ps, other]

    cs.LG

    Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

    Authors: Andrea Zanette, Ching-An Cheng, Alekh Agarwal

    Abstract: Policy optimization methods are popular reinforcement learning algorithms, because their incremental and on-policy nature makes them more stable than the value-based counterparts. However, the same properties also make them slow to converge and sample inefficient, as the on-policy requirement precludes data reuse and the incremental updates couple large iteration complexity into the sample complex…

    Submitted 29 June, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Appears in COLT 2021

  10. arXiv:2012.08005  [pdf, ps, other]

    cs.LG cs.AI

    Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

    Authors: Andrea Zanette

    Abstract: Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the value of a target policy. For both tasks we derive \emph{exponential} information-theoretic lower bounds in discounted infinite horizon MDPs with a linear func…

    Submitted 19 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to ICML 2021 as long talk

  11. arXiv:2008.07737  [pdf, ps, other]

    cs.LG stat.ML

    Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

    Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

    Abstract: There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks. Typically these assumptions are stronger than what is needed to find good solutions in the batch setting. In this work, we show how under a more sta…

    Submitted 21 October, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Minor update; appears in NeurIPS

  12. arXiv:2003.00153  [pdf, ps, other]

    cs.LG cs.AI

    Learning Near Optimal Policies with Low Inherent Bellman Error

    Authors: Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

    Abstract: We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value iteration. First we relate this condition to other common frameworks and show that it is strictly more general than the low rank (or linear) MDP assumption of prior w…

    Submitted 28 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Bug fixes in appendix; appears in ICML 2020

  13. arXiv:1911.00954  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs

    Authors: Andrea Zanette, Emma Brunskill

    Abstract: In order to make good decisions under uncertainty an agent must learn from observations. To do so, two of the most common frameworks are Contextual Bandits and Markov Decision Processes (MDPs). In this paper, we study whether there exist algorithms for the more general framework (MDP) which automatically provide the best performance bounds for the specific problem at hand without user intervention…

    Submitted 3 November, 2019; originally announced November 2019.

    Journal ref: International Conference on Machine Learning, 2018

  14. arXiv:1911.00567  [pdf, ps, other]

    cs.LG stat.ML

    Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

    Authors: Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where…

    Submitted 8 September, 2023; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Minor bug fixes

  15. arXiv:1901.00210  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

    Authors: Andrea Zanette, Emma Brunskill

    Abstract: Strong worst-case performance bounds for episodic reinforcement learning exist but fortunately in practice RL algorithms perform much better than such bounds would predict. Algorithms and theory that provide strong problem-dependent bounds could help illuminate the key features of what makes an RL problem hard and reduce the barrier to using RL algorithms in practice. As a step towards this we deri…

    Submitted 1 November, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

    Comments: Bug fixes

    Journal ref: International Conference on Machine Learning 2019

  16. arXiv:1811.09977  [pdf, other]

    stat.ML cs.LG

    Robust Super-Level Set Estimation using Gaussian Processes

    Authors: Andrea Zanette, Junzi Zhang, Mykel J. Kochenderfer

    Abstract: This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicte…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: Accepted to ECML 2018