Showing 1–6 of 6 results for author: Brandfonbrener, D

  1. arXiv:2112.00950  [pdf, other]

    cs.LG stat.ML

    Quantile Filtered Imitation Learning

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: We introduce quantile filtered imitation learning (QFIL), a novel policy improvement operator designed for offline reinforcement learning. QFIL performs policy improvement by running imitation learning on a filtered version of the offline dataset. The filtering process removes $s,a$ pairs whose estimated Q values fall below a given quantile of the pushforward distribution over values induced by…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2021

  2. arXiv:2106.08909  [pdf, other]

    cs.LG stat.ML

    Offline RL Without Off-Policy Evaluation

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior policy performs surprisingly well. This one-step algorithm beats the previously reported results of iterative algorithm…

    Submitted 3 December, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Thirty-fifth Conference on Neural Information Processing Systems, 2021

  3. arXiv:2009.07368  [pdf, other]

    cs.LG cs.AI stat.ML

    Evaluating representations by the complexity of learning low-loss predictors

    Authors: William F. Whitney, Min Jae Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho

    Abstract: We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest, and introduce two methods, surplus description length (SDL) and $\varepsilon$ sample complexity ($\varepsilon$SC). In contrast to…

    Submitted 5 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

  4. arXiv:2006.15368  [pdf, other]

    cs.LG stat.ML

    Offline Contextual Bandits with Overparameterized Models

    Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna

    Abstract: Recent results in supervised learning suggest that while overparameterized models have the capacity to overfit, they in fact generalize quite well. We ask whether the same phenomenon occurs for offline contextual bandits. Our results are mixed. Value-based algorithms benefit from the same generalization behavior as overparameterized supervised learning, but policy-based algorithms do not. We show…

    Submitted 16 June, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  5. arXiv:1911.00567  [pdf, ps, other]

    cs.LG stat.ML

    Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

    Authors: Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta, Alessandro Lazaric

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory. In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI), a model-free algorithm where…

    Submitted 8 September, 2023; v1 submitted 1 November, 2019; originally announced November 2019.

    Comments: Minor bug fixes

  6. arXiv:1905.12185  [pdf, other]

    cs.LG math.OC stat.ML

    Geometric Insights into the Convergence of Nonlinear TD Learning

    Authors: David Brandfonbrener, Joan Bruna

    Abstract: While there are convergence guarantees for temporal difference (TD) learning when using linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected learning dynam…

    Submitted 11 February, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: ICLR 2020