Showing 1–28 of 28 results for author: Foster, D J

Searching in archive math.

  1. arXiv:2404.10122

    stat.ML cs.LG math.ST

    Online Estimation via Offline Estimation: An Information-Theoretic Framework

    Authors: Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin

    Abstract: $…

    Submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2403.06571

    cs.LG math.OC stat.ML

    Scalable Online Exploration via Coverability

    Authors: Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy

    Abstract: Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration. Within this framework, we introduce a new objective, $L_1$-Coverag…

    Submitted 4 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  3. arXiv:2312.16730

    cs.LG math.OC math.ST stat.ML

    Foundations of Reinforcement Learning and Interactive Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin

    Abstract: These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making. We present a unifying framework for addressing the exploration-exploitation dilemma using frequentist and Bayesian approaches, with connections and parallels between supervised learning/estimation and decision making as an overarching theme. Special attention is paid to…

    Submitted 27 December, 2023; originally announced December 2023.

  4. arXiv:2310.11428

    cs.LG math.OC stat.ML

    Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

    Authors: Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang

    Abstract: This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the behavior cloning loss. We empirically disentangle the statistical and computational causes of these oscillations, and find them to stem from the chao…

    Submitted 17 October, 2023; originally announced October 2023.

  5. arXiv:2307.03997

    cs.LG math.OC

    Efficient Model-Free Exploration in Low-Rank MDPs

    Authors: Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin

    Abstract: A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation is required. Low-Rank Markov Decision Processes -- where transition probabilities admit a low-rank factorization based on an unknown feature embedding -- offer a simple, yet expressive framework for RL with func…

    Submitted 29 February, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

  6. arXiv:2301.08215

    cs.LG math.OC math.ST stat.ML

    Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

    Authors: Dylan J. Foster, Noah Golowich, Yanjun Han

    Abstract: A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds…

    Submitted 19 January, 2023; originally announced January 2023.

  7. arXiv:2211.14250

    cs.LG math.OC math.ST stat.ML

    Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

    Authors: Dylan J. Foster, Noah Golowich, Jian Qian, Alexander Rakhlin, Ayush Sekhari

    Abstract: We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which ach…

    Submitted 12 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: V2 changes: Improved writing and added more examples

  8. arXiv:2210.04157

    cs.LG cs.AI math.OC stat.ML

    The Role of Coverage in Online Reinforcement Learning

    Authors: Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

    Abstract: Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning. While such conditions might seem irrelevant to online reinforcement learning at first glance, we establish a new connection by showing -- somewhat surprisingly -- that the mere existence of a data…

    Submitted 8 October, 2022; originally announced October 2022.

  9. arXiv:2206.13063

    cs.LG math.OC math.ST stat.ML

    On the Complexity of Adversarial Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

    Abstract: A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result…

    Submitted 27 June, 2022; originally announced June 2022.

  10. arXiv:2112.13487

    cs.LG math.OC math.ST stat.ML

    The Statistical Complexity of Interactive Decision Making

    Authors: Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin

    Abstract: A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher…

    Submitted 11 July, 2023; v1 submitted 26 December, 2021; originally announced December 2021.

    Comments: Minor improvements to writing and organization

  11. Minimax Rates for Conditional Density Estimation via Empirical Entropy

    Authors: Blair Bilodeau, Dylan J. Foster, Daniel M. Roy

    Abstract: We consider the task of estimating a conditional density using i.i.d. samples from a joint distribution, which is a fundamental problem with applications in both classification and uncertainty quantification for regression. For joint density estimation, minimax rates have been characterized for general density classes in terms of uniform (metric) entropy, a well-studied notion of statistical capac…

    Submitted 14 June, 2023; v1 submitted 21 September, 2021; originally announced September 2021.

    Comments: 59 pages, 1 figure. Minor edits to match published version

    Journal ref: Annals of Statistics, 51(2):762-790, 2023

  12. arXiv:2107.02237

    cs.LG math.ST stat.ML

    Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

    Authors: Dylan J. Foster, Akshay Krishnamurthy

    Abstract: A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees. While first-order guarantees are relatively well understood in statistical and online learning, adapting to low noise in contextua…

    Submitted 5 July, 2021; originally announced July 2021.

  13. arXiv:2010.03799

    cs.LG math.OC math.ST stat.ML

    Learning the Linear Quadratic Regulator from Nonlinear Observations

    Authors: Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

    Abstract: We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learn…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: To appear at NeurIPS 2020

  14. arXiv:2010.03104

    cs.LG math.ST stat.ML

    Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

    Authors: Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

    Abstract: In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits ca…

    Submitted 6 October, 2020; originally announced October 2020.

  15. arXiv:2006.13476

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---tha…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to CONFERENCE ON LEARNING THEORY (COLT) 2020

  16. arXiv:2004.14681

    cs.LG math.OC math.ST stat.ML

    Learning nonlinear dynamical systems from a single trajectory

    Authors: Dylan J. Foster, Alexander Rakhlin, Tuhin Sarkar

    Abstract: We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=σ(Θ^{\star}x_t)+\varepsilon_t$, where $Θ^{\star}$ is a weight matrix, $σ$ is a nonlinear link function, and $\varepsilon_t$ is a mean-zero noise process. We give an algorithm that recovers the weight matrix $Θ^{\star}$ from a single trajectory with optimal sample complexity and linear running time. The algorithm…

    Submitted 30 April, 2020; originally announced April 2020.

    Comments: To appear at L4DC 2020

  17. arXiv:2003.00189

    cs.LG math.OC stat.ML

    Logarithmic Regret for Adversarial Online Control

    Authors: Dylan J. Foster, Max Simchowitz

    Abstract: We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances. Existing regret bounds for this setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on the disturbance process. We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are g…

    Submitted 23 June, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: ICML 2020

  18. arXiv:2002.04926

    cs.LG math.ST stat.ML

    Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

    Authors: Dylan J. Foster, Alexander Rakhlin

    Abstract: A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and op…

    Submitted 23 June, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  19. arXiv:2001.09576

    cs.LG math.OC stat.ML

    Naive Exploration is Optimal for Online LQR

    Authors: Max Simchowitz, Dylan J. Foster

    Abstract: We consider the problem of online adaptive control of the linear quadratic regulator, where the true system parameters are unknown. We prove new upper and lower bounds demonstrating that the optimal regret scales as $\widetilde{Θ}({\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T}})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dime…

    Submitted 3 October, 2023; v1 submitted 26 January, 2020; originally announced January 2020.

  20. arXiv:1912.02365

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an…

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  21. arXiv:1911.06468

    cs.LG math.ST stat.ML

    $\ell_{\infty}$ Vector Contraction for Rademacher Complexity

    Authors: Dylan J. Foster, Alexander Rakhlin

    Abstract: We show that the Rademacher complexity of any $\mathbb{R}^{K}$-valued function class composed with an $\ell_{\infty}$-Lipschitz function is bounded by the maximum Rademacher complexity of the restriction of the function class along each coordinate, times a factor of $\tilde{O}(\sqrt{K})$.

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Technical note

  22. arXiv:1906.00531

    cs.LG math.ST stat.ML

    Model selection for contextual bandits

    Authors: Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

    Abstract: We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for linear contextual bandits. We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the…

    Submitted 14 November, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

  23. arXiv:1905.13283

    cs.LG cs.DS math.ST stat.ML

    Sum-of-squares meets square loss: Fast rates for agnostic tensor completion

    Authors: Dylan J. Foster, Andrej Risteski

    Abstract: We study tensor completion in the agnostic setting. In the classical tensor completion problem, we receive $n$ entries of an unknown rank-$r$ tensor and wish to exactly complete the remaining entries. In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor. For agnostic learning of third-or…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: To appear at COLT 2019

  24. arXiv:1902.04686

    cs.LG math.OC stat.ML

    The Complexity of Making the Gradient Small in Stochastic Convex Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

    Abstract: We give nearly matching upper and lower bounds on the oracle complexity of finding $ε$-stationary points ($\| \nabla F(x) \| \leq ε$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimizatio…

    Submitted 14 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  25. arXiv:1901.09036

    math.ST cs.LG econ.EM stat.ML

    Orthogonal Statistical Learning

    Authors: Dylan J. Foster, Vasilis Syrgkanis

    Abstract: We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data. We analyze a two-stage sample splitting meta-algorithm that takes as input arbitrary estimation algorithms for the target parameter and nuisance parameter. W…

    Submitted 5 June, 2023; v1 submitted 24 January, 2019; originally announced January 2019.

    Comments: Reorganized, added experiments and additional examples

  26. arXiv:1810.11059

    cs.LG math.OC stat.ML

    Uniform Convergence of Gradients for Non-Convex Learning and Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We investigate 1) the rate at which refined properties of the empirical risk---in particular, gradients---converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization. Our analysis follows the tradition of norm-based capacity control. We propose vector-valued Rademacher complexities as a simple, composable, and user-f…

    Submitted 11 November, 2018; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: To appear in Neural Information Processing Systems (NIPS) 2018

  27. arXiv:1803.07617

    cs.LG math.OC stat.ML

    Online Learning: Sufficient Statistics and the Burkholder Method

    Authors: Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan

    Abstract: We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is on…

    Submitted 20 March, 2018; originally announced March 2018.

  28. arXiv:1704.04010

    cs.LG math.OC stat.ML

    ZigZag: A new approach to adaptive online learning

    Authors: Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan

    Abstract: We develop a novel family of algorithms for the online learning setting with regret against any data sequence bounded by the empirical Rademacher complexity of that sequence. To develop a general theory of when this type of adaptive regret bound is achievable we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces. When the hypothesis class is a set of l…

    Submitted 13 April, 2017; originally announced April 2017.

    Comments: 49 pages