
Showing 1–16 of 16 results for author: Foster, D P

Searching in archive cs.
  1. arXiv:2305.00684  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA stat.ML

    On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring

    Authors: Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin

    Abstract: A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with fun…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 95 pages

  2. arXiv:2211.07419  [pdf, ps, other]

    cs.LG

    Linear Reinforcement Learning with Ball Structure Action Space

    Authors: Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster

    Abstract: We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e. assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping. Unfortunately, however, based on only this assumption, the worst case sample complexity has been shown to be exponential, even under a generative model. Instead of making further assumptions on the MDP or value…

    Submitted 14 November, 2022; originally announced November 2022.

  3. arXiv:2210.07169  [pdf, ps, other]

    econ.TH cs.GT cs.LG math.ST stat.ML

    Forecast Hedging and Calibration

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: Calibration means that forecasts and average realized frequencies are close. We develop the concept of forecast hedging, which consists of choosing the forecasts so as to guarantee that the expected track record can only improve. This yields all the calibration results by the same simple basic argument while differentiating between them by the forecast-hedging tools used: deterministic and fixed p…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-int

    Report number: HUJI DP-731

    Journal ref: Journal of Political Economy 129, 12 (December 2021), 3447-3490

  4. arXiv:2210.07152  [pdf, ps, other]

    econ.TH cs.GT cs.LG math.ST stat.ML

    Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: We propose to smooth out the calibration score, which measures how good a forecaster is, by combining nearby forecasts. While regular calibration can be guaranteed only by randomized forecasting procedures, we show that smooth calibration can be guaranteed by deterministic procedures. As a consequence, it does not matter if the forecasts are leaked, i.e., made known in advance: smooth calibration…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-eq

    Report number: HUJI DP-692

    Journal ref: Games and Economic Behavior 109 (May 2018), 271-293

  5. arXiv:2210.03137  [pdf, other]

    cs.LG math.OC

    Deep Inventory Management

    Authors: Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade

    Abstract: This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, our results show that several policy learning approaches are competitive with or outperform classical methods. In order to train…

    Submitted 28 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

  6. arXiv:2209.04892  [pdf, ps, other]

    econ.TH cs.GT cs.LG stat.ML

    "Calibeating": Beating Forecasters at Their Own Game

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one c…

    Submitted 26 October, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-beat

  7. arXiv:2207.08342  [pdf, ps, other]

    cs.LG

    A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

    Authors: Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster

    Abstract: The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner…

    Submitted 17 July, 2022; originally announced July 2022.

  8. arXiv:2112.02165  [pdf, ps, other]

    cs.LG

    On Submodular Contextual Bandits

    Authors: Dean P. Foster, Alexander Rakhlin

    Abstract: We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function that belongs to a class $\mathcal{F}$. We allow time-varying matroid constraints to be placed on the feasible sets. Assuming access to an online regression oracle with regret $\mathsf{Reg}(\mathcal{F})$, our algorithm efficiently randomizes…

    Submitted 3 December, 2021; originally announced December 2021.

  9. arXiv:2108.04552  [pdf, other]

    cs.LG math.OC stat.ML

    The Benefits of Implicit Regularization from SGD in Least Squares Problems

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make s…

    Submitted 10 July, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 33 pages, 1 figure. In NeurIPS 2021

  10. arXiv:2105.06834  [pdf, other]

    cs.LG stat.ME

    Threshold Martingales and the Evolution of Forecasts

    Authors: Dean P. Foster, Robert A. Stine

    Abstract: This paper introduces a martingale that characterizes two properties of evolving forecast distributions. Ideal forecasts of a future event behave as martingales, sequentially updating the forecast to leverage the available information as the future event approaches. The threshold martingale introduced here measures the proportion of the forecast distribution lying below a threshold. In addition…

    Submitted 14 May, 2021; originally announced May 2021.

  11. arXiv:2010.11895  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    What are the Statistical Limits of Offline RL with Linear Function Approximation?

    Authors: Ruosong Wang, Dean P. Foster, Sham M. Kakade

    Abstract: Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making p…

    Submitted 22 October, 2020; originally announced October 2020.

  12. arXiv:1811.08045  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Coupled Recurrent Models for Polyphonic Music Composition

    Authors: John Thickstun, Zaid Harchaoui, Dean P. Foster, Sham M. Kakade

    Abstract: This paper introduces a novel recurrent model for music composition that is tailored to the structure of polyphonic music. We propose an efficient new conditional probabilistic factorization of musical scores, viewing a score as a collection of concurrent, coupled sequences: i.e. voices. To model the conditional distributions, we borrow ideas from both convolutional and recurrent neural models; we…

    Submitted 26 November, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

    Comments: 13 pages; long version of the paper appearing in ISMIR 2019

  13. arXiv:1209.5477  [pdf, other]

    stat.ML cs.LG

    Optimal Weighting of Multi-View Data with Low Dimensional Hidden States

    Authors: Yichao Lu, Dean P. Foster

    Abstract: In Natural Language Processing (NLP) tasks, data often has the following two properties: First, data can be chopped into multiple views, which have been successfully used for dimension reduction purposes. For example, in topic classification, every paper can be chopped into the title, the main text and the references. However, it is common that some of the views are less noisy than other views for su…

    Submitted 26 September, 2012; v1 submitted 24 September, 2012; originally announced September 2012.

  14. arXiv:1204.6703  [pdf, ps, other]

    cs.LG stat.ML

    A Spectral Algorithm for Latent Dirichlet Allocation

    Authors: Animashree Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Yi-Kai Liu

    Abstract: The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimat…

    Submitted 17 January, 2013; v1 submitted 30 April, 2012; originally announced April 2012.

    Comments: Changed title to match conference version, which appears in Advances in Neural Information Processing Systems 25, 2012

  15. arXiv:1203.6130  [pdf, other]

    stat.ML cs.LG

    Spectral dimensionality reduction for HMMs

    Authors: Dean P. Foster, Jordan Rodu, Lyle H. Ungar

    Abstract: Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend o…

    Submitted 27 March, 2012; originally announced March 2012.

  16. arXiv:1107.1744  [pdf, other]

    math.OC cs.LG eess.SY

    Stochastic convex optimization with bandit feedback

    Authors: Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

    Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm'…

    Submitted 8 October, 2011; v1 submitted 8 July, 2011; originally announced July 2011.