Showing 1–31 of 31 results for author: Schapire, R E

  1. arXiv:2404.09123  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Provable Interactive Learning with Hindsight Instruction Feedback

    Authors: Dipendra Misra, Aldo Pacchiano, Robert E. Schapire

    Abstract: We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction. In contrast to typical approaches that train the system using reward or expert supervision on response, we study learning with hindsight instruction where a teacher provides an instruction that is most suitable for the agent's generated response…

    Submitted 13 April, 2024; originally announced April 2024.

  2. arXiv:2205.14237  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Sample-Efficient RL with Side Information about Latent Dynamics

    Authors: Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire

    Abstract: We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan. We formalize this setting as transfer reinfor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 35 pages, 4 figures

  3. arXiv:2205.03260  [pdf, other]

    math.OC cs.LG

    Convex Analysis at Infinity: An Introduction to Astral Space

    Authors: Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still e…

    Submitted 11 January, 2023; v1 submitted 6 May, 2022; originally announced May 2022.

  4. arXiv:2107.01509  [pdf, other]

    cs.LG math.ST stat.ML

    Bayesian decision-making under misspecified priors with applications to meta-learning

    Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

    Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecifi…

    Submitted 3 July, 2021; originally announced July 2021.

  5. arXiv:2006.11226  [pdf, other]

    cs.LG math.OC stat.ML

    Gradient descent follows the regularization path for general losses

    Authors: Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we s…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: To appear, COLT 2020

  6. arXiv:1803.01088  [pdf, other]

    cs.LG stat.ML

    Practical Contextual Bandits with Regression Oracles

    Authors: Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

    Abstract: A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function clas…

    Submitted 2 March, 2018; originally announced March 2018.

  7. arXiv:1803.00606  [pdf, other]

    cs.LG stat.ML

    On Oracle-Efficient PAC RL with Rich Observations

    Authors: Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and…

    Submitted 16 January, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: appeared at NeurIPS 18; full paper including appendix; updated style file

  8. arXiv:1612.06246  [pdf, ps, other]

    cs.LG stat.ML

    Corralling a Band of Bandit Algorithms

    Authors: Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

    Abstract: We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own. The main challenge is that when run with a master, base algorithms unavoidably receive much less feedback and it is thus critical that the master…

    Submitted 5 June, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

    Comments: Accepted to COLT 2017

  9. arXiv:1611.01688  [pdf, other]

    cs.LG cs.DS cs.GT

    Oracle-Efficient Online Learning and Auction Design

    Authors: Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

    Abstract: We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called Generalized Follow-the-Perturbed-Leader and provide conditions under which it is oracle-efficient while achieving vanishing regret. Our results make significant progress on an open problem raised b…

    Submitted 5 August, 2019; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: An earlier version of this paper appeared in FOCS 2017

  10. arXiv:1610.09512  [pdf, other]

    cs.LG stat.ML

    Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

    Authors: Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally…

    Submitted 1 December, 2016; v1 submitted 29 October, 2016; originally announced October 2016.

    Comments: 42 pages, 1 figure

  11. arXiv:1606.00313  [pdf, ps, other]

    cs.LG

    Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

    Authors: Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is th…

    Submitted 1 June, 2016; originally announced June 2016.

  12. arXiv:1603.04119  [pdf, other]

    cs.AI cs.LG stat.ML

    Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

    Authors: David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strateg…

    Submitted 13 March, 2016; originally announced March 2016.

  13. arXiv:1602.04889  [pdf, other]

    cs.LG cs.AI

    Unsupervised Domain Adaptation Using Approximate Label Matching

    Authors: Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt

    Abstract: Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to lear…

    Submitted 1 March, 2017; v1 submitted 15 February, 2016; originally announced February 2016.

  14. arXiv:1602.02454  [pdf, ps, other]

    cs.LG

    Efficient Algorithms for Adversarial Contextual Learning

    Authors: Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner know…

    Submitted 7 February, 2016; originally announced February 2016.

  15. arXiv:1507.00407  [pdf, other]

    cs.GT cs.AI cs.LG

    Fast Convergence of Regularized Learning in Games

    Authors: Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

    Abstract: We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games. When each player in a game uses an algorithm from our class, their individual regret decays at $O(T^{-3/4})$, while the sum of utilities converges to an approximate optimum at…

    Submitted 10 December, 2015; v1 submitted 1 July, 2015; originally announced July 2015.

  16. arXiv:1506.08669  [pdf, other]

    cs.LG stat.ML

    Efficient and Parsimonious Agnostic Active Learning

    Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

    Abstract: We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise. 2) It is efficiently implementable with an ERM oracle. 3) It is more aggressive than all previous approaches satisfying 1 and 2. To do this we create an algorithm based on a n…

    Submitted 7 January, 2016; v1 submitted 29 June, 2015; originally announced June 2015.

  17. arXiv:1502.06362  [pdf, other]

    cs.LG

    Contextual Dueling Bandits

    Authors: Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

    Abstract: We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner's goal is to find the best policy, or way of behaving, in some space of policies, although "best"…

    Submitted 13 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: 25 pages, 4 figures, Published at COLT 2015

  18. arXiv:1502.05934  [pdf, ps, other]

    cs.LG

    Achieving All with No Parameters: Adaptive NormalHedge

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component of this work is an improved version of the NormalHedge.DT algorithm (Luo and Schapire, 2014), called AdaNormalHedge. On one hand, this new algorithm ensures sm…

    Submitted 20 February, 2015; originally announced February 2015.

  19. arXiv:1406.1856  [pdf, ps, other]

    cs.LG

    A Drifting-Games Analysis for Online Learning and Applications to Boosting

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We provide a general mechanism to design online learning algorithms based on a minimax analysis within a drifting-games framework. Different online learning settings (Hedge, multi-armed bandit problems and online convex optimization) are studied by converting into various kinds of drifting games. The original minimax analysis for drifting games is then used and generalized by applying a series of…

    Submitted 30 October, 2014; v1 submitted 6 June, 2014; originally announced June 2014.

    Comments: In NIPS 2014

  20. arXiv:1402.0555  [pdf, ps, other]

    cs.LG stat.ML

    Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

    Authors: Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

    Abstract: We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only…

    Submitted 13 October, 2014; v1 submitted 3 February, 2014; originally announced February 2014.

  21. arXiv:1307.8187  [pdf, ps, other]

    cs.LG

    Towards Minimax Online Learning with Unknown Time Horizon

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some known distribution, and the other which allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we de…

    Submitted 6 October, 2013; v1 submitted 30 July, 2013; originally announced July 2013.

  22. arXiv:1301.0599  [pdf]

    cs.LG stat.ML

    Advances in Boosting (Invited Talk)

    Authors: Robert E. Schapire

    Abstract: Boosting is a general method of generating many simple classification rules and combining them into a single, highly accurate rule. In this talk, I will review the AdaBoost boosting algorithm and some of its underlying theory, and then look at how this theory has helped us to face some of the challenges of applying AdaBoost in two domains: In the first of these, we used boosting for predicting and…

    Submitted 12 December, 2012; originally announced January 2013.

    Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

    Report number: UAI-P-2002-PG-446-452

  23. arXiv:1206.5290  [pdf]

    cs.LG cs.AI stat.ML

    Imitation Learning with a Value-Based Prior

    Authors: Umar Syed, Robert E. Schapire

    Abstract: The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge takes…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-384-391

  24. arXiv:1203.3486  [pdf]

    cs.LG stat.ML

    Combining Spatial and Telemetric Features for Learning Animal Movement Models

    Authors: Berk Kapicioglu, Robert E. Schapire, Martin Wikelski, Tamara Broderick

    Abstract: We introduce a new graphical model for tracking radio-tagged animals and learning their movement patterns. The model provides a principled way to combine radio telemetry data with an arbitrary set of user-defined, spatial features. We describe an efficient stochastic gradient algorithm for fitting model parameters to data and demonstrate its effectiveness via asymptotic analysis and synthetic exper…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-260-267

  25. arXiv:1202.1334  [pdf, ps, other]

    cs.LG

    Contextual Bandit Learning with Predictable Rewards

    Authors: Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

    Abstract: Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under t…

    Submitted 2 March, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

  26. arXiv:1108.2989  [pdf, other]

    stat.ML cs.AI

    A theory of multiclass boosting

    Authors: Indraneel Mukherjee, Robert E. Schapire

    Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the "correct" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requireme…

    Submitted 15 August, 2011; originally announced August 2011.

    Comments: A preliminary version appeared in NIPS 2010

    ACM Class: I.2.6

  27. arXiv:1106.6024  [pdf, ps, other]

    math.OC cs.AI stat.ML

    The Rate of Convergence of AdaBoost

    Authors: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire

    Abstract: The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponenti…

    Submitted 29 June, 2011; originally announced June 2011.

    Comments: A preliminary version will appear in COLT 2011

  28. Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions

    Authors: J. A. Csirik, M. L. Littman, D. McAllester, R. E. Schapire, P. Stone

    Abstract: Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to…

    Submitted 26 June, 2011; originally announced June 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 19, pages 209-242, 2003

  29. arXiv:1105.5464  [pdf, ps]

    cs.LG cs.AI

    Learning to Order Things

    Authors: W. W. Cohen, R. E. Schapire, Y. Singer

    Abstract: There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary p…

    Submitted 26 May, 2011; originally announced May 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 10, pages 243-270, 1999

  30. arXiv:1003.0146  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    A Contextual-Bandit Approach to Personalized News Article Recommendation

    Authors: Lihong Li, Wei Chu, John Langford, Robert E. Schapire

    Abstract: Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. S…

    Submitted 1 March, 2012; v1 submitted 27 February, 2010; originally announced March 2010.

    Comments: 10 pages, 5 figures

    ACM Class: H.3.5; I.2.6

    Journal ref: Presented at the Nineteenth International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, 2010

  31. arXiv:1002.4058  [pdf, ps, other]

    cs.LG

    Contextual Bandit Algorithms with Supervised Learning Guarantees

    Authors: Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire

    Abstract: We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-δ$ while incurring regret at most…

    Submitted 27 October, 2011; v1 submitted 22 February, 2010; originally announced February 2010.

    Comments: 10 pages

    ACM Class: I.2.6