Showing 1–40 of 40 results for author: Schapire, R

  1. arXiv:2404.09123  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Provable Interactive Learning with Hindsight Instruction Feedback

    Authors: Dipendra Misra, Aldo Pacchiano, Robert E. Schapire

    Abstract: We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction. In contrast to typical approaches that train the system using reward or expert supervision on response, we study learning with hindsight instruction where a teacher provides an instruction that is most suitable for the agent's generated response…

    Submitted 13 April, 2024; originally announced April 2024.

  2. arXiv:2306.06184  [pdf, other]

    cs.LG stat.ML

    A Unified Model and Dimension for Interactive Estimation

    Authors: Nataly Brukhim, Miroslav Dudik, Aldo Pacchiano, Robert Schapire

    Abstract: We study an abstract framework for interactive learning called interactive estimation in which the goal is to estimate a target from its "similarity" to points queried by the learner. We introduce a combinatorial measure called dissimilarity dimension which largely captures learnability in our model. We present a simple, general, and broadly-applicable algorithm, for which we obtain both regret a…

    Submitted 9 June, 2023; originally announced June 2023.

  3. arXiv:2205.14237  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Sample-Efficient RL with Side Information about Latent Dynamics

    Authors: Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire

    Abstract: We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan. We formalize this setting as transfer reinfor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 35 pages, 4 figures

  4. arXiv:2205.03260  [pdf, other]

    math.OC cs.LG

    Convex Analysis at Infinity: An Introduction to Astral Space

    Authors: Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still e…

    Submitted 11 January, 2023; v1 submitted 6 May, 2022; originally announced May 2022.

  5. arXiv:2107.01509  [pdf, other]

    cs.LG math.ST stat.ML

    Bayesian decision-making under misspecified priors with applications to meta-learning

    Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

    Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecifi…

    Submitted 3 July, 2021; originally announced July 2021.

  6. arXiv:2102.07024  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Interactive Learning from Activity Description

    Authors: Khanh Nguyen, Dipendra Misra, Robert Schapire, Miro Dudík, Patrick Shafto

    Abstract: We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Unlike imitation learning (IL), our protocol allows the teaching agent to provide feedback in a language that is most appropriate for them. Compared with reward in reinforcement learning (RL), the description feedback is richer and allows for improved sample com…

    Submitted 14 June, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  7. arXiv:2006.11226  [pdf, other]

    cs.LG math.OC stat.ML

    Gradient descent follows the regularization path for general losses

    Authors: Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we s…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: To appear, COLT 2020

  8. arXiv:2002.11650  [pdf, other]

    cs.LG cs.DS cs.GT econ.GN stat.ML

    Contextual Search in the Presence of Adversarial Corruptions

    Authors: Akshay Krishnamurthy, Thodoris Lykouris, Chara Podimata, Robert Schapire

    Abstract: We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard formulations of this problem assume that agents act in accordance with a specific homogeneous response model. In practice, however, some responses may be adversarially corrupted. Existing algorithms heavily depend on the assumed response model…

    Submitted 6 August, 2022; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: The first version was titled "Corrupted multidimensional binary search: Learning in the presence of irrational agents". An 8-page extended abstract titled "Contextual search in the presence of irrational agents" appeared at the 53rd ACM Symposium on the Theory of Computing (STOC '21)

  9. arXiv:1906.09323  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Reinforcement Learning with Convex Constraints

    Authors: Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

    Abstract: In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we…

    Submitted 11 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems 32 (2019), 14093-14102

  10. arXiv:1811.11881  [pdf, other]

    cs.DS cs.LG stat.ML

    Adversarial Bandits with Knapsacks

    Authors: Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins

    Abstract: We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions…

    Submitted 6 March, 2023; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: The extended abstract appeared in FOCS 2019. The definitive version was published in JACM '22. V8 is the latest version with all technical changes. Subsequent versions fix minor LaTeX presentation issues.

  11. arXiv:1803.01088  [pdf, other]

    cs.LG stat.ML

    Practical Contextual Bandits with Regression Oracles

    Authors: Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

    Abstract: A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function clas…

    Submitted 2 March, 2018; originally announced March 2018.

  12. arXiv:1803.00606  [pdf, other]

    cs.LG stat.ML

    On Oracle-Efficient PAC RL with Rich Observations

    Authors: Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and…

    Submitted 16 January, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: appeared at NeurIPS 18; full paper including appendix; updated style file

  13. arXiv:1706.04964  [pdf, ps, other]

    cs.LG

    Learning Deep ResNet Blocks Sequentially using Boosting Theory

    Authors: Furong Huang, Jordan Ash, John Langford, Robert Schapire

    Abstract: Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct $T$ weak module classifiers, each containing two of the $T$ layers, such that the combined strong learner is a ResNet. Th…

    Submitted 14 June, 2018; v1 submitted 15 June, 2017; originally announced June 2017.

    Comments: Accepted to ICML 2018

  14. arXiv:1612.06246  [pdf, ps, other]

    cs.LG stat.ML

    Corralling a Band of Bandit Algorithms

    Authors: Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

    Abstract: We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own. The main challenge is that when run with a master, base algorithms unavoidably receive much less feedback and it is thus critical that the master…

    Submitted 5 June, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

    Comments: Accepted to COLT 2017

  15. arXiv:1611.01688  [pdf, other]

    cs.LG cs.DS cs.GT

    Oracle-Efficient Online Learning and Auction Design

    Authors: Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

    Abstract: We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle. We present an algorithm called Generalized Follow-the-Perturbed-Leader and provide conditions under which it is oracle-efficient while achieving vanishing regret. Our results make significant progress on an open problem raised b…

    Submitted 5 August, 2019; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: An earlier version of this paper appeared in FOCS 2017

  16. arXiv:1610.09512  [pdf, other]

    cs.LG stat.ML

    Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

    Authors: Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally…

    Submitted 1 December, 2016; v1 submitted 29 October, 2016; originally announced October 2016.

    Comments: 42 pages, 1 figure

  17. arXiv:1606.00313  [pdf, ps, other]

    cs.LG

    Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits

    Authors: Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is th…

    Submitted 1 June, 2016; originally announced June 2016.

  18. arXiv:1603.04119  [pdf, other]

    cs.AI cs.LG stat.ML

    Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

    Authors: David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strateg…

    Submitted 13 March, 2016; originally announced March 2016.

  19. arXiv:1602.04889  [pdf, other]

    cs.LG cs.AI

    Unsupervised Domain Adaptation Using Approximate Label Matching

    Authors: Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt

    Abstract: Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to lear…

    Submitted 1 March, 2017; v1 submitted 15 February, 2016; originally announced February 2016.

  20. arXiv:1602.02454  [pdf, ps, other]

    cs.LG

    Efficient Algorithms for Adversarial Contextual Learning

    Authors: Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

    Abstract: We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner know…

    Submitted 7 February, 2016; originally announced February 2016.

  21. arXiv:1510.02558  [pdf, other]

    stat.ML cs.LG

    Functional Frank-Wolfe Boosting for General Loss Functions

    Authors: Chu Wang, Yingfei Wang, Weinan E, Robert Schapire

    Abstract: Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost…

    Submitted 8 October, 2015; originally announced October 2015.

  22. arXiv:1507.00407  [pdf, other]

    cs.GT cs.AI cs.LG

    Fast Convergence of Regularized Learning in Games

    Authors: Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

    Abstract: We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games. When each player in a game uses an algorithm from our class, their individual regret decays at $O(T^{-3/4})$, while the sum of utilities converges to an approximate optimum at…

    Submitted 10 December, 2015; v1 submitted 1 July, 2015; originally announced July 2015.

  23. arXiv:1506.08669  [pdf, other]

    cs.LG stat.ML

    Efficient and Parsimonious Agnostic Active Learning

    Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

    Abstract: We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise. 2) It is efficiently implementable with an ERM oracle. 3) It is more aggressive than all previous approaches satisfying 1 and 2. To do this we create an algorithm based on a n…

    Submitted 7 January, 2016; v1 submitted 29 June, 2015; originally announced June 2015.

  24. arXiv:1506.04513  [pdf, other]

    cs.LG stat.ML

    Convex Risk Minimization and Conditional Probability Estimation

    Authors: Matus Telgarsky, Miroslav Dudík, Robert Schapire

    Abstract: This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any se…

    Submitted 15 June, 2015; originally announced June 2015.

    Comments: To appear, COLT 2015

  25. arXiv:1502.06362  [pdf, other]

    cs.LG

    Contextual Dueling Bandits

    Authors: Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

    Abstract: We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner's goal is to find the best policy, or way of behaving, in some space of policies, although "best"…

    Submitted 13 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: 25 pages, 4 figures, Published at COLT 2015

  26. arXiv:1502.05934  [pdf, ps, other]

    cs.LG

    Achieving All with No Parameters: Adaptive NormalHedge

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component of this work is an improved version of the NormalHedge.DT algorithm (Luo and Schapire, 2014), called AdaNormalHedge. On one hand, this new algorithm ensures sm…

    Submitted 20 February, 2015; originally announced February 2015.

  27. arXiv:1406.1856  [pdf, ps, other]

    cs.LG

    A Drifting-Games Analysis for Online Learning and Applications to Boosting

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We provide a general mechanism to design online learning algorithms based on a minimax analysis within a drifting-games framework. Different online learning settings (Hedge, multi-armed bandit problems and online convex optimization) are studied by converting into various kinds of drifting games. The original minimax analysis for drifting games is then used and generalized by applying a series of…

    Submitted 30 October, 2014; v1 submitted 6 June, 2014; originally announced June 2014.

    Comments: In NIPS 2014

  28. arXiv:1402.0555  [pdf, ps, other]

    cs.LG stat.ML

    Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

    Authors: Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

    Abstract: We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only…

    Submitted 13 October, 2014; v1 submitted 3 February, 2014; originally announced February 2014.

  29. arXiv:1307.8187  [pdf, ps, other]

    cs.LG

    Towards Minimax Online Learning with Unknown Time Horizon

    Authors: Haipeng Luo, Robert E. Schapire

    Abstract: We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some known distribution, and the other which allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we de…

    Submitted 6 October, 2013; v1 submitted 30 July, 2013; originally announced July 2013.

  30. arXiv:1301.0599  [pdf]

    cs.LG stat.ML

    Advances in Boosting (Invited Talk)

    Authors: Robert E. Schapire

    Abstract: Boosting is a general method of generating many simple classification rules and combining them into a single, highly accurate rule. In this talk, I will review the AdaBoost boosting algorithm and some of its underlying theory, and then look at how this theory has helped us to face some of the challenges of applying AdaBoost in two domains: In the first of these, we used boosting for predicting and…

    Submitted 12 December, 2012; originally announced January 2013.

    Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

    Report number: UAI-P-2002-PG-446-452

  31. arXiv:1206.5290  [pdf]

    cs.LG cs.AI stat.ML

    Imitation Learning with a Value-Based Prior

    Authors: Umar Syed, Robert E. Schapire

    Abstract: The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge takes…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-384-391

  32. arXiv:1203.3486  [pdf]

    cs.LG stat.ML

    Combining Spatial and Telemetric Features for Learning Animal Movement Models

    Authors: Berk Kapicioglu, Robert E. Schapire, Martin Wikelski, Tamara Broderick

    Abstract: We introduce a new graphical model for tracking radio-tagged animals and learning their movement patterns. The model provides a principled way to combine radio telemetry data with an arbitrary set of user-defined, spatial features. We describe an efficient stochastic gradient algorithm for fitting model parameters to data and demonstrate its effectiveness via asymptotic analysis and synthetic exper…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-260-267

  33. arXiv:1202.1334  [pdf, ps, other]

    cs.LG

    Contextual Bandit Learning with Predictable Rewards

    Authors: Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

    Abstract: Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under t…

    Submitted 2 March, 2012; v1 submitted 6 February, 2012; originally announced February 2012.

  34. arXiv:1108.2989  [pdf, other]

    stat.ML cs.AI

    A theory of multiclass boosting

    Authors: Indraneel Mukherjee, Robert E. Schapire

    Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the "correct" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requireme…

    Submitted 15 August, 2011; originally announced August 2011.

    Comments: A preliminary version appeared in NIPS 2010

    ACM Class: I.2.6

  35. arXiv:1106.6024  [pdf, ps, other]

    math.OC cs.AI stat.ML

    The Rate of Convergence of AdaBoost

    Authors: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire

    Abstract: The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponenti…

    Submitted 29 June, 2011; originally announced June 2011.

    Comments: A preliminary version will appear in COLT 2011

  36. Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions

    Authors: J. A. Csirik, M. L. Littman, D. McAllester, R. E. Schapire, P. Stone

    Abstract: Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to…

    Submitted 26 June, 2011; originally announced June 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 19, pages 209-242, 2003

  37. arXiv:1105.5464  [pdf, ps]

    cs.LG cs.AI

    Learning to Order Things

    Authors: W. W. Cohen, R. E. Schapire, Y. Singer

    Abstract: There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary p…

    Submitted 26 May, 2011; originally announced May 2011.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 10, pages 243-270, 1999

  38. arXiv:1003.0146  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    A Contextual-Bandit Approach to Personalized News Article Recommendation

    Authors: Lihong Li, Wei Chu, John Langford, Robert E. Schapire

    Abstract: Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. S…

    Submitted 1 March, 2012; v1 submitted 27 February, 2010; originally announced March 2010.

    Comments: 10 pages, 5 figures

    ACM Class: H.3.5; I.2.6

    Journal ref: Presented at the Nineteenth International Conference on World Wide Web (WWW 2010), Raleigh, NC, USA, 2010

  39. arXiv:1002.4058  [pdf, ps, other]

    cs.LG

    Contextual Bandit Algorithms with Supervised Learning Guarantees

    Authors: Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire

    Abstract: We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-\delta$ while incurring regret at most…

    Submitted 27 October, 2011; v1 submitted 22 February, 2010; originally announced February 2010.

    Comments: 10 pages

    ACM Class: I.2.6

  40. arXiv:cs/0506101  [pdf, ps, other]

    cs.LG cs.CL

    Efficient Multiclass Implementations of L1-Regularized Maximum Entropy

    Authors: Patrick Haffner, Steven Phillips, Rob Schapire

    Abstract: This paper discusses the application of L1-regularized maximum entropy modeling or SL1-Max [9] to multiclass categorization problems. A new modification to the SL1-Max fast sequential learning algorithm is proposed to handle conditional distributions. Furthermore, unlike most previous studies, the present research goes beyond a single type of conditional distribution. It describes and compares a…

    Submitted 29 June, 2005; originally announced June 2005.

    Comments: 13 pages, describes new conditional maxent algorithm, to be submitted