
Showing 1–32 of 32 results for author: Dann, C

  1. arXiv:2406.07585  [pdf, other]

    stat.ML cs.LG

    Rate-Preserving Reductions for Blackwell Approachability

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

    Abstract: Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent, in the sense that any algorithm that solves a specific Blackwell approachability instance can be converted to a sublinear regret algorithm for a specific no-regret learning instance, and vice versa. In this paper, we study a more fine-grained form of such reductions, and ask when this translation b…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2401.04056  [pdf, other]

    cs.LG

    A Minimaximalist Approach to Reinforcement Learning from Human Feedback

    Authors: Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

    Abstract: We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training and is therefore rather simple to implement. Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust…

    Submitted 13 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  3. arXiv:2306.02869  [pdf, other]

    cs.LG cs.AI stat.ML

    Data-Driven Online Model Selection With Regret Guarantees

    Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile

    Abstract: We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner. Model selection is performed by regret balancing but, unlike the recent literature on this subject, we do not assume any prior…

    Submitted 23 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

  4. arXiv:2302.09739  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

    Authors: Christoph Dann, Chen-Yu Wei, Julian Zimmert

    Abstract: Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialised potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if t…

    Submitted 19 February, 2023; originally announced February 2023.

  5. arXiv:2302.09408  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Best of Both Worlds Policy Optimization

    Authors: Christoph Dann, Chen-Yu Wei, Julian Zimmert

    Abstract: Policy optimization methods are popular reinforcement learning algorithms in practice. Recent works have built a theoretical foundation for them by proving $\sqrt{T}$ regret bounds even when the losses are adversarial. Such bounds are tight in the worst case but often overly pessimistic. In this work, we show that in tabular Markov decision processes (MDPs), by properly designing the regularizer, th…

    Submitted 18 February, 2023; originally announced February 2023.

  6. Sentiment analysis and opinion mining on educational data: A survey

    Authors: Thanveer Shaik, Xiaohui Tao, Christopher Dann, Haoran Xie, Yan Li, Linda Galligan

    Abstract: Sentiment analysis AKA opinion mining is one of the most widely used NLP applications to identify human intentions from their reviews. In the education sector, opinion mining is used to listen to student opinions and enhance their learning-teaching practices pedagogically. With advancements in sentiment annotation techniques and AI methodologies, student comments can be labelled with their sentime…

    Submitted 8 February, 2023; originally announced February 2023.

    Journal ref: Natural Language Processing Journal 2 (2023) 100003

  7. arXiv:2302.01517  [pdf, ps, other]

    cs.LG

    Pseudonorm Approachability and Applications to Regret Minimization

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

    Abstract: Blackwell's celebrated approachability theory provides a general framework for a variety of learning problems, including regret minimization. However, Blackwell's proof and implicit algorithm measure approachability using the $\ell_2$ (Euclidean) distance. We argue that in many applications such as regret minimization, it is more useful to study approachability under other distance metrics, most c…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: To appear at ALT 2023

  8. arXiv:2301.13857  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning in POMDPs is Sample-Efficient with Hindsight Observability

    Authors: Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

    Abstract: POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling,…

    Submitted 3 February, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  9. A Review of the Trends and Challenges in Adopting Natural Language Processing Methods for Education Feedback Analysis

    Authors: Thanveer Shaik, Xiaohui Tao, Yan Li, Christopher Dann, Jacquie Mcdonald, Petrea Redmond, Linda Galligan

    Abstract: Artificial Intelligence (AI) is a fast-growing area of study that is stretching its presence to many business and research domains. Machine learning, deep learning, and natural language processing (NLP) are subsets of AI to tackle different areas of data processing and modelling. This review article presents an overview of AI impact on education, outlining current opportunities. In the education…

    Submitted 20 January, 2023; originally announced January 2023.

    Journal ref: IEEE Access, vol. 10, pp. 56720-56739, 2022

  10. arXiv:2210.09255  [pdf, ps, other]

    cs.LG stat.ML

    A Unified Algorithm for Stochastic Path Problems

    Authors: Christoph Dann, Chen-Yu Wei, Julian Zimmert

    Abstract: We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP)…

    Submitted 17 October, 2022; originally announced October 2022.

  11. arXiv:2208.10904  [pdf, ps, other]

    cs.LG

    A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

    Authors: Christoph Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert

    Abstract: Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings. However, existing posterior sampling methods for reinforcement learning are limited by being model-based or lack worst-case theoretical guarantees beyond linear MDPs. This paper proposes a new model-free formulation of posterior sampling that applie…

    Submitted 23 August, 2022; originally announced August 2022.

    Journal ref: Dann C, Mohri M, Zhang T, Zimmert J. A provably efficient model-free posterior sampling method for episodic reinforcement learning. Advances in Neural Information Processing Systems. 2021 Dec 6;34:12040-51

  12. arXiv:2206.14912  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Best of Both Worlds Model Selection

    Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile

    Abstract: We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta algorithm plays each base learner according to a…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 10 pages in main, 43 pages appendix

  13. arXiv:2206.09421  [pdf, other]

    cs.LG

    Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

    Abstract: Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These cr…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: to appear at ICML 2022

  14. arXiv:2203.04274  [pdf, ps, other]

    cs.LG cs.DS

    Leveraging Initial Hints for Free in Stochastic Linear Bandits

    Authors: Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang

    Abstract: We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action. We present a novel algorithm for stochastic linear bandits that uses this hint to improve its regret to $\tilde O(\sqrt{T})$ when the hint is accurate, while maintaining a minimax-optimal $\tilde O(d\sqrt{T})$ regret independent of th…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: ALT 2022

  15. arXiv:2202.10376  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE stat.AP

    Same Cause; Different Effects in the Brain

    Authors: Mariya Toneva, Jennifer Williams, Anand Bollu, Christoph Dann, Leila Wehbe

    Abstract: To study information processing in the brain, neuroscientists manipulate experimental stimuli while recording participant brain activity. They can then use encoding models to find out which brain "zone" (e.g. which region of interest, volume pixel or electrophysiology sensor) is predicted from the stimulus properties. Given the assumptions underlying this setup, when stimulus properties are predic…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: Accepted to CLeaR 2022

  16. arXiv:2110.03580  [pdf, ps, other]

    cs.LG stat.ML

    A Model Selection Approach for Corruption Robust Reinforcement Learning

    Authors: Chen-Yu Wei, Christoph Dann, Julian Zimmert

    Abstract: We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm achieves a regret bound of $\widetilde{\mathcal{O}}(\min\{\frac{1}{\Delta}, \sqrt{T}\}+C)$ where $T$ is the number of episodes, $C$ is the total amount of corruption, and…

    Submitted 7 October, 2021; originally announced October 2021.

  17. arXiv:2107.01264  [pdf, other]

    cs.LG

    Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

    Authors: Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

    Abstract: We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy.…

    Submitted 26 October, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

  18. arXiv:2106.11519  [pdf, other]

    cs.LG cs.AI eess.SY

    Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

    Abstract: There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich obser…

    Submitted 21 June, 2021; originally announced June 2021.

  19. arXiv:2106.03243  [pdf, ps, other]

    cs.LG

    Neural Active Learning with Performance Guarantees

    Authors: Pranjal Awasthi, Christoph Dann, Claudio Gentile, Ayush Sekhari, Zhilei Wang

    Abstract: We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: 30 pages

  20. arXiv:2012.13045  [pdf, ps, other]

    cs.LG cs.AI stat.ML stat.OT

    Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

    Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

    Abstract: We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base algorithm comes with a candidate regret bound that may or may not hold during all rounds. In each round, our approach plays a base algorithm to keep the candidate regr…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: 57 pages

  21. arXiv:2005.03789  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning with Feedback Graphs

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

    Abstract: We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations. Such additional observations are available in a range of tasks through extended sensors or prior knowledge about the environment (e.g., when certain actions yield similar outcome). We formalize this setting using a feedback graph…

    Submitted 7 May, 2020; originally announced May 2020.

  22. arXiv:1911.01546  [pdf, other]

    cs.LG cs.AI

    Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

    Authors: Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

    Abstract: While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal p…

    Submitted 2 April, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Journal ref: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

  23. arXiv:1811.03056  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Certificates: Towards Accountable Reinforcement Learning

    Authors: Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

    Abstract: The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These ce…

    Submitted 27 May, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

    Comments: article appearing at ICML 2019; full version including appendix

  24. arXiv:1803.00606  [pdf, other]

    cs.LG stat.ML

    On Oracle-Efficient PAC RL with Rich Observations

    Authors: Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and…

    Submitted 16 January, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: appeared at NeurIPS 18; full paper including appendix; updated style file

  25. arXiv:1706.03100  [pdf, other]

    cs.AI cs.LG stat.ML

    Decoupling Learning Rules from Representations

    Authors: Philip S. Thomas, Christoph Dann, Emma Brunskill

    Abstract: In the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be…

    Submitted 9 June, 2017; originally announced June 2017.

  26. arXiv:1703.07710  [pdf, other]

    cs.LG cs.AI stat.ML

    Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

    Authors: Christoph Dann, Tor Lattimore, Emma Brunskill

    Abstract: Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version…

    Submitted 2 January, 2018; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: appears in Neural Information Processing Systems 2017

  27. arXiv:1702.06238  [pdf, other]

    cs.AI cs.LG

    Sample Efficient Policy Search for Optimal Stopping Domains

    Authors: Karan Goel, Christoph Dann, Emma Brunskill

    Abstract: Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging prob…

    Submitted 24 May, 2017; v1 submitted 20 February, 2017; originally announced February 2017.

    Comments: To appear in IJCAI-2017

  28. arXiv:1611.06928  [pdf, other]

    cs.AI stat.ML

    Memory Lens: How Much Memory Does an Agent Use?

    Authors: Christoph Dann, Katja Hofmann, Sebastian Nowozin

    Abstract: We propose a new method to study the internal memory used by reinforcement learning policies. We estimate the amount of relevant past information by estimating mutual information between behavior histories and the current action of an agent. We perform this estimation in the passive setting, that is, we do not intervene but merely observe the natural behavior of the agent. Moreover, we provide a t…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  29. arXiv:1511.01870  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Thoughts on Massively Scalable Gaussian Processes

    Authors: Andrew Gordon Wilson, Christoph Dann, Hannes Nickisch

    Abstract: We introduce a framework and early results for massively scalable Gaussian processes (MSGP), significantly extending the KISS-GP approach of Wilson and Nickisch (2015). The MSGP framework enables the use of Gaussian processes (GPs) on billions of datapoints, without requiring distributed inference, or severe assumptions. In particular, MSGP reduces the standard $O(n^3)$ complexity of GP learning a…

    Submitted 5 November, 2015; originally announced November 2015.

    Comments: 25 pages, 9 figures

  30. arXiv:1510.08906  [pdf, other]

    stat.ML cs.AI cs.LG

    Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

    Authors: Christoph Dann, Emma Brunskill

    Abstract: Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such sc…

    Submitted 11 May, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

    Comments: 28 pages, appeared in Neural Information Processing Systems (NIPS) 2015, updated version with fixed typos and modified Lemma 1 and Lemma C.5

  31. arXiv:1510.07389  [pdf, other]

    cs.LG cs.AI stat.ML

    The Human Kernel

    Authors: Andrew Gordon Wilson, Christoph Dann, Christopher G. Lucas, Eric P. Xing

    Abstract: Bayesian nonparametric models, such as Gaussian processes, provide a compelling framework for automatic statistical modelling: these models have a high degree of flexibility, and automatically calibrated complexity. However, automating human expertise remains elusive; for example, Gaussian processes with standard kernels struggle on function extrapolation problems that are trivial for human learne…

    Submitted 3 December, 2015; v1 submitted 26 October, 2015; originally announced October 2015.

    Comments: 11 pages, 5 figures. To appear in Neural Information Processing Systems (NIPS) 2015. Version 2: Figure 2 (i)-(n) now displays the second set of progressive function learning experiments

  32. arXiv:1507.06173  [pdf, other]

    cs.CV

    Bayesian Time-of-Flight for Realtime Shape, Illumination and Albedo

    Authors: Amit Adam, Christoph Dann, Omer Yair, Shai Mazor, Sebastian Nowozin

    Abstract: We propose a computational model for shape, illumination and albedo inference in a pulsed time-of-flight (TOF) camera. In contrast to TOF cameras based on phase modulation, our camera enables general exposure profiles. This results in added flexibility and requires novel computational approaches. To address this challenge we propose a generative probabilistic model that accurately relates latent…

    Submitted 22 July, 2015; originally announced July 2015.