Showing 1–11 of 11 results for author: Pike-Burke, C

  1. arXiv:2310.01616  [pdf, other]

    cs.LG cs.AI

    Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

    Authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini

    Abstract: We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interpl…

    Submitted 28 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  2. arXiv:2307.00836  [pdf, other]

    stat.ML cs.LG

    Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts

    Authors: Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolo Cesa-Bianchi

    Abstract: We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a w…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: ICML 2023

  3. arXiv:2302.11381  [pdf, other]

    math.OC cs.LG math.ST

    Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes

    Authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini

    Abstract: Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, PMD algorithmically regularises the policy improvement step of PI. With exact policy evaluation, PI is known to converge linearly with a rate given by the discount fac…

    Submitted 21 November, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted at NeurIPS 2023

  4. arXiv:2302.00392  [pdf, other]

    stat.ML cs.AI cs.LG

    Delayed Feedback in Kernel Bandits

    Authors: Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke

    Abstract: Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existin…

    Submitted 1 February, 2023; originally announced February 2023.

  5. arXiv:2207.10786  [pdf, other]

    cs.LG stat.ML

    Delayed Feedback in Generalised Linear Bandits Revisited

    Authors: Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

    Abstract: The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised line…

    Submitted 11 April, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  6. arXiv:2111.13026  [pdf, ps, other]

    stat.ML cs.LG

    Bandit problems with fidelity rewards

    Authors: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle

    Abstract: The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward that provides the player with an additional payoff depending on how 'loyal' the player has been to that arm in the past. We propose two models for fidelity. In the loyalty-points model the amount of extra reward depends on the number of times the arm has previ…

    Submitted 25 November, 2021; originally announced November 2021.

  7. arXiv:2111.07615  [pdf, other]

    cs.LG

    Optimism and Delays in Episodic Reinforcement Learning

    Authors: Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

    Abstract: There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practi…

    Submitted 6 April, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

  8. arXiv:2010.07778  [pdf, other]

    cs.LG

    Local Differential Privacy for Regret Minimization in Reinforcement Learning

    Authors: Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta

    Abstract: Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains it is common that user data contains sensitive information that needs to be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user sid…

    Submitted 27 October, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

  9. arXiv:2007.01891  [pdf, ps, other]

    cs.LG stat.ML

    A Unifying View of Optimism in Episodic Reinforcement Learning

    Authors: Gergely Neu, Ciara Pike-Burke

    Abstract: The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs a…

    Submitted 3 July, 2020; originally announced July 2020.

  10. arXiv:1910.14354  [pdf, other]

    stat.ML cs.LG

    Recovering Bandits

    Authors: Ciara Pike-Burke, Steffen Grünewälder

    Abstract: We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played. While being a natural extension of the classical bandit problem that arises in many real-world settings, this variation is accompanied by significant difficulties. In particular, me…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 (spotlight)

  11. arXiv:1709.06853  [pdf, other]

    stat.ML cs.LG

    Bandits with Delayed, Aggregated Anonymous Feedback

    Authors: Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder

    Abstract: We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated, however it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The…

    Submitted 13 June, 2018; v1 submitted 20 September, 2017; originally announced September 2017.

    Comments: ICML 2018