
Showing 1–22 of 22 results for author: Ciosek, K

  1. arXiv:2404.02649

    cs.LG

    On the Importance of Uncertainty in Decision-Making with Large Language Models

    Authors: Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

    Abstract: We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 12 pages of main content, 25 pages with references and appendix

  2. Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

    Authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai

    Abstract: Personalization of playlists is a common feature in music streaming services, but conventional techniques, such as collaborative filtering, rely on explicit assumptions regarding content quality to learn how to make recommendations. Such assumptions often result in misalignment between offline model objectives and online user satisfaction metrics. In this paper, we present a reinforcement learning…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: 10 pages. KDD 23

  3. Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

    Authors: Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

    Abstract: Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become a…

    Submitted 20 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: Presented at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23)

  4. arXiv:2302.02788

    cs.LG

    A Strong Baseline for Batch Imitation Learning

    Authors: Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

    Abstract: Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: 28 pages (10 main, 18 appendix), 4 figures

  5. arXiv:2108.04763

    stat.ML cs.LG

    Imitation Learning by Reinforcement Learning

    Authors: Kamil Ciosek

    Abstract: Imitation learning algorithms learn a policy from demonstrations of expert behavior. We show that, for deterministic experts, imitation learning can be done by reduction to reinforcement learning with a stationary reward. Our theoretical analysis both certifies the recovery of expert reward and bounds the total variation distance between the expert and the imitation learner, showing a link to adve…

    Submitted 15 March, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: Published in ICLR 2022

  6. arXiv:2102.12466

    cs.LG

    Information Directed Reward Learning for Reinforcement Learning

    Authors: David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

    Abstract: For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithm…

    Submitted 31 January, 2022; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Presented at Conference on Neural Information Processing Systems (NeurIPS), 2021

  7. arXiv:2101.09178

    cs.MA cs.LG

    Estimating $α$-Rank by Maximizing Information Gain

    Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek

    Abstract: Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling. For example, meta-games that arise in multi-agent evaluation can only be accessed by running a succession of expensive experiments that may involve simultaneous deployment of several agents. In this paper, we focus on $α$-rank, a popular game-theoretic solution concept d…

    Submitted 22 January, 2021; originally announced January 2021.

  8. arXiv:2101.07012

    cs.LG stat.ML

    Regularized Policies are Reward Robust

    Authors: Hisham Husain, Kamil Ciosek, Ryota Tomioka

    Abstract: Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for using entropy is for exploration and disambiguating optimal policies; however, the theoretical effects are not entirely understood. In this work, we study the…

    Submitted 18 January, 2021; originally announced January 2021.

  9. arXiv:2101.05507

    cs.LG cs.AI cs.HC cs.MA

    Evaluating the Robustness of Collaborative Agents

    Authors: Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

    Abstract: In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge…

    Submitted 14 January, 2021; originally announced January 2021.

  10. arXiv:2101.03864

    cs.LG cs.MA

    Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

    Authors: Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

    Abstract: Agents that interact with other agents often do not know a priori what the other agents' strategies are, but have to maximise their own online return while interacting with and learning about others. The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t. some prior can in principle be computed using the Interactive Bayesian Reinforcement Learning framework. Unfor…

    Submitted 15 April, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Published as an extended abstract at AAMAS 2021

  11. arXiv:2007.08220

    cs.LG cs.AI stat.ML

    DRIFT: Deep Reinforcement Learning for Functional Software Testing

    Authors: Luke Harries, Rebekah Storan Clarke, Timothy Chapman, Swamy V. P. L. N. Nallamalli, Levent Ozgur, Shuktika Jain, Alex Leung, Steve Lim, Aaron Dietrich, José Miguel Hernández-Lobato, Tom Ellis, Cheng Zhang, Kamil Ciosek

    Abstract: Efficient software testing is essential for productive software development and reliable user experiences. As human testing is inefficient and expensive, automated software testing is needed. In this work, we propose a Reinforcement Learning (RL) framework for functional software testing named DRIFT. DRIFT operates on the symbolic representation of the user interface. It uses Q-learning through Ba…

    Submitted 16 July, 2020; originally announced July 2020.

  12. arXiv:2007.02040

    cs.LG cs.AI stat.ML

    Discount Factor as a Regularizer in Reinforcement Learning

    Authors: Ron Amit, Ron Meir, Kamil Ciosek

    Abstract: Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime. Yet the exact nature of this regularizer has not been investigated. In this work, we fill in this gap. For severa…

    Submitted 4 July, 2020; originally announced July 2020.

    Comments: Published in ICML 2020

    Journal ref: Published in Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020

  13. arXiv:1910.12911

    cs.LG cs.AI stat.ML

    Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck

    Authors: Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann

    Abstract: The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: Published at NeurIPS 2019

  14. arXiv:1910.12807

    stat.ML cs.LG

    Better Exploration with Optimistic Actor-Critic

    Authors: Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

    Abstract: Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in real-world domains is made difficult by their poor sample efficiency. We address this problem both theoretically and empirically. On the theoretical side, we ident…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 20 pages (including supplement)

    Journal ref: NeurIPS 2019

  15. arXiv:1909.11373

    cs.LG cs.AI stat.ML

    Multi-task Batch Reinforcement Learning with Metric Learning

    Authors: Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Keith Ross, Henrik Iskov Christensen, Hao Su

    Abstract: We tackle the Multi-task Batch Reinforcement Learning problem. Given multiple datasets collected from different tasks, we train a multi-task policy to perform well in unseen tasks sampled from the same distribution. The task identities of the unseen tasks are not provided. To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, act…

    Submitted 23 October, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: First two authors contributed equally

  16. arXiv:1802.06891

    cs.LG cs.AI

    Fourier Policy Gradients

    Authors: Matthew Fellows, Kamil Ciosek, Shimon Whiteson

    Abstract: We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and…

    Submitted 30 May, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

  17. arXiv:1801.03326

    stat.ML cs.AI

    Expected Policy Gradients for Reinforcement Learning

    Authors: Kamil Ciosek, Shimon Whiteson

    Abstract: We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian…

    Submitted 2 May, 2020; v1 submitted 10 January, 2018; originally announced January 2018.

    Comments: 36 pages, submitted for review to JMLR. This is an extended version of our paper in the AAAI-18 conference (arXiv:1706.05374)

    ACM Class: I.2.8; G.3

    Journal ref: Journal of Machine Learning Research, Vol. 21, (52):1-51, 2020

  18. arXiv:1706.05374

    stat.ML cs.LG

    Expected Policy Gradients

    Authors: Kamil Ciosek, Shimon Whiteson

    Abstract: We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampled trajectory. We establish a new general policy gradient theorem, of which the stochastic and deter…

    Submitted 13 April, 2018; v1 submitted 15 June, 2017; originally announced June 2017.

    Comments: Conference paper, AAAI-18, 12 pages including supplement

    MSC Class: 90C40 ACM Class: I.2.8; G.3

  19. arXiv:1605.07496

    cs.LG cs.AI stat.ML

    Alternating Optimisation and Quadrature for Robust Control

    Authors: Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

    Abstract: Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable i…

    Submitted 18 December, 2017; v1 submitted 24 May, 2016; originally announced May 2016.

    Comments: To appear in AAAI 2018. Video of policy learnt in simulation deployed on a real hexapod see https://youtu.be/ME90xtIPsKk

  20. arXiv:1501.03959

    cs.AI cs.LG stat.ML

    Value Iteration with Options and State Aggregation

    Authors: Kamil Ciosek, David Silver

    Abstract: This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and indeed it is only after one combines the two that the full benefit of each is realized. We introduce a hierarchical value iteration algorithm where we first coa…

    Submitted 16 January, 2015; originally announced January 2015.

  21. arXiv:1301.5220

    stat.ML cs.LG

    Properties of the Least Squares Temporal Difference learning algorithm

    Authors: Kamil Ciosek

    Abstract: This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view as well as the limit of the TD ite…

    Submitted 3 April, 2015; v1 submitted 22 January, 2013; originally announced January 2013.

  22. arXiv:1206.6473

    cs.AI cs.LG

    Compositional Planning Using Optimal Option Models

    Authors: David Silver, Kamil Ciosek

    Abstract: In this paper we introduce a framework for option model composition. Option models are temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. Prior work has focused on constructing option models from primitive actions, by intra-option model learning; or on using option models to construct a value function, by inter-option planning.…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)