
Showing 1–8 of 8 results for author: Parisi, S

Searching in archive cs.
  1. arXiv:2406.13909 [pdf, other]

    cs.LG

    Beyond Optimism: Exploration With Partially Observable Rewards

    Authors: Simone Parisi, Alireza Kazemipour, Michael Bowling

    Abstract: Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration and reward discovery, popular algorithms rely on optimism. But what if sometimes rewards are unobservable, e.g., situations of partial monitoring in bandits and…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2402.06819 [pdf, other]

    cs.LG

    Monitored Markov Decision Processes

    Authors: Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling

    Abstract: In reinforcement learning (RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that rewards are always observable is often not applicable in real-world problems. For example, the agent may need to ask a human to supervise its actions or activate a monitoring system to receive feedback. There…

    Submitted 13 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: AAMAS 2024, Main Track

  3. arXiv:2203.03580 [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

    Authors: Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta

    Abstract: Recent years have seen the emergence of pre-trained representations as a powerful abstraction for AI applications in computer vision, natural language, and speech. However, policy learning for control is still dominated by a tabula-rasa learning paradigm, with visuo-motor policies often trained from scratch using data from deployment environments. In this context, we revisit and study the role of…

    Submitted 8 August, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: First two authors contributed equally

    Journal ref: International Conference on Machine Learning (ICML), 2022, 162:17359-17371

  4. arXiv:2111.13119 [pdf, other]

    cs.LG

    Interesting Object, Curious Agent: Learning Task-Agnostic Exploration

    Authors: Simone Parisi, Victoria Dean, Deepak Pathak, Abhinav Gupta

    Abstract: Common approaches for task-agnostic exploration learn tabula rasa: the agent assumes isolated environments and no prior knowledge or experience. However, in the real world, agents learn in many environments and always come with prior experiences as they explore new ones. Exploration is a lifelong process. In this paper, we propose a paradigm change in the formulation and evaluation of task-agnost…

    Submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted at NeurIPS 2021

  5. arXiv:2001.00119 [pdf, other]

    cs.LG stat.ML

    Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

    Authors: Simone Parisi, Davide Tateo, Maximilian Hensel, Carlo D'Eramo, Jan Peters, Joni Pajarinen

    Abstract: Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where this occurs very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent also receives rewards that create suboptimal modes of the objective function, it will likely stop exploring prematurely. More…

    Submitted 3 March, 2022; v1 submitted 31 December, 2019; originally announced January 2020.

    Journal ref: Algorithms 2022, 15(3), 81

  6. TD-Regularized Actor-Critic Methods

    Authors: Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan

    Abstract: Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objec…

    Submitted 25 February, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

  7. arXiv:1611.03231 [pdf, ps, other]

    stat.ML cs.LG

    Policy Search with High-Dimensional Context Variables

    Authors: Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

    Abstract: Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such a…

    Submitted 10 November, 2016; originally announced November 2016.

  8. arXiv:1406.3497 [pdf, other]

    cs.AI cs.LG

    Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material

    Authors: Matteo Pirotta, Simone Parisi, Marcello Restelli

    Abstract: This document contains supplementary material for the paper "Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation", published at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15). The paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach t…

    Submitted 18 November, 2014; v1 submitted 13 June, 2014; originally announced June 2014.

    Comments: AAAI-15 Supplement. Updated upon acceptance at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15)