
Showing 1–3 of 3 results for author: Canonaco, G

Searching in archive cs.
  1. arXiv:2404.07826 [pdf, other]

    cs.LG cs.AI

    On the Sample Efficiency of Abstractions and Potential-Based Reward Shaping in Reinforcement Learning

    Authors: Giuseppe Canonaco, Leo Ardon, Alberto Pozanco, Daniel Borrajo

    Abstract: The use of Potential-Based Reward Shaping (PBRS) has shown great promise in the ongoing research effort to tackle sample inefficiency in Reinforcement Learning (RL). However, the choice of the potential function is critical for this technique to be effective. Additionally, RL techniques are usually constrained to use a finite horizon for computational limitations. This introduces a bias when using…

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2005.12864 [pdf, ps, other]

    cs.LG stat.ML

    Time-Variant Variational Transfer for Value Functions

    Authors: Giuseppe Canonaco, Andrea Soprani, Manuel Roveri, Marcello Restelli

    Abstract: In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. In the context of this work, we consider the problem of transferring value functions through a variational method when the distribution that generates the tasks is time-variant, pr…

    Submitted 18 June, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  3. arXiv:1806.05618 [pdf, other]

    cs.LG stat.ML

    Stochastic Variance-Reduced Policy Gradient

    Authors: Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

    Abstract: In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-con…

    Submitted 14 June, 2018; originally announced June 2018.

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018
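The first result concerns Potential-Based Reward Shaping over a finite horizon. As background only, the textbook form of PBRS (Ng et al., 1999) adds a shaping term F(s, s') = gamma * Phi(s') - Phi(s) to each reward. Over a trajectory of length H these terms telescope, so the shaped discounted return equals the original return plus gamma^H * Phi(s_H) - Phi(s_0); the term depending on the final state s_H does not vanish when H is finite. The sketch below checks that identity numerically. It is not the paper's method or analysis: the toy chain of integer states, the potential `phi`, and the helper names (`shaping_term`, `shaped_rewards`, `discounted_return`) are all hypothetical.

```python
"""Minimal sketch of textbook potential-based reward shaping.

Hypothetical setup (not from the paper above): integer states on a chain,
a hand-picked potential function, and one fixed finite trajectory.
"""


def shaping_term(phi, s, s_next, gamma):
    """PBRS term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_return(rewards, gamma):
    """Finite-horizon discounted return: sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shaped_rewards(states, rewards, phi, gamma):
    """Add F(s_t, s_{t+1}) to each environment reward r_t.

    `states` holds s_0 .. s_H, one more entry than `rewards` (r_0 .. r_{H-1}).
    """
    return [
        r + shaping_term(phi, s, s_next, gamma)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


if __name__ == "__main__":
    gamma = 0.9

    def phi(s):
        # Hypothetical potential: states further along the chain score higher.
        return float(s)

    states = [0, 1, 2, 3]      # s_0 .. s_H with H = 3
    rewards = [0.0, 0.0, 1.0]  # r_0 .. r_{H-1}

    original = discounted_return(rewards, gamma)
    shaped = discounted_return(shaped_rewards(states, rewards, phi, gamma), gamma)

    # Telescoping identity: shaped - original == gamma^H * Phi(s_H) - Phi(s_0)
    horizon = len(rewards)
    offset = gamma ** horizon * phi(states[-1]) - phi(states[0])
    print(original, shaped, offset)
```

Because the leftover term depends on the state reached at step H, the ranking of finite trajectories under the shaped return can differ from their ranking under the original return; as H grows with bounded Phi and gamma < 1, the gamma^H factor drives that term toward the constant -Phi(s_0).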