Showing 1–1 of 1 results for author: Pratticò, M

Search v0.5.6 released 2020-02-24

arXiv:2406.19861

cs.LG math.OC stat.ML

Operator World Models for Reinforcement Learning

Authors: Pietro Novelli, Marco Pratticò, Massimiliano Pontil, Carlo Ciliberto

Abstract: Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. We then leverage the operatorial formulation of RL to express the action-value function in terms of this quantity in closed form via matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
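
Illustrative sketch (not from the paper): the abstract describes computing the action-value function in closed form via matrix operations and plugging it into PMD. The snippet below shows that idea only in the simplest tabular setting, assuming the transition tensor `P` is known exactly rather than estimated with conditional mean embeddings as in POWR. The function names (`q_closed_form`, `pmd_step`, `run_pmd`), the toy MDP, and the KL-based (softmax) form of the mirror descent update are all assumptions made for illustration.

```python
import numpy as np

def q_closed_form(P, r, pi, gamma):
    """Solve Q = r + gamma * T_pi Q exactly for a tabular MDP.

    P: (S, A, S) transition probabilities, r: (S, A) rewards,
    pi: (S, A) policy with rows summing to 1.
    """
    S, A = r.shape
    # T_pi[(s, a), (s', a')] = P[s, a, s'] * pi[s', a']
    T = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    q = np.linalg.solve(np.eye(S * A) - gamma * T, r.reshape(S * A))
    return q.reshape(S, A)

def pmd_step(pi, Q, eta):
    """KL mirror descent update: pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(pi) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new = np.exp(logits)
    return new / new.sum(axis=1, keepdims=True)

def run_pmd(P, r, gamma=0.9, eta=1.0, iters=200):
    """Alternate exact policy evaluation and a PMD step, starting from uniform."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        pi = pmd_step(pi, q_closed_form(P, r, pi, gamma), eta)
    return pi, q_closed_form(P, r, pi, gamma)

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 0] = 1.0  # state 1, action 0: move to state 0
P[1, 1, 1] = 1.0  # state 1, action 1: stay in state 1
r = np.array([[0.0, 0.0], [0.0, 1.0]])  # reward only for staying in state 1

pi, Q = run_pmd(P, r)
print(pi.argmax(axis=1))  # greedy action per state
```

In this toy problem the policy should concentrate on action 1 in both states. In the RL setting the abstract targets, `P` is not available, which is why the paper replaces it with a learned world model.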

Submitted 28 June, 2024; originally announced June 2024.
