The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough

Zamboni, Riccardo; Cirino, Duilio; Restelli, Marcello; Mutti, Mirco

Computer Science > Machine Learning

arXiv:2406.12795 (cs)

[Submitted on 18 Jun 2024]

Authors: Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

Abstract: The problem of pure exploration in Markov decision processes has been cast as maximizing the entropy over the state distribution induced by the agent's policy, an objective that has been extensively studied. However, little attention has been dedicated to state entropy maximization under partial observability, despite the latter being ubiquitous in applications, e.g., finance and robotics, in which the agent only receives noisy observations of the true state governing the system's dynamics. How can we address state entropy maximization in those domains? In this paper, we study the simple approach of maximizing the entropy over observations in place of true latent states. First, we provide lower and upper bounds on the approximation of the true state entropy that depend only on some properties of the observation function. Then, we show how knowledge of the observation function can be exploited to compute a principled regularization of the observation entropy to improve performance. With this work, we provide both a flexible approach to bring advances in state entropy maximization to the POMDP setting and a theoretical characterization of its intrinsic limits.
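The abstract's central point, that observation entropy can stand in for state entropy only up to a gap governed by the observation function, can be illustrated numerically. The sketch below is not the authors' bounds or regularizer. It assumes a finite state and observation space, a known row-stochastic observation matrix, and a fixed state distribution, and it relies only on the standard information-theoretic identity H(S) = H(O) - H(O|S) + H(S|O). The function name `state_vs_observation_entropy` and the noise matrix used below are hypothetical choices for illustration.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats of a discrete distribution (zero entries contribute 0)."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))


def state_vs_observation_entropy(d_state, obs_matrix):
    """Compare H(S) with the proxy H(O) for a fixed state distribution.

    d_state    -- hypothetical state distribution d(s) induced by some policy
    obs_matrix -- row-stochastic matrix with obs_matrix[s, o] = P(o | s)
    """
    d = np.asarray(d_state, dtype=float)
    O = np.asarray(obs_matrix, dtype=float)
    joint = d[:, None] * O                     # p(s, o) = d(s) * P(o | s)
    h_s = entropy(d)
    h_o = entropy(joint.sum(axis=0))           # entropy of the observation marginal
    # H(O|S) = sum_s d(s) * H(P(.|s)): noise injected by the observation function
    h_o_given_s = float(sum(d[s] * entropy(O[s]) for s in range(len(d))))
    # H(S|O) = H(S, O) - H(O): ambiguity about the state left after observing
    h_s_given_o = entropy(joint) - h_o
    # From the identity and H(S|O) >= 0 (with H(O|S) <= max_s H(P(.|s))):
    # H(S) >= H(O) - max_s H(P(.|s)), which uses only H(O) and the observation function.
    lower_bound = h_o - max(entropy(row) for row in O)
    return {
        "H(S)": h_s,
        "H(O)": h_o,
        "H(O|S)": h_o_given_s,
        "H(S|O)": h_s_given_o,
        "lower_bound": lower_bound,
    }


if __name__ == "__main__":
    noisy = [[0.9, 0.1], [0.1, 0.9]]  # illustrative symmetric observation noise
    # A policy that always stays in state 0: H(S) = 0, yet H(O) > 0 because of noise.
    print(state_vs_observation_entropy([1.0, 0.0], noisy))
    # A policy that visits both states uniformly: H(S) = H(O) = log 2.
    print(state_vs_observation_entropy([0.5, 0.5], noisy))
```

In the first call, a policy that never leaves state 0 still receives positive observation entropy purely from observation noise. This is one simple way to see why maximizing observation entropy without accounting for the observation function can overstate how much the agent actually explores the latent states.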

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2406.12795 [cs.LG]
	(or arXiv:2406.12795v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.12795

Submission history

From: Riccardo Zamboni
[v1] Tue, 18 Jun 2024 17:00:13 UTC (1,632 KB)