Search | arXiv e-print repository

Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

Authors: Julien Grand-Clement, Marek Petrik, Nicolas Vieille

Abstract: Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that approximate Blackwell optimal policies always exist, although Blackwell optimal policies may not exist. We also provide a sufficient condition for their existence, which encompasses virtually any examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.

Submitted 7 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:1204.0323 [pdf, ps, other]

Dynamic Sender-Receiver Games

Authors: Jerome Renault, Eilon Solan, Nicolas Vieille

Abstract: We consider a dynamic version of sender-receiver games, where the sequence of states follows an irreducible Markov chain observed by the sender. Under mild assumptions, we provide a simple characterization of the limit set of equilibrium payoffs, as players become very patient. Under these assumptions, the limit set depends on the Markov chain only through its invariant measure. The (limit) equilibrium payoffs are the feasible payoffs that satisfy an individual rationality condition for the receiver, and an incentive compatibility condition for the sender.

Submitted 2 April, 2012; originally announced April 2012.

MSC Class: 60J10; 91A05; 91A10; 91A20

arXiv:1007.4427 [pdf, other]

Strategic Information Exchange

Authors: Dinah Rosenberg, Eilon Solan, Nicolas Vieille

Abstract: We study a class of two-player repeated games with incomplete information and informational externalities. In these games, two states are chosen at the outset, and players get private information on the pair, before engaging in repeated play. The payoff of each player only depends on his 'own' state and on his own action. We study to what extent, and how, information can be exchanged in equilibrium. We prove that provided the private information of each player is valuable for the other player, the set of sequential equilibrium payoffs converges to the set of feasible and individually rational payoffs as players become patient.

Submitted 26 July, 2010; originally announced July 2010.

Showing 1–3 of 3 results for author: Vieille, N