-
Stochastic homogenization of HJ equations: a differential game approach
Authors:
Andrea Davini,
Raimundo Saona,
Bruno Ziliotto
Abstract:
We prove stochastic homogenization for a class of non-convex and non-coercive first-order Hamilton-Jacobi equations in a finite-range of dependence environment for Hamiltonians that can be expressed by a max-min formula. We make use of the representation of the solution as a value function of a differential game to implement a game-theoretic approach to the homogenization problem.
Submitted 26 June, 2024;
originally announced June 2024.
-
Ergodic Unobservable MDPs: Decidability of Approximation
Authors:
Krishnendu Chatterjee,
David Lurie,
Raimundo Saona,
Bruno Ziliotto
Abstract:
Unobservable Markov decision processes (UMDPs) serve as a prominent mathematical framework for modeling sequential decision-making problems. A key aspect in computational analysis is the consideration of decidability, which concerns the existence of algorithms. In general, the computation of the exact and approximated values is undecidable for UMDPs with the long-run average objective. Building on matrix product theory and ergodic properties, we introduce a novel subclass of UMDPs, termed ergodic UMDPs. Our main result demonstrates that approximating the value within this subclass is decidable. However, we show that the exact problem remains undecidable. Finally, we discuss the primary challenges of extending these results to partially observable Markov decision processes.
Submitted 21 May, 2024;
originally announced May 2024.
-
Bayesian Learning in Mean Field Games
Authors:
Eran Shmaya,
Bruno Ziliotto
Abstract:
We consider a mean-field game model where the cost functions depend on a fixed parameter, called \textit{state}, which is unknown to players. Players learn about the state from a stream of private signals they receive throughout the game. We derive a mean field system satisfied by the equilibrium payoff of the game and prove existence of a solution under standard regularity assumptions. Additionally, we establish the uniqueness of the solution when the cost function satisfies the monotonicity assumption of Lasry and Lions at each state.
Submitted 31 January, 2024;
originally announced January 2024.
-
Zero-sum Random Games on Directed Graphs
Authors:
Luc Attia,
Lyuben Lichev,
Dieter Mitsche,
Raimundo Saona,
Bruno Ziliotto
Abstract:
This paper considers a class of two-player zero-sum games on directed graphs whose vertices are equipped with random payoffs of bounded support known by both players.
Starting from a fixed vertex, players take turns to move a token along the edges of the graph.
On the one hand, for acyclic directed graphs of bounded degree and sub-exponential expansion, we show that the value of the game converges almost surely to a constant at an exponential rate dominated in terms of the expansion.
On the other hand, for the infinite $d$-ary tree that does not fall into the previous class of graphs, we show convergence at a double-exponential rate in terms of the expansion.
Submitted 29 January, 2024;
originally announced January 2024.
-
Prophet Inequalities Require Only a Constant Number of Samples
Authors:
Andrés Cristi,
Bruno Ziliotto
Abstract:
In a prophet inequality problem, $n$ independent random variables are presented to a gambler one by one. The gambler decides when to stop the sequence and obtains the most recent value as reward. We evaluate a stopping rule by the worst-case ratio between its expected reward and the expectation of the maximum variable. In the classic setting, the order is fixed, and the optimal ratio is known to be 1/2. Three variants of this problem have been extensively studied: the prophet-secretary model, where variables arrive in uniformly random order; the free-order model, where the gambler chooses the arrival order; and the i.i.d. model, where the distributions are all the same, rendering the arrival order irrelevant.
Most of the literature assumes that distributions are known to the gambler. Recent work has considered the question of what is achievable when the gambler has access only to a few samples per distribution. Surprisingly, in the fixed-order case, a single sample from each distribution is enough to approximate the optimal ratio, but this is not the case in any of the three variants.
We provide a unified proof that for all three variants of the problem, a constant number of samples (independent of n) for each distribution is good enough to approximate the optimal ratios.
Prior to our work, this was known to be the case only in the i.i.d. variant. We complement our result by showing that our algorithms can be implemented in polynomial time.
A key ingredient in our proof is an existential result based on a minimax argument, which states that there must exist an algorithm that attains the optimal ratio and does not rely on the knowledge of the upper tail of the distributions. A second key ingredient is a refined sample-based version of a decomposition of the instance into "small" and "large" variables, first introduced by Liu et al. [EC'21].
Submitted 15 November, 2023;
originally announced November 2023.
-
Blackwell's Approachability with Time-Dependent Outcome Functions and Dot Products. Application to the Big Match
Authors:
Joon Kwon,
Bruno Ziliotto
Abstract:
Blackwell's approachability is a very general sequential decision framework where a Decision Maker obtains vector-valued outcomes, and aims at the convergence of the average outcome to a given "target" set. Blackwell gave a sufficient condition for the Decision Maker to have a strategy guaranteeing such convergence against an adversarial environment, as well as what we now call Blackwell's algorithm, which then ensures convergence. Blackwell's approachability has since been applied to numerous problems, in online learning and game theory in particular. We extend this framework by allowing the outcome function and the dot product to be time-dependent. We establish a general guarantee for the natural extension to this framework of Blackwell's algorithm. In the case where the target set is an orthant, we present a family of time-dependent dot products which yields different convergence speeds for each coordinate of the average outcome. We apply this framework to the Big Match (one of the most important toy examples of stochastic games) where an $ε$-uniformly optimal strategy for Player I is given by Blackwell's algorithm in a well-chosen auxiliary approachability problem.
Submitted 8 March, 2023;
originally announced March 2023.
-
Percolation games
Authors:
Guillaume Garnier,
Bruno Ziliotto
Abstract:
This paper introduces a discrete-time stochastic game class on $\mathbb{Z}^d$, which plays the role of a toy model for the well-known problem of stochastic homogenization of Hamilton-Jacobi equations. Conditions are provided under which the $n$-stage game value converges as $n$ tends to infinity, and connections with homogenization theory are discussed.
Submitted 17 December, 2021; v1 submitted 23 October, 2021;
originally announced October 2021.
-
Mertens conjectures in absorbing games with incomplete information
Authors:
Bruno Ziliotto
Abstract:
In a zero-sum stochastic game with signals, at each stage, two adversary players take decisions and receive a stage payoff determined by these decisions and a variable called state. The state follows a Markov chain, that is controlled by both players. Actions and states are imperfectly observed by players, who receive a private signal at each stage. Mertens (ICM 1986) conjectured two properties regarding games with long duration: first, that limit value always exists, second, that when Player 1 is more informed than Player 2, she can guarantee uniformly the limit value. These conjectures were disproved recently by the author, but remain widely open in many subclasses. A well-known particular subclass is the one of absorbing games with incomplete information on both sides, in which the state can move at most once during the game, and players get a private signal about it at the outset of the game. This paper proves Mertens conjectures in this particular model, by introducing a new approximation technique of belief dynamics, that is likely to generalize to many other frameworks. In particular, this makes a significant step towards the understanding of the following broad question: in which games do Mertens conjectures hold?
Submitted 1 December, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Unknown I.I.D. Prophets: Better Bounds, Streaming Algorithms, and a New Impossibility
Authors:
José Correa,
Paul Dütting,
Felix Fischer,
Kevin Schewior,
Bruno Ziliotto
Abstract:
A prophet inequality states, for some $α\in[0,1]$, that the expected value achievable by a gambler who sequentially observes random variables $X_1,\dots,X_n$ and selects one of them is at least an $α$ fraction of the maximum value in the sequence. We obtain three distinct improvements for a setting that was first studied by Correa et al. (EC, 2019) and is particularly relevant to modern applications in algorithmic pricing. In this setting, the random variables are i.i.d. from an unknown distribution and the gambler has access to an additional $βn$ samples for some $β\geq 0$. We first give improved lower bounds on $α$ for a wide range of values of $β$; specifically, $α\geq(1+β)/e$ when $β\leq 1/(e-1)$, which is tight, and $α\geq 0.648$ when $β=1$, which improves on a bound of around $0.635$ due to Correa et al. (SODA, 2020). Adding to their practical appeal, specifically in the context of algorithmic pricing, we then show that the new bounds can be obtained even in a streaming model of computation and thus in situations where the use of relevant data is complicated by the sheer amount of data available. We finally establish that the upper bound of $1/e$ for the case without samples is robust to additional information about the distribution, and applies also to sequences of i.i.d. random variables whose distribution is itself drawn, according to a known distribution, from a finite set of known candidate distributions. This implies a tight prophet inequality for exchangeable sequences of random variables, answering a question of Hill and Kertz (Contemporary Mathematics, 1992), but leaves open the possibility of better guarantees when the number of candidate distributions is small, a setting we believe is of strong interest to applications.
Submitted 20 November, 2020; v1 submitted 12 July, 2020;
originally announced July 2020.
-
History-dependent evaluations in POMDPs
Authors:
Xavier Venel,
Bruno Ziliotto
Abstract:
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon>0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that interprets as "the decision-maker is patient enough". This unifies and generalizes several results of the literature, and applies notably to POMDPs with limsup payoffs.
Submitted 19 April, 2020;
originally announced April 2020.
-
An example of failure of stochastic homogenization for viscous Hamilton-Jacobi equations without convexity
Authors:
William M. Feldman,
Jean-Baptiste Fermanian,
Bruno Ziliotto
Abstract:
We give an example of the failure of homogenization for a viscous Hamilton-Jacobi equation with non-convex Hamiltonian.
Submitted 17 May, 2019;
originally announced May 2019.
-
Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Authors:
Krishnendu Chatterjee,
Raimundo Saona,
Bruno Ziliotto
Abstract:
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
Submitted 28 September, 2022; v1 submitted 30 April, 2019;
originally announced April 2019.
-
Constant payoff in zero-sum stochastic games
Authors:
Olivier Catoni,
Miquel Oliu-Barton,
Bruno Ziliotto
Abstract:
In a zero-sum stochastic game, at each stage, two adversary players take decisions and receive a stage payoff determined by them and by a controlled random variable representing the state of nature. The total payoff is the normalized discounted sum of the stage payoffs. In this paper we solve the "constant payoff" conjecture formulated by Sorin, Vigeral and Venel (2010): if both players use optimal strategies, then for any alpha>0, the expected discounted payoff between stage 1 and stage alpha/lambda tends to the limit discounted value of the game, as the discount rate lambda goes to 0.
Submitted 5 May, 2022; v1 submitted 11 November, 2018;
originally announced November 2018.
-
Prophet Secretary Through Blind Strategies
Authors:
Jose Correa,
Raimundo Saona,
Bruno Ziliotto
Abstract:
In the classic prophet inequality, samples from independent random variables arrive online. A gambler that knows the distributions must decide at each point in time whether to stop and pick the current sample or to continue and lose that sample forever. The goal of the gambler is to maximize the expected value of what she picks and the performance measure is the worst case ratio between the expected value the gambler gets and what a prophet, that sees all the realizations in advance, gets. In the late seventies, Krengel and Sucheston, and Gairing (1977) established that this worst case ratio is a universal constant equal to 1/2. In the last decade prophet inequalities have resurged as an important problem due to their connections to posted price mechanisms, frequently used in online sales. A very interesting variant is the Prophet Secretary problem, in which the only difference is that the samples arrive in a uniformly random order. For this variant several algorithms achieve a constant of 1-1/e and very recently this barrier was slightly improved. This paper analyzes strategies that set a nonincreasing sequence of thresholds to be applied at different times. The gambler stops the first time a sample surpasses the corresponding threshold. Specifically we consider a class of strategies called blind quantile strategies. They consist in fixing a function which is used to define a sequence of thresholds once the instance is revealed. Our main result shows that they can achieve a constant of 0.665, improving upon the best known result of Azar et al. (2018), and on Beyhaghi et al. (2018) (order selection). Our proof analyzes precisely the underlying stopping time distribution, relying on Schur-convexity theory. We further prove that blind strategies cannot achieve better than 0.675. Finally we prove that no algorithm for the gambler can achieve better than 0.732.
Submitted 12 March, 2019; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Convergence of the solutions of the discounted Hamilton-Jacobi equation: a counterexample
Authors:
Bruno Ziliotto
Abstract:
This paper provides a counterexample about the asymptotic behavior of the solutions of a discounted Hamilton-Jacobi equation, as the discount factor vanishes. The Hamiltonian of the equation is a 1-dimensional continuous and coercive Hamiltonian.
Submitted 19 January, 2018; v1 submitted 18 January, 2018;
originally announced January 2018.
-
Tauberian theorems for general iterations of operators: applications to zero-sum stochastic games
Authors:
Bruno Ziliotto
Abstract:
This paper proves several Tauberian theorems for general iterations of operators, and provides two applications to zero-sum stochastic games where the total payoff is a weighted sum of the stage payoffs. The first application is to provide conditions under which the existence of the asymptotic value implies the convergence of the values of the weighted game, as players get more and more patient. The second application concerns stochastic games with finite state space and action sets. This paper builds a simple class of asymptotically optimal strategies in the weighted game, that at each stage play optimally in a discounted game with a discount factor corresponding to the relative weight of the current stage.
Submitted 7 September, 2016;
originally announced September 2016.
-
Stochastic homogenization of nonconvex Hamilton-Jacobi equations: a counterexample
Authors:
Bruno Ziliotto
Abstract:
We provide an example of a Hamilton-Jacobi equation in which stochastic homogenization does not occur. The Hamiltonian involved in this example satisfies the standard assumptions of the literature, except that it is not convex.
Submitted 7 September, 2016; v1 submitted 20 December, 2015;
originally announced December 2015.
-
Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
Authors:
Xavier Venel,
Bruno Ziliotto
Abstract:
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, this shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker has more than lim v(n)-epsilon.
Submitted 8 September, 2015; v1 submitted 27 May, 2015;
originally announced May 2015.
-
A Tauberian theorem for nonexpansive operators and applications to zero-sum stochastic games
Authors:
Bruno Ziliotto
Abstract:
We prove a Tauberian theorem for nonexpansive operators, and apply it to the model of zero-sum stochastic game. Under mild assumptions, we prove that the value of the lambda-discounted game v_{lambda} converges uniformly when lambda goes to 0 if and only if the value of the n-stage game v_n converges uniformly when n goes to infinity. This generalizes the Tauberian theorem of Lehrer and Sorin (1992) to the two-player zero-sum case. We also provide the first example of a stochastic game with public signals on the state and perfect observation of actions, with finite state space, signal sets and action sets, in which for some initial state k_1 known by both players, (v_{lambda}(k_1)) and (v_n(k_1)) converge to distinct limits.
Submitted 23 February, 2015; v1 submitted 26 January, 2015;
originally announced January 2015.
-
General limit value in zero-sum stochastic games
Authors:
Bruno Ziliotto
Abstract:
Bewley and Kohlberg (1976) and Mertens and Neyman (1981) have proved, respectively, the existence of the asymptotic value and the uniform value in zero-sum stochastic games with finite state space and finite action sets. In their work, the total payoff in a stochastic game is defined either as a Cesaro mean or an Abel mean of the stage payoffs. This paper presents two findings: first, we generalize the result of Bewley and Kohlberg to a more general class of payoff evaluations and we prove with a counterexample that this result is tight. We also investigate the particular case of absorbing games. Second, for the uniform approach of Mertens and Neyman, we provide another counterexample to demonstrate that there is no natural way to generalize the result of Mertens and Neyman to a wider class of payoff evaluations.
Submitted 11 November, 2015; v1 submitted 20 October, 2014;
originally announced October 2014.
-
Hidden Stochastic Games and Limit Equilibrium Payoffs
Authors:
Jérôme Renault,
Bruno Ziliotto
Abstract:
We consider 2-player stochastic games with perfectly observed actions, and study the limit, as the discount factor goes to one, of the equilibrium payoffs set. In the usual setup where current states are observed by the players, we show that the set of stationary equilibrium payoffs always converges, and provide a simple example where the set of equilibrium payoffs has no limit. We then introduce the more general model of hidden stochastic game, where the players publicly receive imperfect signals over current states. In this setup we present an example where not only the limit set of equilibrium payoffs does not exist, but there is no converging selection of equilibrium payoffs. This second example is robust in many aspects, in particular to perturbations of the payoffs and to the introduction of correlation or communication devices.
Submitted 10 December, 2014; v1 submitted 11 July, 2014;
originally announced July 2014.
-
Zero-sum repeated games: Counterexamples to the existence of the asymptotic value and the conjecture $\operatorname{maxmin}=\operatorname{lim}v_n$
Authors:
Bruno Ziliotto
Abstract:
Mertens [In Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986) (1987) 1528-1577 Amer. Math. Soc.] proposed two general conjectures about repeated games: the first one is that, in any two-person zero-sum repeated game, the asymptotic value exists, and the second one is that, when Player 1 is more informed than Player 2, in the long run Player 1 is able to guarantee the asymptotic value. We disprove these two long-standing conjectures by providing an example of a zero-sum repeated game with public signals and perfect observation of the actions, where the value of the $λ$-discounted game does not converge when $λ$ goes to 0. The aforementioned example involves seven states, two actions and two signals for each player. Remarkably, players observe the payoffs, and play in turn.
Submitted 15 March, 2016; v1 submitted 21 May, 2013;
originally announced May 2013.