-
Stochastic homogenization of HJ equations: a differential game approach
Authors:
Andrea Davini,
Raimundo Saona,
Bruno Ziliotto
Abstract:
We prove stochastic homogenization for a class of non-convex and non-coercive first-order Hamilton-Jacobi equations in a finite-range of dependence environment for Hamiltonians that can be expressed by a max-min formula. We make use of the representation of the solution as a value function of a differential game to implement a game-theoretic approach to the homogenization problem.
Submitted 26 June, 2024;
originally announced June 2024.
-
Ergodic Unobservable MDPs: Decidability of Approximation
Authors:
Krishnendu Chatterjee,
David Lurie,
Raimundo Saona,
Bruno Ziliotto
Abstract:
Unobservable Markov decision processes (UMDPs) serve as a prominent mathematical framework for modeling sequential decision-making problems. A key aspect in computational analysis is the consideration of decidability, which concerns the existence of algorithms. In general, the computation of the exact and approximated values is undecidable for UMDPs with the long-run average objective. Building on matrix product theory and ergodic properties, we introduce a novel subclass of UMDPs, termed ergodic UMDPs. Our main result demonstrates that approximating the value within this subclass is decidable. However, we show that the exact problem remains undecidable. Finally, we discuss the primary challenges of extending these results to partially observable Markov decision processes.
Submitted 21 May, 2024;
originally announced May 2024.
-
Zero-sum Random Games on Directed Graphs
Authors:
Luc Attia,
Lyuben Lichev,
Dieter Mitsche,
Raimundo Saona,
Bruno Ziliotto
Abstract:
This paper considers a class of two-player zero-sum games on directed graphs whose vertices are equipped with random payoffs of bounded support known by both players.
Starting from a fixed vertex, players take turns to move a token along the edges of the graph.
On the one hand, for acyclic directed graphs of bounded degree and sub-exponential expansion, we show that the value of the game converges almost surely to a constant at an exponential rate dominated in terms of the expansion.
On the other hand, for the infinite $d$-ary tree that does not fall into the previous class of graphs, we show convergence at a double-exponential rate in terms of the expansion.
Submitted 29 January, 2024;
originally announced January 2024.
-
Prophet Inequalities: Separating Random Order from Order Selection
Authors:
Giordano Giambartolomei,
Frederik Mallmann-Trenn,
Raimundo Saona
Abstract:
Prophet inequalities are a central object of study in optimal stopping theory. A gambler is sent values in an online fashion, sampled from an instance of independent distributions, in an adversarial, random or selected order, depending on the model. When observing each value, the gambler either accepts it as a reward or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is maximising the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations.
The model, in which the gambler selects the arrival order first, and then observes the values, is known as Order Selection. In this model a ratio of $0.7251$ is attainable for any instance. Recently, this has been improved up to $0.7258$ by Bubna and Chiplunkar (2023). If the gambler chooses the arrival order (uniformly) at random, we obtain the Random Order model. The worst case ratio over all possible instances has been extensively studied for at least $40$ years. Through simulations, Bubna and Chiplunkar (2023) also showed that this ratio is at most $0.7254$ for the Random Order model, thus establishing for the first time that carefully choosing the order, instead of simply taking it at random, benefits the gambler. We give an alternative, non-simulation-assisted proof of this fact, by showing mathematically that in the Random Order model, no algorithm can achieve a ratio larger than $0.7235$. This sets a new state-of-the-art hardness for this model, and establishes more formally that there is a real benefit in choosing the order.
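As an illustration of the gambler-to-prophet ratio in the Random Order model, the following is a minimal Monte Carlo sketch. It is not the method of the paper: the single-threshold rule, the exponential instance, and the names `sample_instance`, `threshold_gambler`, and `estimate_ratio` are all assumptions made for this example.

```python
import random
import statistics


def sample_instance(rng, means):
    """Draw one value per distribution (assumed exponential with the given means)."""
    return [rng.expovariate(1.0 / m) for m in means]


def threshold_gambler(values, threshold):
    """Accept the first value at or above the threshold; the reward is 0 if none qualifies."""
    for v in values:
        if v >= threshold:
            return v
    return 0.0


def estimate_ratio(means, trials=20000, seed=0):
    """Estimate E[gambler reward] / E[offline maximum] under a uniformly random arrival order."""
    rng = random.Random(seed)
    # Assumed calibration: use the empirical median of the maximum as the threshold.
    calibration = [max(sample_instance(rng, means)) for _ in range(trials)]
    threshold = statistics.median(calibration)

    gambler_total = 0.0
    prophet_total = 0.0
    for _ in range(trials):
        values = sample_instance(rng, means)
        prophet_total += max(values)   # the prophet takes the offline maximum
        rng.shuffle(values)            # Random Order: arrival order is uniform at random
        gambler_total += threshold_gambler(values, threshold)
    return gambler_total / prophet_total


ratio = estimate_ratio([1.0, 2.0, 0.5, 3.0])
print(f"estimated gambler-to-prophet ratio: {ratio:.3f}")
```

The estimate describes one hypothetical instance and one simple rule only; the bounds quoted in the abstract concern the worst case over all instances and all algorithms.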
Submitted 28 June, 2024; v1 submitted 8 April, 2023;
originally announced April 2023.
-
Repeated Prophet Inequality with Near-optimal Bounds
Authors:
Krishnendu Chatterjee,
Mona Mohammadi,
Raimundo Saona
Abstract:
In modern sample-driven Prophet Inequality, an adversary chooses a sequence of $n$ items with values $v_1, v_2, \ldots, v_n$ to be presented to a decision maker (DM). The process follows in two phases. In the first phase (sampling phase), some items, possibly selected at random, are revealed to the DM, but she can never accept them. In the second phase, the DM is presented with the other items in a random order and online fashion. For each item, she must make an irrevocable decision to either accept the item and stop the process or reject the item forever and proceed to the next item. The goal of the DM is to maximize the expected value as compared to a Prophet (or offline algorithm) that has access to all information. In this setting, the sampling phase has no cost and is not part of the optimization process. However, in many scenarios, the samples are obtained as part of the decision-making process.
We model this aspect as a two-phase Prophet Inequality where an adversary chooses a sequence of $2n$ items with values $v_1, v_2, \ldots, v_{2n}$ and the items are randomly ordered. Then, there are two phases of the Prophet Inequality problem with the first $n$ items and the rest of the items, respectively. We show that some basic algorithms achieve a ratio of at most $0.450$. We present an algorithm that achieves a ratio of at least $0.495$. Finally, we show that for every algorithm the ratio it can achieve is at most $0.502$. Hence our algorithm is near-optimal.
Submitted 28 September, 2022;
originally announced September 2022.
-
Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Authors:
Krishnendu Chatterjee,
Raimundo Saona,
Bruno Ziliotto
Abstract:
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems with probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with long-run average objective, the decision maker has approximately optimal strategies with finite memory. This implies notably that approximating the long-run value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
Submitted 28 September, 2022; v1 submitted 30 April, 2019;
originally announced April 2019.