Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Authors: Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer

Abstract: We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $Σ_2^P$, $\exists$R, and $\exists \forall$R.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Long version of the paper that got accepted to the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024). 35 pages, 10 figures, 1 table

MSC Class: 91A05; 91A06; 91A10; 91A11; 91A18; 91A35; 91A68; 68T37; 68Q17; 68Q25 ACM Class: I.2; J.4; F.2

arXiv:2402.08128 [pdf, other]

Recursive Joint Simulation in Games

Authors: Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Abstract: Game-theoretic dynamics between AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to accurately simulate an AI agent, for example because its source code is known. Our aim is to explore ways of leveraging this possibility to achieve more cooperative outcomes in strategic settings. In this paper, we study an interaction between AI agents where the agents run a recursive joint simulation. That is, the agents first jointly observe a simulation of the situation they face. This simulation in turn recursively includes additional simulations (with a small chance of failure, to avoid infinite recursion), and the results of all these nested simulations are observed before an action is chosen. We show that the resulting interaction is strategically equivalent to an infinitely repeated version of the original game, allowing a direct transfer of existing results such as the various folk theorems.

Submitted 1 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06626 [pdf, other]

Computing Optimal Commitments to Strategies and Outcome-Conditional Utility Transfers

Authors: Nathaniel Sauerberg, Caspar Oesterheld

Abstract: Prior work has studied the computational complexity of computing optimal strategies to commit to in Stackelberg or leadership games, where a leader commits to a strategy which is observed by one or more followers. We extend this setting to one where the leader can additionally commit to outcome-conditional utility transfers. We characterize the computational complexity of finding optimal strategies in normal-form and Bayesian games, giving a mix of efficient algorithms and NP-hardness results. Finally, we allow the leader to also commit to a signaling scheme which induces a correlated equilibrium. In this setting, optimal commitments can be found in polynomial time for arbitrarily many players.

Submitted 10 March, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: AAMAS 2024

arXiv:2307.05068 [pdf, ps, other]

doi 10.4204/EPTCS.379.33

A Theory of Bounded Inductive Rationality

Authors: Caspar Oesterheld, Abram Demski, Vincent Conitzer

Abstract: The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.

Submitted 11 July, 2023; originally announced July 2023.

Comments: In Proceedings TARK 2023, arXiv:2307.04005

ACM Class: I.2

Journal ref: EPTCS 379, 2023, pp. 421-440

arXiv:2305.17805 [pdf, other]

The Computational Complexity of Single-Player Imperfect-Recall Games

Authors: Emanuel Tewolde, Caspar Oesterheld, Vincent Conitzer, Paul W. Goldberg

Abstract: We study single-player extensive-form games with imperfect recall, such as the Sleeping Beauty problem or the Absentminded Driver game. For such games, two natural equilibrium concepts have been proposed as alternative solution concepts to ex-ante optimality. One equilibrium concept uses generalized double halving (GDH) as a belief system and evidential decision theory (EDT), and another one uses generalized thirding (GT) as a belief system and causal decision theory (CDT). Our findings relate those three solution concepts of a game to solution concepts of a polynomial maximization problem: global optima, optimal points with respect to subsets of variables and Karush-Kuhn-Tucker (KKT) points. Based on these correspondences, we are able to settle various complexity-theoretic questions on the computation of such strategies. For ex-ante optimality and (EDT,GDH)-equilibria, we obtain NP-hardness and inapproximability, and for (CDT,GT)-equilibria we obtain CLS-completeness results.

Submitted 28 May, 2023; originally announced May 2023.

Comments: Long version of the paper that got accepted to the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). 10 pages and 2 figures in the main body. 17 pages and 4 figures in the appendix

MSC Class: 91A18; 68T37; 68Q17; 91A35 ACM Class: I.2; J.4; F.2

arXiv:2305.17601 [pdf, other]

Incentivizing honest performative predictions with proper scoring rules

Authors: Caspar Oesterheld, Johannes Treutlein, Emery Cooper, Rubi Hudson

Abstract: Proper scoring rules incentivize experts to accurately report beliefs, assuming predictions cannot influence outcomes. We relax this assumption and investigate incentives when predictions are performative, i.e., when they can influence the outcome of the prediction, such as when making public predictions about the stock market. We say a prediction is a fixed point if it accurately reflects the expert's beliefs after that prediction has been made. We show that in this setting, reports maximizing expected score generally do not reflect an expert's beliefs, and we give bounds on the inaccuracy of such reports. We show that, for binary predictions, if the influence of the expert's prediction on outcomes is bounded, it is possible to define scoring rules under which optimal reports are arbitrarily close to fixed points. However, this is impossible for predictions over more than two outcomes. We also perform numerical simulations in a toy setting, showing that our bounds are tight in some situations and that prediction error is often substantial (greater than 5-10%). Lastly, we discuss alternative notions of optimality, including performative stability, and show that they incentivize reporting fixed points.

Submitted 30 May, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)
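
The gap between score-maximizing reports and fixed points described in this abstract can be illustrated with a small numerical sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the quadratic (Brier) score as the proper scoring rule, the linear influence function `outcome_probability` with parameters `a = 0.2` and `b = 0.4`, and the grid search.

```python
# Illustrative toy model (assumed, not from the paper): a binary event whose
# probability depends linearly on the publicly reported prediction p.

def outcome_probability(p, a=0.2, b=0.4):
    """Hypothetical influence function: chance of the event given report p."""
    return a + b * p

def expected_brier_score(p):
    """Expected quadratic score -(p - y)^2 when the event occurs with
    probability outcome_probability(p)."""
    q = outcome_probability(p)
    return -(q * (1 - p) ** 2 + (1 - q) * p ** 2)

# Search a grid of reports in [0, 1].
grid = [i / 1000 for i in range(1001)]

# Report that maximizes the expert's expected score.
best_report = max(grid, key=expected_brier_score)

# Report closest to a fixed point, i.e. one with outcome_probability(p) == p.
fixed_point = min(grid, key=lambda p: abs(outcome_probability(p) - p))

print("score-maximizing report:", best_report)
print("approximate fixed point:", fixed_point)
print("event probability after the best report:", outcome_probability(best_report))
```

In this toy model the score-maximizing report differs from the fixed point near 1/3, so the report does not match the probability that results from making it, even though the quadratic score is proper when predictions have no influence.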

arXiv:2305.11261 [pdf, other]

Game Theory with Simulation of Other Players

Authors: Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Abstract: Game-theoretic interactions with AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to simulate an AI agent (for example because its source code is known), which allows others to accurately predict the agent's actions. This could lower the bar for trust and cooperation. In this paper, we formalize games in which one player can simulate another at a cost. We first derive some basic properties of such games and then prove a number of results for them, including: (1) introducing simulation into generic-payoff normal-form games makes them easier to solve; (2) if the only obstacle to cooperation is a lack of trust in the possibly-simulated agent, simulation enables equilibria that improve the outcome for both agents; however, (3) there are settings where introducing simulation results in strictly worse outcomes for both players.

Submitted 19 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: The latest version fixes some typos in the proof of Theorem 5

arXiv:2211.14468 [pdf, other]

Similarity-based cooperative equilibrium

Authors: Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, Jakob Foerster

Abstract: As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutually transparent to each other, i.e., to allow them to access one another's source code (Rubinstein 1998, Tennenholtz 2004) -- or weights in the case of ML agents. However, full transparency is often unrealistic, whereas partial transparency is commonplace. Moreover, it is challenging for agents to learn their way to cooperation in the full transparency setting. In this paper, we introduce a more realistic setting in which agents only observe a single number indicating how similar they are to each other. We prove that this allows for the same set of cooperative outcomes as the full transparency setting. We also demonstrate experimentally that cooperation can be learned using simple ML methods.

Submitted 12 November, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: Published at NeurIPS 2023. 32 pages, 9 figures

MSC Class: 91A10 (Primary) 91A05 91A26 91A35 (Secondary) ACM Class: I.2.11

arXiv:2211.05057 [pdf, ps, other]

A Note on the Compatibility of Different Robust Program Equilibria of the Prisoner's Dilemma

Authors: Caspar Oesterheld

Abstract: We study a program game version of the Prisoner's Dilemma, i.e., a two-player game in which each player submits a computer program, the programs are given read access to each other's source code and then choose whether to cooperate or defect. Prior work has introduced various programs that form cooperative equilibria against themselves in this game. For example, the $ε$-grounded Fair Bot cooperates with probability $ε$ and with the remaining probability runs its opponent's program and copies its action. If both players submit this program, then this is a Nash equilibrium in which both players cooperate. Others have proposed cooperative equilibria based on proof-based Fair Bots, which cooperate if they can prove that the opponent cooperates (and defect otherwise). We here show that these different programs are compatible with each other. For example, if one player submits $ε$-grounded Fair Bot and the other submits a proof-based Fair Bot, then this is also a cooperative equilibrium of the program game version of the Prisoner's Dilemma.

Submitted 9 November, 2022; originally announced November 2022.

Comments: 8 pages, 1 table

MSC Class: 91A44
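
The $ε$-grounded Fair Bot described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's construction: "read access to source code" is modeled by passing the opponent as a Python callable, and the names `make_grounded_fair_bot`, `defect_bot`, and the value `epsilon = 0.1` are hypothetical.

```python
import random

COOPERATE, DEFECT = "C", "D"

def make_grounded_fair_bot(epsilon, rng):
    """Build a program that cooperates with probability epsilon and
    otherwise runs the opponent against itself and copies the result."""
    def fair_bot(opponent):
        # Grounding step: unconditional cooperation with probability epsilon,
        # which makes the mutual simulation terminate with probability 1.
        if rng.random() < epsilon:
            return COOPERATE
        # Otherwise simulate the opponent (given this program as its input)
        # and copy whatever action it chooses.
        return opponent(fair_bot)
    return fair_bot

def defect_bot(opponent):
    """Hypothetical opponent that always defects, ignoring its input."""
    return DEFECT

rng = random.Random(0)  # assumed seed, only for reproducibility
bot = make_grounded_fair_bot(0.1, rng)

# Against itself, every simulation chain ends in the grounding step.
print("Fair Bot vs Fair Bot:", bot(bot))

# Against an unconditional defector it cooperates only in the grounding step.
trials = 2000
coop_rate = sum(bot(defect_bot) == COOPERATE for _ in range(trials)) / trials
print("cooperation rate vs defector:", coop_rate)
```

Passing callables instead of source text keeps the sketch short. With a very small `epsilon` the chain of nested simulations can become deep enough to hit Python's recursion limit, so an iterative version would be preferable there.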

arXiv:2207.03470 [pdf, other]

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Authors: Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

Abstract: Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2106.06613 [pdf, other]

A New Formalism, Method and Open Issues for Zero-Shot Coordination

Authors: Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

Abstract: In many coordination problems, independently reasoning humans are able to discover mutually compatible policies. In contrast, independently trained self-play policies are often mutually incompatible. Zero-shot coordination (ZSC) has recently been proposed as a new frontier in multi-agent reinforcement learning to address this fundamental issue. Prior work approaches the ZSC problem by assuming players can agree on a shared learning algorithm but not on labels for actions and observations, and proposes other-play as an optimal solution. However, until now, this "label-free" problem has only been informally defined. We formalize this setting as the label-free coordination (LFC) problem by defining the label-free coordination game. We show that other-play is not an optimal solution to the LFC problem as it fails to consistently break ties between incompatible maximizers of the other-play objective. We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game. Since arbitrary tie-breaking is precisely what the ZSC setting aims to prevent, we conclude that the LFC problem does not reflect the aims of ZSC. To address this, we introduce an alternative informal operationalization of ZSC as a starting point for future work.

Submitted 12 July, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

arXiv:1504.05603 [pdf, other]

doi 10.1007/s11229-015-0883-1

Formalizing Preference Utilitarianism in Physical World Models

Authors: Caspar Oesterheld

Abstract: Most ethical work is done at a low level of formality. This makes practical moral questions inaccessible to formal and natural sciences and can lead to misunderstandings in ethical discussion. In this paper, we use Bayesian inference to introduce a formalization of preference utilitarianism in physical world models, specifically cellular automata. Even though our formalization is not immediately applicable, it is a first step in providing ethics and ultimately the question of how to "make the world better" with a formal basis.

Submitted 30 November, 2015; v1 submitted 21 April, 2015; originally announced April 2015.

Comments: 14 pages, 3 figures

Showing 1–12 of 12 results for author: Oesterheld, C