
Equilibrium Selection for Multi-agent Reinforcement Learning: A Unified Framework

Authors: Runyu Zhang, Jeff Shamma, Na Li

Abstract: While there are numerous works in multi-agent reinforcement learning (MARL), most of them focus on designing algorithms and proving convergence to a Nash equilibrium (NE) or other equilibrium such as coarse correlated equilibrium. However, NEs can be non-unique and their performance varies drastically. Thus, it is important to design algorithms that converge to Nash equilibrium with better rewards or social welfare. In contrast, classical game theory literature has extensively studied equilibrium selection for multi-agent learning in normal-form games, demonstrating that decentralized learning algorithms can asymptotically converge to potential-maximizing or Pareto-optimal NEs. These insights motivate this paper to investigate equilibrium selection in the MARL setting. We focus on the stochastic game model, leveraging classical equilibrium selection results from normal-form games to propose a unified framework for equilibrium selection in stochastic games. The proposed framework is highly modular and can extend various learning rules and their corresponding equilibrium selection results from normal-form games to the stochastic game setting.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2309.06705 [pdf, other]

Distributed Learning Dynamics for Coalitional Games

Authors: Aya Hamed, Jeff S. Shamma

Abstract: In the framework of transferable utility coalitional games, a scoring (characteristic) function determines the value of any subset/coalition of agents. Agents decide on both which coalitions to form and the allocations of the values of the formed coalitions among their members. An important concept in coalitional games is that of a core solution, which is a partitioning of agents into coalitions and an associated allocation to each agent under which no group of agents can get a higher allocation by forming an alternative coalition. We present distributed learning dynamics for coalitional games that converge to a core solution whenever one exists. In these dynamics, an agent maintains a state consisting of (i) an aspiration level for its allocation and (ii) the coalition, if any, to which it belongs. In each stage, a randomly activated agent proposes to form a new coalition and changes its aspiration based on the success or failure of its proposal. The coalition membership structure is changed, accordingly, whenever the proposal succeeds. Required communications are that: (i) agents in the proposed new coalition need to reveal their current aspirations to the proposing agent, and (ii) agents are informed if they are joining the proposed coalition or if their existing coalition is broken. The proposing agent computes the feasibility of forming the coalition. We show that the dynamics hit an absorbing state whenever a core solution is reached. We further illustrate the distributed learning dynamics on a multi-agent task allocation setting.

Submitted 27 November, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 9 pages, 5 figures; accepted for CDC 2023

arXiv:2304.04282 [pdf, other]

Higher-Order Uncoupled Dynamics Do Not Lead to Nash Equilibrium -- Except When They Do

Authors: Sarah A. Toonsi, Jeff S. Shamma

Abstract: The framework of multi-agent learning explores the dynamics of how individual agent strategies evolve in response to the evolving strategies of other agents. Of particular interest is whether or not agent strategies converge to well known solution concepts such as Nash Equilibrium (NE). Most "fixed order" learning dynamics restrict an agent's underlying state to be its own strategy. In "higher order" learning, agent dynamics can include auxiliary states that can capture phenomena such as path dependencies. We introduce higher-order gradient play dynamics that resemble projected gradient ascent with auxiliary states. The dynamics are "payoff based" in that each agent's dynamics depend on its own evolving payoff. While these payoffs depend on the strategies of other agents in a game setting, agent dynamics do not depend explicitly on the nature of the game or the strategies of other agents. In this sense, dynamics are "uncoupled" since an agent's dynamics do not depend explicitly on the utility functions of other agents. We first show that for any specific game with an isolated completely mixed-strategy NE, there exist higher-order gradient play dynamics that lead (locally) to that NE, both for the specific game and nearby games with perturbed utility functions. Conversely, we show that for any higher-order gradient play dynamics, there exists a game with a unique isolated completely mixed-strategy NE for which the dynamics do not lead to NE. These results build on prior work that showed that uncoupled fixed-order learning cannot lead to NE in certain instances, whereas higher-order variants can. Finally, we consider the mixed-strategy equilibrium associated with coordination games. While higher-order gradient play can converge to such equilibria, we show such dynamics must be inherently internally unstable.

Submitted 19 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

arXiv:2207.01346 [pdf, other]

doi 10.1109/TAC.2023.3329850

Can Competition Outperform Collaboration? The Role of Misbehaving Agents

Authors: Luca Ballotta, Giacomo Como, Jeff S. Shamma, Luca Schenato

Abstract: We investigate a novel approach to resilient distributed optimization with quadratic costs in a multi-agent system prone to unexpected events that make some agents misbehave. In contrast to commonly adopted filtering strategies, we draw inspiration from phenomena modeled through the Friedkin-Johnsen dynamics and argue that adding competition to the mix can improve resilience in the presence of misbehaving agents. Our intuition is corroborated by analytical and numerical results showing that (i) there exists a nontrivial trade-off between full collaboration and full competition and (ii) our competition-based approach can outperform state-of-the-art algorithms based on Weighted Mean Subsequence Reduced. We also study the impact of communication topology and connectivity on resilience, pointing out insights into robust network design.

Submitted 30 October, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted in IEEE TAC; 17 pages, 44 figures

MSC Class: 93D50 (Primary) 93B70 (Secondary) ACM Class: I.2.8; I.2.9

arXiv:2203.14099 [pdf, other]

doi 10.1109/CDC51059.2022.9993083

Competition-Based Resilience in Distributed Quadratic Optimization

Authors: Luca Ballotta, Giacomo Como, Jeff S. Shamma, Luca Schenato

Abstract: This paper proposes a novel approach to resilient distributed optimization with quadratic costs in a networked control system (e.g., wireless sensor network, power grid, robotic team) prone to external attacks (e.g., hacking, power outage) that cause agents to misbehave. Departing from classical filtering strategies proposed in the literature, we draw inspiration from a game-theoretic formulation of the consensus problem and argue that adding competition to the mix can enhance resilience in the presence of malicious agents. Our intuition is corroborated by analytical and numerical results showing that (i) our strategy highlights the presence of a nontrivial tradeoff between blind collaboration and full competition, and (ii) such a competition-based approach can outperform state-of-the-art algorithms based on Mean Subsequence Reduced.

Submitted 10 January, 2024; v1 submitted 26 March, 2022; originally announced March 2022.

Comments: 7 pages, 8 figures; accepted for CDC 2022

MSC Class: 93B70 (Primary) 68M18; 93D09; 93D50 (Secondary) ACM Class: I.2.11

arXiv:2111.09411 [pdf, other]

Multi-sided Matching for the Association of Space-Air-Ground Integrated Systems

Authors: Doha Hamza, Hajar El Hammouti, Jeff S Shamma, Mohamed-Slim Alouini

Abstract: Space-air-ground integrated networks (SAGINs) will play a key role in 6G communication systems. They are considered a promising technology to enhance the network capacity in highly dense agglomerations and to provide connectivity in rural areas. The multi-layer and heterogeneous nature of SAGINs necessitates an innovative design of their multi-tier associations. We propose a modeling of the SAGINs association problem using multi-sided matching theory. Our aim is to provide a reliable, asynchronous and fully distributed approach that associates nodes across the layers so that the total end-to-end rate of the assigned agents is maximized. To this end, our problem is modeled as a multi-sided many-to-one matching game. A randomized matching algorithm with low information exchange is proposed. The algorithm is shown to reach an efficient and stable association between nodes in adjacent layers. Our simulation results show that the proposed approach achieves significant gain compared to the greedy and distance-based algorithms.

Submitted 19 October, 2021; originally announced November 2021.

Comments: Submitted to IEEE Communications Magazine

arXiv:2111.00411 [pdf, other]

Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees

Authors: Yingying Li, Subhro Das, Jeff Shamma, Na Li

Abstract: We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions. The challenges of this problem arise from the tension among safety, exploration, performance, and computation. To address these challenges, we propose a polynomial-time algorithm that guarantees feasibility and constraint satisfaction with high probability under proper conditions. Our algorithm is implemented on a single trajectory and does not require system restarts. Further, we analyze the regret of our learning algorithm compared to the optimal safe linear controller with known model information. The proposed algorithm can achieve a $\tilde O(T^{2/3})$ regret, where $T$ is the number of stages and $\tilde O(\cdot)$ absorbs some logarithmic terms of $T$.

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2012.06182 [pdf, other]

Point-to-Point Communication in Integrated Satellite-Aerial Networks: State-of-the-art and Future Challenges

Authors: Nasir Saeed, Heba Almorad, Hayssam Dahrouj, Tareq Y. Al-Naffouri, Jeff S. Shamma, Mohamed-Slim Alouini

Abstract: This paper overviews point-to-point (P2P) links for integrated satellite-aerial networks, which are envisioned to be among the key enablers of the sixth-generation (6G) wireless networks vision. The paper first outlines the unique characteristics of such integrated large-scale complex networks, often denoted by spatial networks, and focuses on two particular space-air infrastructures, namely, satellite networks and high-altitude platforms (HAPs). The paper then classifies the connecting P2P communications links as satellite-to-satellite links at the same layer (SSLL), satellite-to-satellite links at different layers (SSLD), and HAP-to-HAP links (HHL). The paper overviews each layer of such spatial networks separately, and highlights the possible natures of the connecting links (i.e., radio-frequency or free-space optics) with a dedicated overview of the existing link-budget results. The paper, afterwards, presents the prospective merit of realizing such an integrated satellite-HAP network towards providing broadband services in under-served and remote areas. Finally, the paper sheds light on several future research directions in the context of spatial networks, namely large-scale network optimization, intelligent offloading, smart platforms, energy efficiency, multiple access schemes, and distributed spatial networks.

Submitted 11 December, 2020; originally announced December 2020.

Comments: 17 pages

arXiv:2006.06966 [pdf, other]

doi 10.1016/B978-0-12-820276-0.00021-2

RISCuer: A Reliable Multi-UAV Search and Rescue Testbed

Authors: Mohamed Abdelkader, Usman A. Fiaz, Noureddine Toumi, Mohamed A. Mabrok, Jeff S. Shamma

Abstract: We present the Robotics Intelligent Systems & Control (RISC) Lab multiagent testbed for reliable search and rescue and aerial transport in outdoor environments. The system consists of a team of three multirotor unmanned aerial vehicles (UAVs), which are capable of autonomously searching, picking up, and transporting randomly distributed objects in an outdoor field. The method involves vision based object detection and localization, passive aerial grasping with our novel design, GPS based UAV navigation, and safe release of the objects at the drop zone. Our cooperative strategy ensures safe spatial separation between UAVs at all times and we prevent any conflicts at the drop zone using communication enabled consensus. All computation is performed onboard each UAV. We describe the complete software and hardware architecture for the system and demonstrate its reliable performance using comprehensive outdoor experiments, and by comparing our results with some recent, similar works.

Submitted 7 December, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Book chapter: 41 pages, 27 figures (Minor revision: Corrected references)

Journal ref: Unmanned Aerial Systems, 2021

arXiv:1904.11184 [pdf, other]

Smart Jammer and LTE Network Strategies in An Infinite-Horizon Zero-Sum Repeated Game with Asymmetric and Incomplete Information

Authors: Farhan M. Aziz, Lichun Li, Jeff S. Shamma, Gordon L. Stuber

Abstract: LTE/LTE-Advanced networks are known to be vulnerable to denial-of-service and loss-of-service attacks from smart jammers. In this article, the interaction between a smart jammer and LTE network is modeled as an infinite-horizon, zero-sum, asymmetric repeated game. The smart jammer and eNode B are modeled as the informed and the uninformed player, respectively. The main purpose of this article is to construct efficient suboptimal strategies for both players that can be used to solve the above-mentioned infinite-horizon repeated game with asymmetric and incomplete information. It has been shown in game-theoretic literature that security strategies provide an optimal solution in zero-sum games. It is also shown that both players' security strategies in an infinite-horizon asymmetric game depend only on the history of the informed player's actions. However, fixed-sized sufficient statistics are needed for both players to solve the above-mentioned game efficiently. The smart jammer uses its evolving belief state as the fixed-sized sufficient statistics for the repeated game, whereas the LTE network (uninformed player) uses the worst-case regret of its security strategy and its anti-discounted update as the fixed-sized sufficient statistics. Although fixed-sized sufficient statistics are employed by both players, optimal security strategy computation in λ-discounted asymmetric games is still hard to perform because of non-convexity. Hence, the problem is convexified in this article by devising 'approximated' security strategies for both players that are based on approximated optimal game value. However, 'approximated' strategies require full monitoring. Therefore, a simplistic yet effective 'expected' strategy is also constructed for the LTE network that does not require full monitoring. The simulation results show that the smart jammer plays non-revealing and misleading strategies.

Submitted 25 April, 2019; originally announced April 2019.

arXiv:1901.01497 [pdf, other]

doi 10.1016/j.ifacol.2019.11.661

usBot: A Modular Robotic Testbed for Programmable Self-Assembly

Authors: Usman A. Fiaz, Jeff S. Shamma

Abstract: We present the design, characterization, and experimental results for a new modular robotic system for programmable self-assembly. The proposed system uses the Hybrid Cube Model (HCM), which integrates classical features from both deterministic and stochastic self-organization models. Thus, for instance, the modules are passive as far as their locomotion is concerned (stochastic), and yet they possess an active undocking routine (deterministic). The robots are constructed entirely from readily accessible components and unlike many existing robots, their excitation is not fluid mediated. Instead, the actuation setup is a solid state, independently programmable and highly portable platform. The system is capable of demonstrating fully autonomous and distributed stochastic self-assembly in two dimensions. It is shown to emulate the performance of several existing modular systems and promises to be a substantial effort towards developing a universal testbed for programmable self-assembly.

Submitted 29 June, 2019; v1 submitted 5 January, 2019; originally announced January 2019.

Comments: Accepted as a conference paper at 2019 IFAC Joint MECHATRONICS and NOLCOS

arXiv:1809.08218 [pdf, other]

Infrastructure-free Localization of Aerial Robots with Ultrawideband Sensors

Authors: Samet Guler, Mohamed Abdelkader, Jeff S. Shamma

Abstract: Robots in a swarm take advantage of a motion capture system or GPS sensors to obtain their global position. However, motion capture systems are environment-dependent and GPS sensors are not reliable in occluded environments. For a reliable and versatile operation in a swarm, robots must sense each other and interact locally. Motivated by this requirement, here we propose an on-board localization framework for multi-robot systems. Our framework consists of an anchor robot with three ultrawideband (UWB) sensors and a tag robot with a single UWB sensor. The anchor robot utilizes the three UWB sensors as a localization infrastructure and estimates the tag robot's location by using its on-board sensing and computational capabilities solely, without explicit inter-robot communication. We utilize a dual Monte-Carlo localization approach to capture the agile maneuvers of the tag robot with an acceptable precision. We validate the effectiveness of our algorithm with simulations and indoor and outdoor experiments on a two-drone setup. The proposed dual MCL algorithm yields highly accurate estimates for various speed profiles of the tag robot and demonstrates a superior performance over the standard particle filter and the extended Kalman Filter.

Submitted 21 September, 2018; originally announced September 2018.

Comments: 14 pages

arXiv:1804.04449 [pdf, ps, other]

Herding Positive, Complex Networks

Authors: Sebastian F. Ruf, Magnus Egerstedt, Jeff S. Shamma

Abstract: The problem of controlling complex networks is of interest to disciplines ranging from biology to swarm robotics. However, controllability can be too strict a condition, failing to capture a range of desirable behaviors. Herdability, which describes the ability to drive a system to a specific set in the state space, was recently introduced as an alternative network control notion. This paper considers the application of herdability to the study of complex networks under the assumption that a positive system evolves on the network. The herdability of a class of networked systems is investigated and two problems related to ensuring system herdability are explored. The first is the input addition problem, which investigates which nodes in a network should receive inputs to ensure that the system is herdable. The second is a related problem of selecting the best single node from which to herd the network, in the case that a single node is guaranteed to make the system herdable. In order to select the best herding node, a novel control energy based herdability centrality measure is introduced.

Submitted 28 April, 2018; v1 submitted 12 April, 2018; originally announced April 2018.

Comments: Updated the proof of Theorem 2

arXiv:1804.02693 [pdf, ps, other]

Path to Stochastic Stability: Comparative Analysis of Stochastic Learning Dynamics in Games

Authors: Hassan Jaleel, Jeff S. Shamma

Abstract: Stochastic stability is a popular solution concept for stochastic learning dynamics in games. However, a critical limitation of this solution concept is its inability to distinguish between different learning rules that lead to the same steady-state behavior. We address this limitation for the first time and develop a framework for the comparative analysis of stochastic learning dynamics with different update rules but the same steady-state behavior. We present the framework in the context of two learning dynamics: Log-Linear Learning (LLL) and Metropolis Learning (ML). Although both of these dynamics have the same stochastically stable states, LLL and ML correspond to different behavioral models for decision making. Moreover, we demonstrate through an example setup of a sensor coverage game that for each of these dynamics, the paths to stochastically stable states exhibit distinctive behaviors. Therefore, we propose multiple criteria to analyze and quantify the differences in the short and medium run behavior of stochastic learning dynamics. We derive and compare upper bounds on the expected hitting time to the set of Nash equilibria for both LLL and ML. For the medium to long-run behavior, we identify a set of tools from the theory of perturbed Markov chains that result in a hierarchical decomposition of the state space into collections of states called cycles. We compare LLL and ML based on the proposed criteria and develop invaluable insights into the comparative behavior of the two dynamics.

Submitted 8 April, 2018; originally announced April 2018.

arXiv:1711.02308 [pdf, ps, other]

Security Strategies of Both Players in Asymmetric Information Zero-Sum Stochastic Games with an Informed Controller

Authors: Lichun Li, Cedric Langbort, Jeff S. Shamma

Abstract: This paper considers a zero-sum two-player asymmetric information stochastic game where only one player knows the system state, and the transition law is controlled by the informed player only. For the informed player, it has been shown that the security strategy only depends on the belief and the current stage. We provide LP formulations whose size is only linear in the size of the uninformed player's action set to compute both history based and belief based security strategies. For the uninformed player, we focus on the regret, the difference between 0 and the future payoff guaranteed by the uninformed player in every possible state. Regret is a real vector of the same size as the belief, and depends only on the action of the informed player and the strategy of the uninformed player. This paper shows that the uninformed player has a security strategy that only depends on the regret and the current stage. LP formulations are then given to compute the history based security strategy, the regret at every stage, and the regret based security strategy. The size of the LP formulations is again linear in the size of the uninformed player's action set. Finally, an intrusion detection problem is studied to demonstrate the main results in this paper.

Submitted 7 November, 2017; originally announced November 2017.

Comments: submitted to special issue in the journal Dynamic Games and Applications

arXiv:1703.01957 [pdf, ps, other]

An LP Approach for Solving Two-Player Zero-Sum Repeated Bayesian Games

Authors: Lichun Li, Cedric Langbort, Jeff Shamma

Abstract: This paper studies two-player zero-sum repeated Bayesian games in which every player has a private type that is unknown to the other player, and the initial probability of the type of every player is publicly known. The types of players are independently chosen according to the initial probabilities, and are kept the same all through the game. At every stage, players simultaneously choose actions, and announce their actions publicly. For finite horizon cases, an explicit linear program is provided to compute players' security strategies. Moreover, based on the existing results in [1], this paper shows that a player's sufficient statistics, which is independent of the strategy of the other player, consists of the belief over the player's own type, the regret with respect to the other player's type, and the stage. Explicit linear programs are provided to compute the initial regrets, and the security strategies that only depend on the sufficient statistics. For discounted cases, following the same idea in the finite horizon, this paper shows that a player's sufficient statistics consists of the belief of the player's own type and the anti-discounted regret with respect to the other player's type. Besides, an approximated security strategy depending on the sufficient statistics is provided, and an explicit linear program to compute the approximated security strategy is given. This paper also obtains a bound on the performance difference between the approximated security strategy and the security strategy.

Submitted 7 November, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: submitted to TAC, under review

arXiv:1703.01952 [pdf, ps, other]

Efficient Strategy Computation in Zero-Sum Asymmetric Repeated Games

Authors: Lichun Li, Jeff S. Shamma

Abstract: Zero-sum asymmetric games model decision making scenarios involving two competing players who have different information about the game being played. A particular case is that of nested information, where one (informed) player has superior information over the other (uninformed) player. This paper considers the case of nested information in repeated zero-sum games and studies the computation of strategies for both the informed and uninformed players for finite-horizon and discounted infinite-horizon nested information games. For finite-horizon settings, we exploit that for both players, the security strategy, and also the opponent's corresponding best response depend only on the informed player's history of actions. Using this property, we refine the sequence form, and formulate an LP computation of player strategies that is linear in the size of the uninformed player's action set. For the infinite-horizon discounted game, we construct LP formulations to compute the approximated security strategies for both players, and provide a bound on the performance difference between the approximated security strategies and the security strategies. Finally, we illustrate the results on a network interdiction game between an informed system administrator and an uninformed intruder.

Submitted 7 November, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: submitted to IEEE TAC, under review

arXiv:1607.02502 [pdf, other]

doi 10.1109/TCSS.2017.2719585

Networked SIS Epidemics with Awareness

Authors: Keith Paarporn, Ceyhun Eksin, Joshua S. Weitz, Jeff S. Shamma

Abstract: We study an SIS epidemic process over a static contact network where the nodes have partial information about the epidemic state. They react by limiting their interactions with their neighbors when they believe the epidemic is currently prevalent. A node's awareness is weighted by the fraction of infected neighbors in their social network, and a global broadcast of the fraction of infected nodes in the entire network. The dynamics of the benchmark (no awareness) and awareness models are described by discrete-time Markov chains, from which mean-field approximations (MFA) are derived. The states of the MFA are interpreted as the nodes' probabilities of being infected. We show that a sufficient condition for the existence of a "metastable", or endemic, state of the awareness model coincides with that of the benchmark model. Furthermore, we use a coupling technique to give a full stochastic comparison analysis between the two chains, which serves as a probabilistic analogue to the MFA analysis. In particular, we show that adding awareness reduces the expectation of any epidemic metric on the space of sample paths, e.g. eradication time or total infections. We characterize the reduction in expectations in terms of the coupling distribution. In simulations, we evaluate the effect social distancing has on contact networks from different random graph families (geometric, Erdős-Rényi, and scale-free random networks).

Submitted 12 July, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: 10 pages, 5 figures

arXiv:1605.00306 [pdf, other]

BLMA: A Blind Matching Algorithm with Application to Cognitive Radio Networks

Authors: Doha Hamza, Jeff S. Shamma

Abstract: We consider a two-sided matching problem with a defined notion of pairwise stability. We propose a distributed blind matching algorithm (BLMA) to solve the problem. We prove the solution produced by BLMA will converge to an $ε$-pairwise stable outcome with probability one. We then consider a matching problem in cognitive radio networks. Secondary users (SUs) are allowed access time to the spectrum belonging to the primary users (PUs) provided that they relay primary messages. We propose a realization of the BLMA to produce an $ε$-pairwise stable solution assuming quasi-convex and quasi-concave utilities. In the case of more general utility forms, we show another BLMA realization to provide a stable solution. Furthermore, we propose a negotiation mechanism to bias the algorithm towards one side of the market. We use this mechanism to protect the exclusive rights of the PUs to the spectrum. In all such implementations of the BLMA, we impose a limited information exchange in the network so that agents can only calculate their own utilities, but no information is available about the utilities of any other users in the network.

Submitted 1 May, 2016; originally announced May 2016.

arXiv:1604.03240 [pdf, other]

Disease dynamics on a network game: a little empathy goes a long way

Authors: Ceyhun Eksin, Jeff S. Shamma, Joshua S. Weitz

Abstract: Individuals change their behavior during an epidemic in response to whether they and/or those they interact with are healthy or sick. Healthy individuals are concerned about contracting a disease from their sick contacts and may utilize protective measures. Sick individuals may be concerned with spreading the disease to their healthy contacts and adopt preemptive measures. Yet, in practice both protective and preemptive changes in behavior come with costs. This paper proposes a stochastic network disease game model that captures the self-interests of individuals during the spread of a susceptible-infected-susceptible (SIS) disease where individuals react to current risk of disease spread, and their reactions together with the current state of the disease stochastically determine the next stage of the disease. We show that there is a critical level of concern, i.e., empathy, by the sick individuals above which disease is eradicated fast. Furthermore, we find that if the network and disease parameters are above the epidemic threshold, the risk averse behavior by the healthy individuals cannot eradicate the disease without the preemptive measures of the sick individuals. This imbalance in the role played by the response of the infected versus the susceptible individuals in disease eradication affords critical policy insights.

Submitted 15 April, 2016; v1 submitted 12 April, 2016; originally announced April 2016.

Comments: 27 pages, 9 figures, submitted for publication

arXiv:1512.02160 [pdf, other]

Learning Efficient Correlated Equilibria

Authors: Holly P. Borowski, Jason R. Marden, Jeff S. Shamma

Abstract: The majority of distributed learning literature focuses on convergence to Nash equilibria. Correlated equilibria, on the other hand, can often characterize more efficient collective behavior than even the best Nash equilibrium. However, there are no existing distributed learning algorithms that converge to specific correlated equilibria. In this paper, we provide one such algorithm which guarantees that the agents' collective joint strategy will constitute an efficient correlated equilibrium with high probability. The key to attaining efficient correlated behavior through distributed learning involves incorporating a common random signal into the learning environment.

Submitted 7 December, 2015; originally announced December 2015.

Comments: 11 pages, 1 figure

arXiv:1510.08204 [pdf, other]

Global Games with Noisy Information Sharing

Authors: Hessam Mahdavifar, Ahmad Beirami, Behrouz Touri, Jeff S. Shamma

Abstract: Global games form a subclass of games with incomplete information where a set of agents decide actions against a regime with an underlying fundamental $θ$ representing its power. Each agent has access to an independent noisy observation of $θ$. In order to capture the behavior of agents in a social network of information exchange we assume that agents share their observation in a noisy environment prior to making their decision. We show that global games with noisy sharing of information do not admit an intuitive type of threshold policy which only depends on agents' belief about the underlying $θ$. This is in contrast to the existing results on the threshold policy for the conventional set-up of global games. Motivated by this result, we investigate the existence of equilibrium strategies in a more general collection of threshold-type policies and show that such equilibrium strategies exist and are unique if the sharing of information happens over a sufficiently noisy environment.

Submitted 28 October, 2017; v1 submitted 28 October, 2015; originally announced October 2015.

Comments: Accepted to IEEE Transactions on Signal and Information Processing over Networks

arXiv:1509.00737 [pdf, other]

A Game-theoretic Formulation of the Homogeneous Self-Reconfiguration Problem

Authors: Daniel Pickem, Magnus Egerstedt, Jeff S. Shamma

Abstract: In this paper we formulate the homogeneous two- and three-dimensional self-reconfiguration problem over discrete grids as a constrained potential game. We develop a game-theoretic learning algorithm based on the Metropolis-Hastings algorithm that solves the self-reconfiguration problem in a globally optimal fashion. Both a centralized and a fully distributed algorithm are presented and we show that the only stochastically stable state is the potential function maximizer, i.e. the desired target configuration. These algorithms compute transition probabilities in such a way that even though each agent acts in a self-interested way, the overall collective goal of self-reconfiguration is achieved. Simulation results confirm the feasibility of our approach and show convergence to desired target configurations.

Submitted 2 September, 2015; originally announced September 2015.

Comments: 8 pages, 5 figures, 2 algorithms
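
The abstract above names the Metropolis-Hastings algorithm as the basis of its learning rule but does not give the transition probabilities. As a rough illustration only, the following sketch shows the generic Metropolis acceptance rule for maximizing a potential function. It assumes symmetric proposals, and the names `metropolis_accept_prob`, `metropolis_step`, `propose`, `potential`, and `temperature` are hypothetical; this is not the paper's algorithm.

```python
import math
import random


def metropolis_accept_prob(phi_current, phi_proposed, temperature):
    """Generic Metropolis acceptance probability for potential maximization.

    Assumes a symmetric proposal distribution (an assumption, not taken
    from the abstract). Moves that do not lower the potential are always
    accepted; moves that lower it are accepted with exponentially small
    probability.
    """
    if phi_proposed >= phi_current:
        return 1.0
    return math.exp((phi_proposed - phi_current) / temperature)


def metropolis_step(state, propose, potential, temperature, rng):
    """One step: draw a candidate, then accept it or keep the current state."""
    candidate = propose(state, rng)
    p = metropolis_accept_prob(potential(state), potential(candidate), temperature)
    return candidate if rng.random() < p else state
```

With a small `temperature`, such a chain spends most of its time near maximizers of `potential`, which is the intuition behind stochastic stability results of this kind.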

arXiv:1505.06379 [pdf, other]

doi 10.1109/TCNS.2016.2518083

Communication-Free Distributed Coverage for Networked Systems

Authors: A. Yasin Yazicioglu, Magnus Egerstedt, Jeff S. Shamma

Abstract: In this paper, we present a communication-free algorithm for distributed coverage of an arbitrary network by a group of mobile agents with local sensing capabilities. The network is represented as a graph, and the agents are arbitrarily deployed on some nodes of the graph. Any node of the graph is covered if it is within the sensing range of at least one agent. The agents are mobile devices that aim to explore the graph and to optimize their locations in a decentralized fashion by relying only on their sensory inputs. We formulate this problem in a game theoretic setting and propose a communication-free learning algorithm for maximizing the coverage.

Submitted 23 May, 2015; originally announced May 2015.
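
The coverage notion in the abstract above (a node is covered if it lies within the sensing range of at least one agent) can be made concrete with a short sketch. It assumes the sensing range is measured in hops on an undirected graph given as an adjacency dictionary; the function name `covered_nodes` and its parameters are hypothetical, and the learning algorithm itself is not reproduced here.

```python
from collections import deque


def covered_nodes(adjacency, agent_positions, sensing_range):
    """Return the set of nodes within `sensing_range` hops of some agent.

    `adjacency` maps each node to an iterable of its neighbors. Measuring
    the sensing range in hops is an assumption made for illustration.
    """
    covered = set()
    for start in agent_positions:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, dist = queue.popleft()
            if dist == sensing_range:
                continue  # do not expand beyond the sensing range
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, dist + 1))
        covered |= seen
    return covered


# Path graph 0-1-2-3-4 with one-hop sensing.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(len(covered_nodes(path, [1, 4], 1)))
```

The size of the returned set is the quantity a coverage-maximizing scheme would try to increase as agents move.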

arXiv:1503.08131 [pdf, other]

doi 10.1109/TNSE.2015.2503983

Formation of Robust Multi-Agent Networks Through Self-Organizing Random Regular Graphs

Authors: A. Yasin Yazicioglu, Magnus Egerstedt, Jeff S. Shamma

Abstract: Multi-agent networks are often modeled as interaction graphs, where the nodes represent the agents and the edges denote some direct interactions. The robustness of a multi-agent network to perturbations such as failures, noise, or malicious attacks largely depends on the corresponding graph. In many applications, networks are desired to have well-connected interaction graphs with a relatively small number of links. One family of such graphs is the random regular graphs. In this paper, we present a decentralized scheme for transforming any connected interaction graph with a possibly non-integer average degree of k into a connected random m-regular graph for some m in [k, k + 2]. Accordingly, the agents improve the robustness of the network with a minimal change in the overall sparsity by optimizing the graph connectivity through the proposed local operations.

Submitted 27 March, 2015; originally announced March 2015.

arXiv:1110.4412 [pdf, ps, other]

doi 10.1137/110852462

Aspiration Learning in Coordination Games

Authors: Georgios C. Chasparis, Ari Arapostathis, Jeff S. Shamma

Abstract: We consider the problem of distributed convergence to efficient outcomes in coordination games through dynamics based on aspiration learning. Under aspiration learning, a player continues to play an action as long as the rewards received exceed a specified aspiration level. Here, the aspiration level is a fading memory average of past rewards, and these levels also are subject to occasional random perturbations. A player becomes dissatisfied whenever a received reward is less than the aspiration level, in which case the player experiments with a probability proportional to the degree of dissatisfaction. Our first contribution is the characterization of the asymptotic behavior of the induced Markov chain of the iterated process in terms of an equivalent finite-state Markov chain. We then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of coordination games, examples of which include network formation and common-pool games. In particular, we show that in generic coordination games the frequency at which an efficient action profile is played can be made arbitrarily large. Although convergence to efficient outcomes is desirable, in several coordination games, such as common-pool games, attainability of fair outcomes, i.e., sequences of plays at which players experience highly rewarding returns with the same frequency, might also be of special interest. To this end, we demonstrate through analysis and simulations that aspiration learning also establishes fair outcomes in all symmetric coordination games, including common-pool games.

Submitted 19 October, 2011; originally announced October 2011.

Comments: 27 pages

MSC Class: 68T05; 91A26; 91A22; 93E35; 60J05; 91A80

Journal ref: SIAM J. Control Optim. 51 (2013), no. 1, 465-490
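
The aspiration-learning rule described in the abstract above can be sketched for a single player as follows. This is a minimal illustration under stated assumptions, not the paper's exact dynamics: the uniform choice of the experimental action, the uniform perturbation of the aspiration level, and all names and parameters (`aspiration_step`, `step`, `scale`, `noise_prob`, `noise_size`) are hypothetical.

```python
import random


def aspiration_step(action, aspiration, reward, n_actions, rng,
                    step=0.1, scale=1.0, noise_prob=0.0, noise_size=0.0):
    """One aspiration-learning update for a single player (illustrative).

    The player keeps its action while the reward meets the aspiration
    level; otherwise it experiments with probability proportional to the
    dissatisfaction, capped at 1.
    """
    dissatisfaction = max(0.0, aspiration - reward)
    if dissatisfaction > 0.0 and rng.random() < min(1.0, scale * dissatisfaction):
        # Assumption: experimentation draws a uniformly random action.
        action = rng.randrange(n_actions)
    # Fading-memory average of past rewards.
    aspiration = aspiration + step * (reward - aspiration)
    # Occasional random perturbation of the aspiration level (assumed uniform).
    if rng.random() < noise_prob:
        aspiration += rng.uniform(-noise_size, noise_size)
    return action, aspiration
```

A satisfied player leaves its action unchanged and only drifts its aspiration toward the received reward, so profiles with high rewards for everyone tend to persist.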

Showing 1–26 of 26 results for author: Shamma, J