Blending Data-Driven Priors in Dynamic Games

Authors: Justin Lidard, Haimin Hu, Asher Hancock, Zixu Zhang, Albert Gimó Contreras, Vikash Modi, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, María Santos, Jaime Fernández Fisac

Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.

Submitted 6 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, 12 figures
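
Illustrative note: the blending idea in the abstract above can be sketched for the simplest possible case, a single decision maker with a finite action set and a single stage. This is not the KLGame algorithm. It only uses the standard closed form for minimizing expected cost plus lam times the KL divergence from a reference policy, namely pi(a) proportional to reference(a) * exp(-cost(a) / lam). The function name, the costs, and the reference distribution are hypothetical.

```python
import numpy as np

def kl_regularized_policy(costs, reference, lam):
    """Minimizer of E_pi[cost] + lam * KL(pi || reference) over a finite action set.

    Assumes every entry of `reference` is strictly positive.
    """
    costs = np.asarray(costs, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Work in log space and subtract the max for numerical stability.
    logits = np.log(reference) - costs / lam
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

costs = [1.0, 0.0, 2.0]        # hypothetical task cost of each action
reference = [0.7, 0.2, 0.1]    # hypothetical data-driven prior over actions

# Small lam: the task cost dominates and the policy concentrates on the cheapest action.
task_driven = kl_regularized_policy(costs, reference, lam=1e-2)
# Large lam: the KL penalty dominates and the policy stays close to the reference.
data_driven = kl_regularized_policy(costs, reference, lam=1e3)
```

In this toy setting the single scalar lam plays the role of the tunable parameter mentioned in the abstract: it slides the policy between the task-optimal action and the reference distribution.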

arXiv:2312.06395

Threshold Decision-Making Dynamics Adaptive to Physical Constraints and Changing Environment

Authors: Giovanna Amorim, María Santos, Shinkyu Park, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a threshold decision-making framework for controlling the physical dynamics of an agent switching between two spatial tasks. Our framework couples a nonlinear opinion dynamics model that represents the evolution of an agent's preference for a particular task with the physical dynamics of the agent. We prove the bifurcation that governs the behavior of the coupled dynamics. We show by means of the bifurcation behavior how the coupled dynamics are adaptive to the physical constraints of the agent. We also show how the bifurcation can be modulated to allow the agent to switch tasks based on thresholds adaptive to environmental conditions. We illustrate the benefits of the approach through a decentralized multi-robot task allocation application for trash collection.

Submitted 7 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.02204

Active risk aversion in SIS epidemics on networks

Authors: Anastasia Bizyaeva, Marcela Ordorica Arango, Yunxiu Zhou, Simon Levin, Naomi Ehrich Leonard

Abstract: We present and analyze an actively controlled Susceptible-Infected-Susceptible (actSIS) model of interconnected populations to study how risk aversion strategies, such as social distancing, affect network epidemics. A population using a risk aversion strategy reduces its contact rate with other populations when it perceives an increase in infection risk. The network actSIS model relies on two distinct networks. One is a physical contact network that defines which populations come into contact with which other populations and thus how infection spreads. The other is a communication network, such as an online social network, that defines which populations observe the infection level of which other populations and thus how information spreads. We prove that the model, with these two networks and populations using risk aversion strategies, exhibits a transcritical bifurcation in which an endemic equilibrium emerges. For regular graphs, we prove that the endemic infection level is uniform across populations and reduced by the risk aversion strategy, relative to the network SIS endemic level. We show that when communication is sufficiently sparse, this initially stable equilibrium loses stability in a secondary bifurcation. Simulations show that a new stable solution emerges with nonuniform infection levels.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2308.02755

Multi-topic belief formation through bifurcations over signed social networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose and analyze a nonlinear dynamic model of continuous-time multi-dimensional belief formation over signed social networks. Our model accounts for the effects of a structured belief system, self-appraisal, internal biases, and various sources of cognitive dissonance posited by recent theories in social psychology. We prove that agents become opinionated as a consequence of a bifurcation. We analyze how the balance of social network effects in the model controls the nature of the bifurcation and, therefore, the belief-forming limit-set solutions. Our analysis provides constructive conditions on how multi-stable network belief equilibria and belief oscillations emerging at a belief-forming bifurcation depend on the communication network graph and belief system network graph. Our model and analysis provide new theoretical insights on the dynamics of social systems and a new principled framework for designing decentralized decision-making on engineered networks in the presence of structured relationships among alternatives.

Submitted 2 July, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 16 pages, 7 figures

arXiv:2305.17600

NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction

Authors: Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

Abstract: Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering $33\%$ more potential interactions versus a baseline model.

Submitted 11 November, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures

arXiv:2210.00353

Sustained oscillations in multi-topic belief dynamics over signed networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We study the dynamics of belief formation on multiple interconnected topics in networks of agents with a shared belief system. We establish sufficient conditions and necessary conditions under which sustained oscillations of beliefs arise on the network in a Hopf bifurcation and characterize the role of the communication graph and the belief system graph in shaping the relative phase and amplitude patterns of the oscillations. Additionally, we distinguish broad classes of graphs that exhibit such oscillations from those that do not.

Submitted 22 March, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 6 pages, 6 figures, accepted for publication in the 2023 American Control Conference proceedings

arXiv:2206.14893

Breaking indecision in multi-agent, multi-option dynamics

Authors: Alessio Franci, Martin Golubitsky, Ian Stewart, Anastasia Bizyaeva, Naomi Ehrich Leonard

Abstract: How does a group of agents break indecision when deciding about options with qualities that are hard to distinguish? Biological and artificial multi-agent systems, from honeybees and bird flocks to bacteria, robots, and humans, often need to overcome indecision when choosing among options in situations in which the performance or even the survival of the group is at stake. Breaking indecision is also important because in a fully indecisive state agents are not biased toward any specific option and therefore the agent group is maximally sensitive and prone to adapt to inputs and changes in its environment. Here, we develop a mathematical theory to study how decisions arise from the breaking of indecision. Our approach is grounded in both equivariant and network bifurcation theory. We model decision from indecision as synchrony-breaking in influence networks in which each node is the value assigned by an agent to an option. First, we show that three universal decision behaviors, namely, deadlock, consensus, and dissensus, are the generic outcomes of synchrony-breaking bifurcations from a fully synchronous state of indecision in influence networks. Second, we show that all deadlock and consensus value patterns and some dissensus value patterns are predicted by the symmetry of the influence networks. Third, we show that there are also many 'exotic' dissensus value patterns. These patterns are predicted by network architecture, but not by network symmetries, through a new synchrony-breaking branching lemma. This is the first example of exotic solutions in an application. Numerical simulations of a novel influence network model illustrate our theoretical results.

Submitted 29 June, 2022; originally announced June 2022.

Comments: 36 pages

arXiv:2203.11703

doi 10.1109/LCSYS.2022.3185981

Switching transformations for decentralized control of opinion patterns in signed networks: application to dynamic task allocation

Authors: Anastasia Bizyaeva, Giovanna Amorim, Maria Santos, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a new decentralized design method to control opinion patterns on signed networks of agents making decisions about two options and to switch the network from any opinion pattern to a new desired one. Our method relies on switching transformations, which switch the sign of an agent's opinion at a stable equilibrium by flipping the sign of its local interactions with its neighbors. The global dynamical behavior of the switched network can be predicted rigorously when the original, and thus the switched, networks are structurally balanced. Structural balance ensures that the network dynamics are monotone, which makes the study of the basin of attraction of the various opinion patterns amenable to rigorous analysis through monotone systems theory. We illustrate the utility of the approach through scenarios motivated by multi-robot coordination and dynamic task allocation.

Submitted 31 May, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2201.13288

A Regret Minimization Approach to Multi-Agent Control

Authors: Udaya Ghai, Udari Madhushani, Naomi Leonard, Elad Hazan

Abstract: We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft. We show that the distributed method is robust to failure and to adversarial perturbations in the dynamics.

Submitted 25 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7422-7434, 2022

arXiv:2110.07392

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Authors: Justin Lidard, Udari Madhushani, Naomi Ehrich Leonard

Abstract: A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transition dynamics, in which agents can communicate in a decentralized manner. We show that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its $γ$-hop neighbors. We prove regret and sample complexity bounds that depend on the number of agents, communication network structure and $γ$. We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy. Numerical simulations illustrate our results and validate our theoretical claims.

Submitted 2 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper to American Control Conference (ACC) 2022

arXiv:2108.00966

Tuning Cooperative Behavior in Games with Nonlinear Opinion Dynamics

Authors: Shinkyu Park, Anastasia Bizyaeva, Mari Kawakatsu, Alessio Franci, Naomi Ehrich Leonard

Abstract: We examine the tuning of cooperative behavior in repeated multi-agent games using an analytically tractable, continuous-time, nonlinear model of opinion dynamics. Each modeled agent updates its real-valued opinion about each available strategy in response to payoffs and other agent opinions, as observed over a network. We show how the model provides a principled and systematic means to investigate behavior of agents that select strategies using rationality and reciprocity, key features of human decision-making in social dilemmas. For two-strategy games, we use bifurcation analysis to prove conditions for the bistability of two equilibria and conditions for the first (second) equilibrium to reflect all agents favoring the first (second) strategy. We prove how model parameters, e.g., level of attention to opinions of others (reciprocity), network structure, and payoffs, influence dynamics and, notably, the size of the region of attraction to each stable equilibrium. We provide insights by examining the tuning of the bistability of mutual cooperation and mutual defection and their regions of attraction for the repeated prisoner's dilemma and the repeated multi-agent public goods game. Our results generalize to games with more strategies, heterogeneity, and additional feedback dynamics, such as those designed to elicit cooperation.

Submitted 23 November, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

arXiv:2103.14764

doi 10.1109/CDC45484.2021.9683650

Control of Agreement and Disagreement Cascades with Distributed Inputs

Authors: Anastasia Bizyaeva, Timothy Sorochkin, Alessio Franci, Naomi Ehrich Leonard

Abstract: For a group of autonomous communicating agents, the ability to distinguish a meaningful input from disturbance, and come to collective agreement or disagreement in response to that input, is paramount for carrying out coordinated objectives. In this work we study how a cascade of opinion formation spreads through a group of networked decision-makers in response to a distributed input signal. Using a nonlinear opinion dynamics model with dynamic feedback modulation of an attention parameter, we show how the triggering of an opinion cascade and the collective decision itself depend on both the distributed input and the node agreement and disagreement centrality, determined by the spectral properties of the network graph. We further show how the attention dynamics introduce an implicit threshold that distinguishes between distributed inputs that trigger cascades and ones that are rejected as disturbance.

Submitted 26 March, 2021; originally announced March 2021.

Comments: 7 pages, 4 figures

arXiv:2103.12223

doi 10.1007/s11721-021-00190-w

Analysis and control of agreement and disagreement opinion cascades

Authors: Alessio Franci, Anastasia Bizyaeva, Shinkyu Park, Naomi Ehrich Leonard

Abstract: We introduce and analyze a continuous time and state-space model of opinion cascades on networks of large numbers of agents that form opinions about two or more options. By leveraging our recent results on the emergence of agreement and disagreement states, we introduce novel tools to analyze and control agreement and disagreement opinion cascades. New notions of agreement and disagreement centrality, which depend only on network structure, are shown to be key to characterizing the nonlinear behavior of agreement and disagreement opinion formation and cascades. Our results are relevant for the analysis and control of opinion cascades in real-world networks, including biological, social and artificial networks, and for the design of opinion-forming behaviors in robotic swarms. We illustrate an application of our model to a multi-robot task-allocation problem and discuss extensions and future directions opened by our modeling framework.

Submitted 22 March, 2021; originally announced March 2021.

arXiv:2011.07720

Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.

Submitted 8 October, 2021; v1 submitted 15 November, 2020; originally announced November 2020.
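
Illustrative note: the communication protocol described in the abstract above can be sketched as follows. This is not the authors' algorithm. It is a plain UCB1-style rule in which each agent also folds in whatever neighbor rewards it happens to observe. The ring graph (a 2-regular graph), Bernoulli rewards, independent observation draws per neighbor, and every name and default value are assumptions made for the sketch.

```python
import numpy as np

def run_ring_ucb(num_agents=6, means=(0.2, 0.5, 0.8), p=0.5, horizon=2000, seed=0):
    """Simulate UCB1-style agents on a ring; returns total pulls per arm."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    num_arms = len(means)
    # Ring graph: each agent has exactly two neighbors (requires num_agents >= 3).
    neighbors = [((i - 1) % num_agents, (i + 1) % num_agents) for i in range(num_agents)]
    counts = np.zeros((num_agents, num_arms))  # samples seen per agent and arm
    sums = np.zeros((num_agents, num_arms))    # reward totals per agent and arm
    pulls = np.zeros(num_arms, dtype=int)      # arms actually pulled, group-wide

    for t in range(1, horizon + 1):
        # Every agent chooses an arm from its own statistics.
        arms = np.empty(num_agents, dtype=int)
        for i in range(num_agents):
            untried = np.flatnonzero(counts[i] == 0)
            if untried.size > 0:
                arms[i] = untried[0]
            else:
                bonus = np.sqrt(2.0 * np.log(t) / counts[i])
                arms[i] = int(np.argmax(sums[i] / counts[i] + bonus))
        rewards = (rng.random(num_agents) < means[arms]).astype(float)

        # Each agent records its own reward, then each neighbor's with probability p.
        for i in range(num_agents):
            counts[i, arms[i]] += 1
            sums[i, arms[i]] += rewards[i]
            pulls[arms[i]] += 1
            for j in neighbors[i]:
                if rng.random() < p:
                    counts[i, arms[j]] += 1
                    sums[i, arms[j]] += rewards[j]
    return pulls

pulls = run_ring_ucb()
```

Setting p=0 recovers independent learners and p=1 gives full neighbor sharing, so the same function can be used to compare how often the group pulls suboptimal arms at different link reliabilities.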

arXiv:2011.05927

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Authors: Udari Madhushani, Biswadip Dey, Naomi Ehrich Leonard, Amit Chakraborty

Abstract: Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called Hamiltonian $Q$-Learning, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.

Submitted 28 March, 2022; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2009.13600

doi 10.23919/ACC50511.2021.9482811

Patterns of Nonlinear Opinion Formation on Networks

Authors: Anastasia Bizyaeva, Ayanna Matthews, Alessio Franci, Naomi Ehrich Leonard

Abstract: When communicating agents form opinions about a set of possible options, agreement and disagreement are both possible outcomes. Depending on the context, either can be desirable or undesirable. We show that for nonlinear opinion dynamics on networks, and a variety of network structures, the spectral properties of the underlying adjacency matrix fully characterize the occurrence of either agreement or disagreement. We further show how the corresponding eigenvector centrality, as well as any symmetry in the network, informs the resulting patterns of opinion formation and agent sensitivity to input that triggers opinion cascades.

Submitted 26 March, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 6 pages, 4 figures; accepted to appear in 2021 American Control Conference proceedings

arXiv:2009.04332

doi 10.1109/TAC.2022.3159527

Nonlinear Opinion Dynamics with Tunable Sensitivity

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a continuous-time multi-option nonlinear generalization of classical linear weighted-average opinion dynamics. Nonlinearity is introduced by saturating opinion exchanges, and this is enough to enable a significantly greater range of opinion-forming behaviors with our model as compared to existing linear and nonlinear models. For a group of agents that communicate opinions over a network, these behaviors include multistable agreement and disagreement, tunable sensitivity to input, robustness to disturbance, flexible transition between patterns of opinions, and opinion cascades. We derive network-dependent tuning rules to robustly control the system behavior and we design state-feedback dynamics for the model parameters to make the behavior adaptive to changing external conditions. The model provides new means for systematic study of dynamics on natural and engineered networks, from information spread and political polarization to collective decision making and dynamic task allocation.

Submitted 30 July, 2021; v1 submitted 9 September, 2020; originally announced September 2020.
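
Illustrative note: the phrase "saturating opinion exchanges" in the abstract above can be made concrete with a small simulation. The sketch below assumes a two-option form in which each agent holds a scalar opinion x_i that evolves as dx_i/dt = -d * x_i + u * tanh(alpha * x_i + gamma * sum_j a_ij * x_j) + b_i. The exact equation, the parameter names, and the forward-Euler integration are assumptions for illustration and should not be read as the paper's model.

```python
import numpy as np

def simulate_opinions(adjacency, attention, x0, d=1.0, alpha=1.0, gamma=1.0,
                      bias=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of saturated opinion exchange on a network."""
    a = np.asarray(adjacency, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        # Saturate the combined self and neighbor influence before it acts.
        drive = np.tanh(alpha * x + gamma * (a @ x))
        x = x + dt * (-d * x + attention * drive + bias)
    return x

# Hypothetical example: three agents on a complete graph, small initial opinions.
complete3 = np.ones((3, 3)) - np.eye(3)
x0 = [0.1, 0.05, 0.02]

# Low attention: the linear decay wins and opinions return to neutral.
undecided = simulate_opinions(complete3, attention=0.1, x0=x0)
# High attention: the neutral state loses stability and agents agree.
agreed = simulate_opinions(complete3, attention=1.0, x0=x0)
```

For this graph the neutral state of the assumed model is stable when u * (alpha + 2 * gamma) < d, which gives a critical attention of 1/3 with the defaults. The two runs sit on either side of that threshold.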

arXiv:2009.01339

Heterogeneous Explore-Exploit Strategies on Multi-Star Networks

Authors: Udari Madhushani, Naomi Leonard

Abstract: We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making where the goal of the agents is to maximize cumulative group reward. To do so we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problems, agents use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e. when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to enable center agents to do more exploring than they would do using the homogeneous strategy, as a means of providing more useful data to the peripheral agents. In the case all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies as compared to under homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds.

Submitted 1 December, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

arXiv:2008.04383 [pdf, other]

Influence Spread in the Heterogeneous Multiplex Linear Threshold Model

Authors: Yaofeng Desmond Zhong, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish signals from different layers: an agent becomes active if a sufficient number of its neighbors in each of any $a$ of the $m$ layers is active. We focus on Protocol OR, when $a=1$, and Protocol AND, when $a=m$, which model agents that are most and least readily activated, respectively. We develop theory and algorithms to compute the size of the spread at steady state for any set of initially active agents and to analyze the role of distinguished sensing modalities, network structure, and heterogeneity. We show how heterogeneity manages the tension in spreading dynamics between sensitivity to inputs and robustness to disturbances.

Submitted 10 August, 2020; originally announced August 2020.
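
The activation rule described in the abstract above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the function name `multiplex_ltm_spread`, the per-node fractional threshold, and the toy two-layer network are all hypothetical. The abstract says only that a "sufficient number" of neighbors must be active in each of any $a$ of the $m$ layers.

```python
def multiplex_ltm_spread(layers, thresholds, seeds, a):
    """Return the steady-state active set of a toy multiplex threshold model.

    layers     -- list of dicts, one per layer, mapping node -> set of neighbors
    thresholds -- dict mapping node -> required active fraction (an assumption;
                  the abstract only says "a sufficient number" of neighbors)
    seeds      -- initially active nodes
    a          -- number of layers in which the threshold must be met
                  (a = 1 is Protocol OR, a = len(layers) is Protocol AND)
    """
    active = set(seeds)
    nodes = set()
    for layer in layers:
        nodes.update(layer)
    changed = True
    while changed:  # activation is monotone, so this loop terminates
        changed = False
        for node in sorted(nodes - active):
            satisfied = 0
            for layer in layers:
                nbrs = layer.get(node, set())
                if nbrs and len(nbrs & active) / len(nbrs) >= thresholds[node]:
                    satisfied += 1
            if satisfied >= a:
                active.add(node)
                changed = True
    return active


# Hypothetical two-layer network on four nodes.
layer_a = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
layer_b = {1: {4}, 2: {4}, 3: {4}, 4: {1, 2, 3}}
thresholds = {n: 0.5 for n in (1, 2, 3, 4)}

or_spread = multiplex_ltm_spread([layer_a, layer_b], thresholds, {1}, a=1)
and_spread = multiplex_ltm_spread([layer_a, layer_b], thresholds, {1}, a=2)
```

In this toy network the same seed reaches every node under Protocol OR and no further node under Protocol AND, consistent with the abstract's description of OR and AND agents as most and least readily activated.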

arXiv:2007.01424 [pdf, ps, other]

Active Control and Sustained Oscillations in actSIS Epidemic Dynamics

Authors: Yunxiu Zhou, Simon A. Levin, Naomi E. Leonard

Abstract: An actively controlled Susceptible-Infected-Susceptible (actSIS) contagion model is presented for studying epidemic dynamics with continuous-time feedback control of infection rates. Our work is inspired by the observation that epidemics can be controlled through decentralized disease-control strategies such as quarantining, sheltering in place, social distancing, etc., where individuals actively modify their contact rates with others in response to observations of infection levels in the population. Accounting for a time lag in observations and categorizing individuals into distinct sub-populations based on their risk profiles, we show that the actSIS model manifests qualitatively different features as compared with the SIS model. In a homogeneous population of risk-averters, the endemic equilibrium is always reduced, although the transient infection level can exhibit overshoot or undershoot. In a homogeneous population of risk-tolerating individuals, the system exhibits bistability, which can also lead to reduced infection. For a heterogeneous population comprised of risk-tolerators and risk-averters, we prove conditions on model parameters for the existence of a Hopf bifurcation and sustained oscillations in the infected population.

Submitted 2 July, 2020; originally announced July 2020.
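
For context, the baseline that the abstract above compares against is the standard well-mixed SIS model, which can be written as follows. The symbols $p$ (infected fraction), $\beta$ (infection rate), and $\delta$ (recovery rate) are notation assumed here, not taken from the paper, and the actSIS feedback law itself is not given in the abstract, so it is not reproduced.

```latex
% Standard SIS dynamics for the infected fraction p(t) in [0, 1]
\dot{p} = \beta\, p\,(1 - p) - \delta\, p ,
\qquad
% Endemic equilibrium, which exists when \beta > \delta
p^{*} = 1 - \frac{\delta}{\beta} .
```

A reduced endemic equilibrium, as described for risk-averters, would correspond to a steady-state infected fraction below this $p^{*}$.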

arXiv:2004.06171 [pdf, other]

Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.

Submitted 13 April, 2020; originally announced April 2020.

arXiv:2004.03793 [pdf, other]

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.

Submitted 7 April, 2020; originally announced April 2020.

arXiv:2003.01312 [pdf, other]

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.

Submitted 11 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:1909.11852 [pdf, other]

doi 10.1109/CDC40024.2019.9029844

A Continuous Threshold Model of Cascade Dynamics

Authors: Yaofeng Desmond Zhong, Naomi Ehrich Leonard

Abstract: We present a continuous threshold model (CTM) of cascade dynamics for a network of agents with real-valued activity levels that change continuously in time. The model generalizes the linear threshold model (LTM) from the literature, where an agent becomes active (adopts an innovation) if the fraction of its neighbors that are active is above a threshold. With the CTM we study the influence on cascades of heterogeneity in thresholds for a network comprised of a chain of three clusters of agents, each distinguished by a different threshold. The system is most sensitive to change as the dynamics pass through a bifurcation point: if the bifurcation is supercritical the response will be contained, while if the bifurcation is subcritical the response will be a cascade. We show that there is a subcritical bifurcation, thus a cascade, in response to an innovation if there is a large enough disparity between the thresholds of sufficiently large clusters on either end of the chain; otherwise the response will be contained.

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.05765 [pdf, other]

A model-independent theory of consensus and dissensus decision making

Authors: Alessio Franci, Martin Golubitsky, Anastasia Bizyaeva, Naomi Ehrich Leonard

Abstract: We develop a model-independent framework to study the dynamics of decision-making in opinion networks for an arbitrary number of agents and an arbitrary number of options. Model-independence means that the analysis is not performed on a specific set of equations, in contrast to classical approaches to decision making that fix a specific model and analyze it. Rather, the general features of decision making in dynamical opinion networks can be derived starting from empirically testable hypotheses about the deciding agents, the available options, and the interactions among them. After translating these empirical hypotheses into algebraic ones, we use the tools of equivariant bifurcation theory to uncover model-independent properties of dynamical opinion networks. The model-independent results are illustrated on a novel analytical model that is constructed by plugging a generic sigmoidal nonlinearity, modeling boundedness of opinions and opinion perception, into the model-independent equivariant structure. Our analysis reveals richer and more flexible opinion-formation behavior as compared to model-dependent approaches. For instance, analysis reveals the possibility of switching between consensus and various forms of dissensus by modulation of the level of agent cooperativity and without requiring any particular ad-hoc interaction topology (e.g., structural balance). From a theoretical viewpoint, we prove new results in equivariant bifurcation theory. We construct an exhaustive list of axial subgroups for the action of $\mathbf{S}_n \times \mathbf{S}_3$ on $\mathbb{R}^{n-1}\otimes\mathbb{R}^{2}$. We also generalize this list to the action of $\mathbf{S}_n \times \mathbf{S}_k$ on $\mathbb{R}^{n-1}\otimes \mathbb{R}^{k-1}$, i.e., for $n$ agents and $k$ options, although without proving that in this case the list is exhaustive.

Submitted 8 September, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

arXiv:1907.08829 [pdf, other]

Adaptive Susceptibility and Heterogeneity in Contagion Models on Networks

Authors: Renato Pagliara, Naomi E. Leonard

Abstract: Contagious processes, such as spread of infectious diseases, social behaviors, or computer viruses, affect biological, social, and technological systems. Epidemic models for large populations and finite populations on networks have been used to understand and control both transient and steady-state behaviors. Typically it is assumed that after recovery from an infection, every agent will either return to its original susceptible state or acquire full immunity to reinfection. We study the network SIRI (Susceptible-Infected-Recovered-Infected) model, an epidemic model for the spread of contagious processes on a network of heterogeneous agents that can adapt their susceptibility to reinfection. The model generalizes existing models to accommodate realistic conditions in which agents acquire partial or compromised immunity after first exposure to an infection. We prove necessary and sufficient conditions on model parameters and network structure that distinguish four dynamic regimes: infection-free, epidemic, endemic, and bistable. For the bistable regime, which is not accounted for in traditional models, we show how there can be a rapid resurgent epidemic after what looks like convergence to an infection-free population. We use the model and its predictive capability to show how control strategies can be designed to mitigate problematic contagious behaviors.

Submitted 11 April, 2020; v1 submitted 20 July, 2019; originally announced July 2019.

Comments: 14 pages, 5 figures

arXiv:1905.08731 [pdf, other]

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy analytically and computationally.

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1812.07117 [pdf, other]

doi 10.1080/03080188.2018.1544806

Social decision-making driven by artistic explore-exploit tension

Authors: Kayhan Ozcimder, Biswadip Dey, Alessio Franci, Rebecca Lazier, Daniel Trueman, Naomi Ehrich Leonard

Abstract: We studied social decision-making in the rule-based improvisational dance There Might Be Others, where dancers make in-the-moment compositional choices. Rehearsals provided a natural test-bed with communication restricted to non-verbal cues. We observed a key artistic explore-exploit tension in which the dancers switched between exploitation of existing artistic opportunities and riskier exploration of new ones. We investigated how the rules influenced the dynamics using rehearsals together with a model generalized from evolutionary dynamics. We tuned the rules to heighten the tension and modeled nonlinear fitness and feedback dynamics for mutation rate to capture the observed temporal phasing of the dancers' exploration-versus-exploitation. Using bifurcation analysis, we identified key controls of the tension and showed how they could shape the decision-making dynamics of the model much like turning a "dial" in the instructions to the dancers could shape the dance. The investigation became an integral part of the development of the dance.

Submitted 17 December, 2018; originally announced December 2018.

Journal ref: K. Ozcimder, B. Dey, A. Franci, R. Lazier, D. Trueman, and N. E. Leonard (2018): Social decision-making driven by artistic explore-exploit tension, Interdisciplinary Science Reviews

arXiv:1807.10824 [pdf, other]

doi 10.1063/1.5050178

Mixed mode oscillations and phase locking in coupled FitzHugh-Nagumo model neurons

Authors: Elizabeth N. Davison, Zahra Aminzare, Biswadip Dey, Naomi Ehrich Leonard

Abstract: We study the dynamics of a low-dimensional system of coupled model neurons as a step towards understanding the vastly complex network of neurons in the brain. We analyze the bifurcation structure of a system of two model neurons with unidirectional coupling as a function of two physiologically relevant parameters: the external current input only to the first neuron and the strength of the coupling from the first to the second neuron. Leveraging a timescale separation, we prove necessary conditions for multiple timescale phenomena observed in the coupled system, including canard solutions and mixed mode oscillations. For a larger network of model neurons, we present a sufficient condition for phase locking when external inputs are heterogeneous. Finally, we generalize our results to directed trees of model neurons with heterogeneous inputs.

Submitted 27 July, 2018; originally announced July 2018.

MSC Class: 34A26; 34C15; 34C60; 34D15; 34E17; 37G05; 37G10; 92B20; 92C20

arXiv:1711.11578 [pdf, other]

Multi-agent decision-making dynamics inspired by honeybees

Authors: Rebecca Gray, Alessio Franci, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: When choosing between candidate nest sites, a honeybee swarm reliably chooses the most valuable site and even when faced with the choice between near-equal value sites, it makes highly efficient decisions. Value-sensitive decision-making is enabled by a distributed social effort among the honeybees, and it leads to decision-making dynamics of the swarm that are remarkably robust to perturbation and adaptive to change. To explore and generalize these features to other networks, we design distributed multi-agent network dynamics that exhibit a pitchfork bifurcation, ubiquitous in biological models of decision-making. Using tools of nonlinear dynamics we show how the designed agent-based dynamics recover the high performing value-sensitive decision-making of the honeybees and rigorously connect investigation of mechanisms of animal group decision-making to systematic, bio-inspired control of multi-agent network systems. We further present a distributed adaptive bifurcation control law and prove how it enhances the network decision-making performance beyond that observed in swarms.

Submitted 22 January, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

arXiv:1606.00911 [pdf, other]

Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian Algorithms

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study distributed cooperative decision-making under the explore-exploit tradeoff in the multiarmed bandit (MAB) problem. We extend the state-of-the-art frequentist and Bayesian algorithms for single-agent MAB problems to cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph. We rely on a running consensus algorithm for each agent's estimation of mean rewards from its own rewards and the estimated rewards of its neighbors. We prove the performance of these algorithms and show that they asymptotically recover the performance of a centralized agent. Further, we rigorously characterize the influence of the communication graph structure on the decision-making performance of the group.

Submitted 17 September, 2019; v1 submitted 2 June, 2016; originally announced June 2016.

Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 IEEE Conference on Decision and Control (CDC). The second statement of Proposition 1 and Theorem 1 are new from arXiv:1512.06888v3 and Lemma 1 is new. These are used to prove regret bounds in Theorems 2 and 3

arXiv:1512.07638 [pdf, other]

Satisficing in multi-armed bandit problems

Authors: Paul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

Submitted 19 December, 2016; v1 submitted 23 December, 2015; originally announced December 2015.

Comments: To appear in IEEE Transactions on Automatic Control

arXiv:1512.06888 [pdf, other]

On Distributed Cooperative Decision-Making in Multiarmed Bandits

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.

Submitted 16 September, 2019; v1 submitted 21 December, 2015; originally announced December 2015.

Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 European Control Conference (ECC). The second statement of Proposition 1, Theorem 1 and their proofs are new. The new Theorem 1 is used to prove the regret bounds in Theorem 2
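
As background for the upper-confidence-bound step mentioned in the abstract above, the following sketch shows the standard single-agent UCB1 index. It is not the cooperative UCB algorithm from the paper: there, per the abstract, the reward estimates come from running consensus among neighbors, which is omitted here. The function names `ucb1_index` and `select_arm` are hypothetical.

```python
import math


def ucb1_index(mean_reward, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus.

    mean_reward -- empirical mean reward of the arm so far
    pulls       -- number of times the arm has been pulled
    t           -- current round (t >= 1)
    """
    if pulls == 0:
        return math.inf  # force every arm to be tried at least once
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def select_arm(means, counts, t):
    """Pick the arm with the largest UCB1 index (ties go to the lowest index)."""
    indices = [ucb1_index(m, n, t) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)
```

The bonus term shrinks as an arm is pulled more often, so a rarely sampled arm can be selected even when its empirical mean is lower. This is the explore-exploit tradeoff the abstract refers to.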

arXiv:1508.03373 [pdf, other]

A martingale analysis of first passage times of time-dependent Wiener diffusion models

Authors: Vaibhav Srivastava, Samuel F. Feng, Jonathan D. Cohen, Naomi Ehrich Leonard, Amitai Shenhav

Abstract: Research in psychology and neuroscience has successfully modeled decision making as a process of noisy evidence accumulation to a decision bound. While there are several variants and implementations of this idea, the majority of these models make use of a noisy accumulation between two absorbing boundaries. A common assumption of these models is that decision parameters, e.g., the rate of accumulation (drift rate), remain fixed over the course of a decision, allowing the derivation of analytic formulas for the probabilities of hitting the upper or lower decision threshold, and the mean decision time. There is reason to believe, however, that many types of behavior would be better described by a model in which the parameters were allowed to vary over the course of the decision process. In this paper, we use martingale theory to derive formulas for the mean decision time, hitting probabilities, and first passage time (FPT) densities of a Wiener process with time-varying drift between two time-varying absorbing boundaries. This model was first studied by Ratcliff (1980) in the two-stage form, and here we consider the same model for an arbitrary number of stages (i.e. intervals of time during which parameters are constant). Our calculations enable direct computation of mean decision times and hitting probabilities for the associated multistage process. We also provide a review of how martingale theory may be used to analyze similar models employing Wiener processes by re-deriving some classical results. In concert with a variety of numerical tools already available, the current derivations should encourage mathematical analysis of more complex models of decision making with time-varying evidence.

Submitted 30 September, 2016; v1 submitted 13 August, 2015; originally announced August 2015.
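
The fixed-parameter formulas that the abstract above alludes to can be stated for the simplest case. The block below gives the classical single-stage result for constant drift $\mu \neq 0$, noise intensity $\sigma$, symmetric thresholds $\pm z$, and an unbiased starting point. The symbols are notation assumed here, and the multistage formulas derived in the paper are not reproduced.

```latex
% Constant-parameter Wiener diffusion between absorbing thresholds \pm z
dx = \mu\, dt + \sigma\, dW, \qquad x(0) = 0 ,

% Probability of hitting the upper threshold +z first
P(\text{hit } {+z}) = \frac{1}{1 + e^{-2\mu z/\sigma^{2}}} ,

% Mean decision time (expected first passage time)
\mathbb{E}[T] = \frac{z}{\mu}\,\tanh\!\left(\frac{\mu z}{\sigma^{2}}\right) .
```

In the limit $\mu \to 0$ the hitting probability tends to $1/2$ and the mean decision time tends to $z^{2}/\sigma^{2}$.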

arXiv:1507.01160 [pdf, other]

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis

Authors: Vaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard

Abstract: We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.

Submitted 7 July, 2015; v1 submitted 4 July, 2015; originally announced July 2015.

arXiv:1503.08526 [pdf, other]

A Realization Theory for Bio-inspired Collective Decision-Making

Authors: Alessio Franci, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: The collective decision-making exhibited by animal groups provides enormous inspiration for multi-agent control system design as it embodies several features that are desirable in engineered networks, including robustness and adaptability, low computational effort, and an intrinsically decentralized architecture. However, many of the mechanistic models for collective decision-making are described at the population-level abstraction and are challenging to implement in an engineered system. We develop simple and easy-to-implement models of opinion dynamics that realize the empirically observed collective decision-making behavior as well as the behavior predicted by existing models of animal groups. Using methods from Lyapunov analysis, singularity theory, and monotone dynamical systems, we rigorously investigate the steady-state decision-making behavior of our models.

Submitted 30 November, 2017; v1 submitted 29 March, 2015; originally announced March 2015.

arXiv:1502.04635 [pdf, other]

Parameter estimation in softmax decision-making models with linear objective functions

Authors: Paul Reverdy, Naomi E. Leonard

Abstract: With an eye towards human-centered automation, we contribute to the development of a systematic means to infer features of human decision-making from behavioral data. Motivated by the common use of softmax selection in models of human decision-making, we study the maximum likelihood parameter estimation problem for softmax decision-making models with linear objective functions. We present conditions under which the likelihood function is convex. These allow us to provide sufficient conditions for convergence of the resulting maximum likelihood estimator and to construct its asymptotic distribution. In the case of models with nonlinear objective functions, we show how the estimator can be applied by linearizing about a nominal parameter value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit) model of human decision-making to human subject data. We show statistically significant differences in behavior across related, but distinct, tasks.

Submitted 29 August, 2015; v1 submitted 16 February, 2015; originally announced February 2015.

Comments: In press

MSC Class: 93E10

arXiv:1407.1569 [pdf, other]

Joint Centrality Distinguishes Optimal Leaders in Noisy Networks

Authors: Katherine E. Fitch, Naomi Ehrich Leonard

Abstract: We study the performance of a network of agents tasked with tracking an external unknown signal in the presence of stochastic disturbances and under the condition that only a limited subset of agents, known as leaders, can measure the signal directly. We investigate the optimal leader selection problem for a prescribed maximum number of leaders, where the optimal leader set minimizes total system error defined as steady-state variance about the external signal. In contrast to previously established greedy algorithms for optimal leader selection, our results rely on an expression of total system error in terms of properties of the underlying network graph. We demonstrate that the performance of any given set of leaders depends on their influence as determined by a new graph measure of centrality of a set. We define the joint centrality of a set of nodes in a network graph such that a leader set with maximal joint centrality is an optimal leader set. In the case of a single leader, we prove that the optimal leader is the node with maximal information centrality. In the case of multiple leaders, we show that the nodes in the optimal leader set balance high information centrality with a coverage of the graph. For special cases of graphs, we solve explicitly for optimal leader sets. We illustrate with examples.

Submitted 15 June, 2015; v1 submitted 6 July, 2014; originally announced July 2014.

Comments: Conditionally accepted to IEEE TCNS

arXiv:1402.3634 [pdf, other]

Collective Decision-Making in Ideal Networks: The Speed-Accuracy Tradeoff

Authors: Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study collective decision-making in a model of human groups, with network interactions, performing two alternative choice tasks. We focus on the speed-accuracy tradeoff, i.e., the tradeoff between a quick decision and a reliable decision, for individuals in the network. We model the evidence aggregation process across the network using a coupled drift diffusion model (DDM) and consider the free response paradigm in which individuals take their time to make the decision. We develop reduced DDMs as decoupled approximations to the coupled DDM and characterize their efficiency. We determine high probability bounds on the error rate and the expected decision time for the reduced DDM. We show the effect of the decision-maker's location in the network on their decision-making performance under several threshold selection criteria. Finally, we extend the coupled DDM to the coupled Ornstein-Uhlenbeck model for decision-making in two alternative choice tasks with recency effects, and to the coupled race model for decision-making in multiple alternative choice tasks.

Submitted 14 February, 2014; originally announced February 2014.

Comments: to appear in IEEE TCNS

arXiv:1310.5168 [pdf, other]

A New Notion of Effective Resistance for Directed Graphs-Part II: Computing Resistances

Authors: George Forrest Young, Luca Scardovi, Naomi Ehrich Leonard

Abstract: In Part I of this work we defined a generalization of the concept of effective resistance to directed graphs, and we explored some of the properties of this new definition. Here, we use the theory developed in Part I to compute effective resistances in some prototypical directed graphs. This exploration highlights cases where our notion of effective resistance for directed graphs behaves analogously to our experience from undirected graphs, as well as cases where it behaves in unexpected ways.

Submitted 21 October, 2013; v1 submitted 18 October, 2013; originally announced October 2013.

arXiv:1310.5163 [pdf, other]

A New Notion of Effective Resistance for Directed Graphs-Part I: Definition and Properties

Authors: George Forrest Young, Luca Scardovi, Naomi Ehrich Leonard

Abstract: The graphical notion of effective resistance has found wide-ranging applications in many areas of pure mathematics, applied mathematics and control theory. By the nature of its construction, effective resistance can only be computed in undirected graphs and yet in several areas of its application, directed graphs arise as naturally (or more naturally) than undirected ones. In part I of this work, we propose a generalization of effective resistance to directed graphs that preserves its control-theoretic properties in relation to consensus-type dynamics. We proceed to analyze the dependence of our algebraic definition on the structural properties of the graph and the relationship between our construction and a graphical distance. The results make possible the calculation of effective resistance between any two nodes in any directed graph and provide a solid foundation for the application of effective resistance to problems involving directed graphs.

Submitted 21 October, 2013; v1 submitted 18 October, 2013; originally announced October 2013.

arXiv:1310.4188 [pdf, other]

Nonuniform Line Coverage from Noisy Scalar Measurements

Authors: P. Davison, N. E. Leonard, A. Olshevsky, M. Schwemmer

Abstract: We study the problem of distributed coverage control in a network of mobile agents arranged on a line. The goal is to design distributed dynamics for the agents to achieve optimal coverage positions with respect to a scalar density field that measures the relative importance of each point on the line. Unlike previous work, which has implicitly assumed the agents know this density field, we only assume that each agent can access noisy samples of the field at points close to its current location. We provide a simple randomized protocol wherein every agent samples the scalar field at three nearby points at each step and which guarantees convergence to the optimal positions. We further analyze the convergence time of this protocol and show that, under suitable assumptions, the squared distance to the optimal coverage configuration decays as $O(1/t)$ with the number of iterations $t$, where the constant scales polynomially with the number of agents $n$. We illustrate these results with simulations.

Submitted 21 November, 2014; v1 submitted 15 October, 2013; originally announced October 2013.

arXiv:1307.6134 [pdf, other]

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

Authors: Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard

Abstract: We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sub-logarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples. NB: Appendix G included in this version details minor modifications that correct for an oversight in the previously-published proofs. The remainder of the text reflects the published work.

Submitted 20 December, 2019; v1 submitted 23 July, 2013; originally announced July 2013.

Comments: 25 pages. Appendix G included in this version details minor modifications that correct for an oversight in the previously-published proofs. The remainder of the text reflects the previously-published version

Journal ref: Proceedings of the IEEE, vol. 102, iss. 4, p. 544-571, 2014

arXiv:1210.4235 [pdf, ps, other]

Node Classification in Networks of Stochastic Evidence Accumulators

Authors: Ioannis Poulakakis, Luca Scardovi, Naomi Ehrich Leonard

Abstract: This paper considers a network of stochastic evidence accumulators, each represented by a drift-diffusion model accruing evidence towards a decision in continuous time by observing a noisy signal and by exchanging information with other units according to a fixed communication graph. We bring into focus the relationship between the location of each unit in the communication graph and its certainty as measured by the inverse of the variance of its state. We show that node classification according to degree distributions or geodesic distances cannot faithfully capture node ranking in terms of certainty. Instead, all possible paths connecting each unit with the rest in the network must be incorporated. We make this precise by proving that node classification according to information centrality provides a rank ordering with respect to node certainty, thereby affording a direct interpretation of the certainty level of each unit in terms of the structural properties of the underlying communication graph.

Submitted 15 October, 2012; originally announced October 2012.

Comments: 32 pages

arXiv:1209.2194 [pdf, ps, other]

Cooperative learning in multi-agent systems from intermittent measurements

Authors: Naomi Ehrich Leonard, Alex Olshevsky

Abstract: Motivated by the problem of tracking a direction in a decentralized way, we consider the general problem of cooperative learning in multi-agent systems with time-varying connectivity and intermittent measurements. We propose a distributed learning protocol capable of learning an unknown vector $μ$ from noisy measurements made independently by autonomous nodes. Our protocol is completely distributed and able to cope with the time-varying, unpredictable, and noisy nature of inter-agent communication, and intermittent noisy measurements of $μ$. Our main result bounds the learning speed of our protocol in terms of the size and combinatorial features of the (time-varying) networks connecting the nodes.

Submitted 15 December, 2014; v1 submitted 10 September, 2012; originally announced September 2012.

arXiv:1105.2541 [pdf, other]

Rearranging trees for robust consensus

Authors: George Forrest Young, Luca Scardovi, Naomi Ehrich Leonard

Abstract: In this paper, we use the H2 norm associated with a communication graph to characterize the robustness of consensus to noise. In particular, we restrict our attention to trees and by systematic attention to the effect of local changes in topology, we derive a partial ordering for undirected trees according to the H2 norm. Our approach for undirected trees provides a constructive method for deriving an ordering for directed trees. Further, our approach suggests a decentralized manner in which trees can be rearranged in order to improve their robustness.

Submitted 20 June, 2011; v1 submitted 12 May, 2011; originally announced May 2011.

Comments: Submitted to CDC 2011

arXiv:1104.0457 [pdf, other]

Nonuniform Coverage Control on the Line

Authors: Naomi Ehrich Leonard, Alex Olshevsky

Abstract: This paper investigates control laws allowing mobile, autonomous agents to optimally position themselves on the line for distributed sensing in a nonuniform field. We show that a simple static control law, based only on local measurements of the field by each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/movement/computation rounds that is essentially quadratic in the number of agents. Further, we exhibit a dynamic control law which, under slightly stronger assumptions on the capabilities and knowledge of each agent, drives the agents close to the optimal positions after the agents execute in parallel a number of sensing/communication/computation/movement rounds that is essentially linear in the number of agents. Crucially, both algorithms are fully distributed and robust to unpredictable loss and addition of agents.

Submitted 7 November, 2012; v1 submitted 3 April, 2011; originally announced April 2011.

arXiv:0902.3710 [pdf, other]

Tensegrity Models and Shape Control of Vehicle Formations

Authors: Benjamin Nabet, Naomi Ehrich Leonard

Abstract: Using dynamic models of tensegrity structures, we derive provable, distributed control laws for stabilizing and changing the shape of a formation of vehicles in the plane. Tensegrity models define the desired, controlled, multi-vehicle system dynamics, where each node in the tensegrity structure maps to a vehicle and each interconnecting strut or cable in the structure maps to a virtual interconnection between vehicles. Our method provides a smooth map from any desired planar formation shape to a planar tensegrity structure. The stabilizing vehicle formation shape control laws are then given by the forces between nodes in the corresponding tensegrity model. The smooth map makes possible provably well behaved changes of formation shape over a prescribed time interval. A designed path in shape space is mapped to a path in the parametrized space of tensegrity structures and the vehicle formation tracks this path with forces derived from the time-varying tensegrity model. By means of examples, we illustrate the influence of design parameters on performance measures.

Submitted 23 February, 2009; originally announced February 2009.

Comments: 31 pages, 6 figures, Submitted

arXiv:0806.3442 [pdf, other]

Stabilization of Three-Dimensional Collective Motion

Authors: Luca Scardovi, Naomi Leonard, Rodolphe Sepulchre

Abstract: This paper proposes a methodology to stabilize relative equilibria in a model of identical, steered particles moving in three-dimensional Euclidean space. Exploiting the Lie group structure of the resulting dynamical system, the stabilization problem is reduced to a consensus problem on the Lie algebra. The resulting equilibria correspond to parallel, circular and helical formations. We first derive the stabilizing control laws in the presence of all-to-all communication. Providing each agent with a consensus estimator, we then extend the results to a general setting that allows for unidirectional and time-varying communication topologies.

Submitted 21 June, 2008; v1 submitted 20 June, 2008; originally announced June 2008.

Comments: 15 pages, 4 figures, Submitted

arXiv:math/0205017 [pdf, ps, other]

Singular trajectories in multi-input time-optimal problems: Application to controlled mechanical systems

Authors: M. Chyba, N. E. Leonard, E. D. Sontag

Abstract: This paper addresses the time-optimal control problem for a class of control systems which includes controlled mechanical systems with possible dissipation terms. The Lie algebras associated with such mechanical systems enjoy certain special properties. These properties are explored and are used in conjunction with the Pontryagin maximum principle to determine the structure of singular extremals and, in particular, time-optimal trajectories. The theory is illustrated with an application to a time-optimal problem for a class of underwater vehicles.

Submitted 2 May, 2002; originally announced May 2002.

Comments: See http://www.math.rutgers.edu/~sontag for related work

MSC Class: 93B05;57R27;37N05

Showing 1–50 of 50 results for author: Leonard, N