Search | arXiv e-print repository

arXiv:2406.09810 [pdf, other]

Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

Authors: Haimin Hu, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is yet to be fully addressed. The recently developed nonlinear opinion dynamics (NOD) show promise in enabling fast opinion formation and avoiding safety-critical deadlocks. However, it remains an open challenge to determine the model parameters of NOD automatically and adaptively, accounting for the ever-changing environment of interaction. In this work, we propose for the first time a learning-based, game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. The learned NOD can be used by existing dynamic game solvers to plan decisively while accounting for the predicted change of other agents' intents, thus enabling situational awareness in planning. We demonstrate Neural NOD's ability to make fast and robust decisions in a simulated autonomous racing example, leading to tangible improvements in safety and overtaking performance over state-of-the-art data-driven game-theoretic planning methods.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.20593 [pdf, other]

Excitable crawling

Authors: Juncal Arbelaiz, Alessio Franci, Naomi Ehrich Leonard, Rodolphe Sepulchre, Bassam Bamieh

Abstract: We propose and analyze the suitability of a spiking controller to engineer the locomotion of a soft robotic crawler. Inspired by the FitzHugh-Nagumo model of neural excitability, we design a bistable controller with an electrical flipflop circuit representation capable of generating spikes on-demand when coupled to the passive crawler mechanics. A proprioceptive sensory signal from the crawler mechanics turns bistability of the controller into a rhythmic spiking. The output voltage, in turn, activates the crawler's actuators to generate movement through peristaltic waves. We show through geometric analysis that this control strategy achieves endogenous crawling. The electro-mechanical sensorimotor interconnection provides embodied negative feedback regulation, facilitating locomotion. Dimensional analysis provides insights on the characteristic scales in the crawler's mechanical and electrical dynamics, and how they determine the crawling gait. Adaptive control of the electrical scales to optimally match the mechanical scales can be envisioned to achieve further efficiency, as in homeostatic regulation of neuronal circuits. Our approach can scale up to multiple sensorimotor loops inspired by biological central pattern generators.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 5 pages, MTNS 2024 extended abstract

arXiv:2312.06395 [pdf, other]

Threshold Decision-Making Dynamics Adaptive to Physical Constraints and Changing Environment

Authors: Giovanna Amorim, María Santos, Shinkyu Park, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a threshold decision-making framework for controlling the physical dynamics of an agent switching between two spatial tasks. Our framework couples a nonlinear opinion dynamics model that represents the evolution of an agent's preference for a particular task with the physical dynamics of the agent. We prove the bifurcation that governs the behavior of the coupled dynamics. We show by means of the bifurcation behavior how the coupled dynamics are adaptive to the physical constraints of the agent. We also show how the bifurcation can be modulated to allow the agent to switch tasks based on thresholds adaptive to environmental conditions. We illustrate the benefits of the approach through a decentralized multi-robot task allocation application for trash collection.

Submitted 7 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2308.14666 [pdf, other]

Learning to Predict 3D Rotational Dynamics from Images of a Rigid Body with Unknown Mass Distribution

Authors: Justice Mason, Christine Allen-Blanchette, Nicholas Zolman, Elizabeth Davison, Naomi Ehrich Leonard

Abstract: In many real-world settings, image observations of freely rotating 3D rigid bodies may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics. The usefulness of standard deep learning methods is also limited, because an image of a rigid body reveals nothing about the distribution of mass inside the body, which, together with initial angular velocity, is what determines how the body will rotate. We present a physics-based neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion. We demonstrate the efficacy of our approach on new rotating rigid-body datasets of sequences of synthetic images of rotating objects, including cubes, prisms and satellites, with unknown uniform and non-uniform mass distributions. Our model outperforms competing baselines on our datasets, producing better qualitative predictions and reducing the error observed for the state-of-the-art Hamiltonian Generative Network by a factor of 2.

Submitted 10 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Previously appeared as arXiv:2209.11355v2, which was submitted as a replacement by accident. arXiv admin note: text overlap with arXiv:2209.11355

arXiv:2308.02755 [pdf, other]

Multi-topic belief formation through bifurcations over signed social networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose and analyze a nonlinear dynamic model of continuous-time multi-dimensional belief formation over signed social networks. Our model accounts for the effects of a structured belief system, self-appraisal, internal biases, and various sources of cognitive dissonance posited by recent theories in social psychology. We prove that strong beliefs emerge on the network as a consequence of a bifurcation. We analyze how the balance of social network effects in the model controls the nature of the bifurcation and, therefore, the belief-forming limit-set solutions. Our analysis provides constructive conditions on how multi-stable network belief equilibria and belief oscillations emerging at a belief-forming bifurcation depend on the communication network graph and belief system network graph. Our model and analysis provide new theoretical insights on the dynamics of social systems and a new principled framework for designing decentralized decision-making on engineered networks in the presence of structured relationships among alternatives.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 16 pages, 7 figures

arXiv:2304.02687 [pdf, other]

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2210.01642 [pdf, other]

Proactive Opinion-Driven Robot Navigation around Human Movers

Authors: Charlotte Cathcart, María Santos, Shinkyu Park, Naomi Ehrich Leonard

Abstract: We propose, analyze, and experimentally verify a new proactive approach for robot social navigation driven by the robot's "opinion" for which way and by how much to pass human movers crossing its path. The robot forms an opinion over time according to nonlinear dynamics that depend on the robot's observations of human movers and its level of attention to these social cues. For these dynamics, it is guaranteed that when the robot's attention is greater than a critical value, deadlock in decision making is broken, and the robot rapidly forms a strong opinion, passing each human mover even if the robot has no bias nor evidence for which way to pass. We enable proactive rapid and reliable social navigation by having the robot grow its attention across the critical value when a human mover approaches. With human-robot experiments we demonstrate the flexibility of our approach and validate our analytical results on deadlock-breaking. We also show that a single design parameter can tune the trade-off between efficiency and reliability in human-robot passing. The new approach has the additional advantage that it does not rely on a predictive model of human behavior.

Submitted 11 September, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 8 pages, 7 figures

arXiv:2210.00353 [pdf, other]

Sustained oscillations in multi-topic belief dynamics over signed networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We study the dynamics of belief formation on multiple interconnected topics in networks of agents with a shared belief system. We establish sufficient conditions and necessary conditions under which sustained oscillations of beliefs arise on the network in a Hopf bifurcation and characterize the role of the communication graph and the belief system graph in shaping the relative phase and amplitude patterns of the oscillations. Additionally, we distinguish broad classes of graphs that exhibit such oscillations from those that do not.

Submitted 22 March, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 6 pages, 6 figures, accepted for publication in the 2023 American Control Conference proceedings

arXiv:2208.01800 [pdf, other]

Decentralized Learning With Limited Communications for Multi-robot Coverage of Unknown Spatial Fields

Authors: Kensuke Nakamura, María Santos, Naomi Ehrich Leonard

Abstract: This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are aggregated locally by establishing communications through the boundaries of a Voronoi partition. We present an algorithm whereby each robot runs a local Gaussian process calculated from its own measurements and those provided by its Voronoi neighbors, which are incorporated into the individual robot's Gaussian process only if they provide sufficiently novel information. The performance of the algorithm is evaluated in simulation and compared with centralized approaches.

Submitted 2 August, 2022; originally announced August 2022.

Comments: Accepted IROS 2022

arXiv:2206.14893 [pdf, other]

Breaking indecision in multi-agent, multi-option dynamics

Authors: Alessio Franci, Martin Golubitsky, Ian Stewart, Anastasia Bizyaeva, Naomi Ehrich Leonard

Abstract: How does a group of agents break indecision when deciding about options with qualities that are hard to distinguish? Biological and artificial multi-agent systems, from honeybees and bird flocks to bacteria, robots, and humans, often need to overcome indecision when choosing among options in situations in which the performance or even the survival of the group are at stake. Breaking indecision is also important because in a fully indecisive state agents are not biased toward any specific option and therefore the agent group is maximally sensitive and prone to adapt to inputs and changes in its environment. Here, we develop a mathematical theory to study how decisions arise from the breaking of indecision. Our approach is grounded in both equivariant and network bifurcation theory. We model decision from indecision as synchrony-breaking in influence networks in which each node is the value assigned by an agent to an option. First, we show that three universal decision behaviors, namely, deadlock, consensus, and dissensus, are the generic outcomes of synchrony-breaking bifurcations from a fully synchronous state of indecision in influence networks. Second, we show that all deadlock and consensus value patterns and some dissensus value patterns are predicted by the symmetry of the influence networks. Third, we show that there are also many 'exotic' dissensus value patterns. These patterns are predicted by network architecture, but not by network symmetries, through a new synchrony-breaking branching lemma. This is the first example of exotic solutions in an application. Numerical simulations of a novel influence network model illustrate our theoretical results.

Submitted 29 June, 2022; originally announced June 2022.

Comments: 36 pages

arXiv:2111.12482 [pdf, other]

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Authors: Udari Madhushani, Abhimanyu Dubey, Naomi Ehrich Leonard, Alex Pentland

Abstract: The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret as well. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.

Submitted 24 November, 2021; originally announced November 2021.

Journal ref: Conference on Neural Information Processing Systems, 2021

arXiv:2110.07392 [pdf, other]

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Authors: Justin Lidard, Udari Madhushani, Naomi Ehrich Leonard

Abstract: A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transition dynamics, in which agents can communicate in a decentralized manner. We show that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its $γ$-hop neighbors. We prove regret and sample complexity bounds that depend on the number of agents, communication network structure and $γ$. We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy. Numerical simulations illustrate our results and validate our theoretical claims.

Submitted 2 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper to American Control Conference (ACC) 2022

arXiv:2108.00966 [pdf, other]

Tuning Cooperative Behavior in Games with Nonlinear Opinion Dynamics

Authors: Shinkyu Park, Anastasia Bizyaeva, Mari Kawakatsu, Alessio Franci, Naomi Ehrich Leonard

Abstract: We examine the tuning of cooperative behavior in repeated multi-agent games using an analytically tractable, continuous-time, nonlinear model of opinion dynamics. Each modeled agent updates its real-valued opinion about each available strategy in response to payoffs and other agent opinions, as observed over a network. We show how the model provides a principled and systematic means to investigate behavior of agents that select strategies using rationality and reciprocity, key features of human decision-making in social dilemmas. For two-strategy games, we use bifurcation analysis to prove conditions for the bistability of two equilibria and conditions for the first (second) equilibrium to reflect all agents favoring the first (second) strategy. We prove how model parameters, e.g., level of attention to opinions of others (reciprocity), network structure, and payoffs, influence dynamics and, notably, the size of the region of attraction to each stable equilibrium. We provide insights by examining the tuning of the bistability of mutual cooperation and mutual defection and their regions of attraction for the repeated prisoner's dilemma and the repeated multi-agent public goods game. Our results generalize to games with more strategies, heterogeneity, and additional feedback dynamics, such as those designed to elicit cooperation.

Submitted 23 November, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

arXiv:2103.14764 [pdf, ps, other]

doi 10.1109/CDC45484.2021.9683650

Control of Agreement and Disagreement Cascades with Distributed Inputs

Authors: Anastasia Bizyaeva, Timothy Sorochkin, Alessio Franci, Naomi Ehrich Leonard

Abstract: For a group of autonomous communicating agents, the ability to distinguish a meaningful input from disturbance, and come to collective agreement or disagreement in response to that input, is paramount for carrying out coordinated objectives. In this work we study how a cascade of opinion formation spreads through a group of networked decision-makers in response to a distributed input signal. Using a nonlinear opinion dynamics model with dynamic feedback modulation of an attention parameter, we show how the triggering of an opinion cascade and the collective decision itself depend on both the distributed input and the node agreement and disagreement centrality, determined by the spectral properties of the network graph. We further show how the attention dynamics introduce an implicit threshold that distinguishes between distributed inputs that trigger cascades and ones that are rejected as disturbance.

Submitted 26 March, 2021; originally announced March 2021.

Comments: 7 pages, 4 figures

arXiv:2011.07720 [pdf, other]

Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.

Submitted 8 October, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

arXiv:2011.05927 [pdf, other]

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Authors: Udari Madhushani, Biswadip Dey, Naomi Ehrich Leonard, Amit Chakraborty

Abstract: Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called \textit{Hamiltonian $Q$-Learning}, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.

Submitted 28 March, 2022; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2010.12932 [pdf, other]

LagNetViP: A Lagrangian Neural Network for Video Prediction

Authors: Christine Allen-Blanchette, Sushant Veer, Anirudha Majumdar, Naomi Ehrich Leonard

Abstract: The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred. The equations of motion, as defined by Newton's second law, describe the time evolution of a physical system state and can therefore be applied toward the determination of future system states. In this paper, we introduce a video prediction model where the equations of motion are explicitly constructed from learned representations of the underlying physical quantities. To achieve this, we simultaneously learn a low-dimensional state representation and system Lagrangian. The kinetic and potential energy terms of the Lagrangian are distinctly modelled and the low-dimensional equations of motion are explicitly constructed using the Euler-Lagrange equations. We demonstrate the efficacy of this approach for video prediction on image sequences rendered in modified OpenAI gym Pendulum-v0 and Acrobot environments.

Submitted 24 October, 2020; originally announced October 2020.

arXiv:2009.13600 [pdf, other]

doi 10.23919/ACC50511.2021.9482811

Patterns of Nonlinear Opinion Formation on Networks

Authors: Anastasia Bizyaeva, Ayanna Matthews, Alessio Franci, Naomi Ehrich Leonard

Abstract: When communicating agents form opinions about a set of possible options, agreement and disagreement are both possible outcomes. Depending on the context, either can be desirable or undesirable. We show that for nonlinear opinion dynamics on networks, and a variety of network structures, the spectral properties of the underlying adjacency matrix fully characterize the occurrence of either agreement or disagreement. We further show how the corresponding eigenvector centrality, as well as any symmetry in the network, informs the resulting patterns of opinion formation and agent sensitivity to input that triggers opinion cascades.

Submitted 26 March, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 6 pages, 4 figures; accepted to appear in 2021 American Control Conference proceedings

arXiv:2009.04332 [pdf, other]

doi 10.1109/TAC.2022.3159527

Nonlinear Opinion Dynamics with Tunable Sensitivity

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a continuous-time multi-option nonlinear generalization of classical linear weighted-average opinion dynamics. Nonlinearity is introduced by saturating opinion exchanges, and this is enough to enable a significantly greater range of opinion-forming behaviors with our model as compared to existing linear and nonlinear models. For a group of agents that communicate opinions over a network, these behaviors include multistable agreement and disagreement, tunable sensitivity to input, robustness to disturbance, flexible transition between patterns of opinions, and opinion cascades. We derive network-dependent tuning rules to robustly control the system behavior and we design state-feedback dynamics for the model parameters to make the behavior adaptive to changing external conditions. The model provides new means for systematic study of dynamics on natural and engineered networks, from information spread and political polarization to collective decision making and dynamic task allocation.

Submitted 30 July, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2008.04383 [pdf, other]

Influence Spread in the Heterogeneous Multiplex Linear Threshold Model

Authors: Yaofeng Desmond Zhong, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish signals from different layers: an agent becomes active if a sufficient number of its neighbors in each of any $a$ of the $m$ layers is active. We focus on Protocol OR, when $a=1$, and Protocol AND, when $a=m$, which model agents that are most and least readily activated, respectively. We develop theory and algorithms to compute the size of the spread at steady state for any set of initially active agents and to analyze the role of distinguished sensing modalities, network structure, and heterogeneity. We show how heterogeneity manages the tension in spreading dynamics between sensitivity to inputs and robustness to disturbances.

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.01926 [pdf, other]

Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control

Authors: Yaofeng Desmond Zhong, Naomi Ehrich Leonard

Abstract: Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers.

Submitted 31 August, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: This version corrects an error in Equation (3) of the 2020 NeurIPS Proceedings paper

arXiv:2004.06171 [pdf, other]

Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations.

Submitted 13 April, 2020; originally announced April 2020.

arXiv:2004.03793 [pdf, other]

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.

Submitted 7 April, 2020; originally announced April 2020.

arXiv:2003.01312 [pdf, other]

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.

Submitted 11 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:1907.08829 [pdf, other]

Adaptive Susceptibility and Heterogeneity in Contagion Models on Networks

Authors: Renato Pagliara, Naomi E. Leonard

Abstract: Contagious processes, such as spread of infectious diseases, social behaviors, or computer viruses, affect biological, social, and technological systems. Epidemic models for large populations and finite populations on networks have been used to understand and control both transient and steady-state behaviors. Typically it is assumed that after recovery from an infection, every agent will either return to its original susceptible state or acquire full immunity to reinfection. We study the network SIRI (Susceptible-Infected-Recovered-Infected) model, an epidemic model for the spread of contagious processes on a network of heterogeneous agents that can adapt their susceptibility to reinfection. The model generalizes existing models to accommodate realistic conditions in which agents acquire partial or compromised immunity after first exposure to an infection. We prove necessary and sufficient conditions on model parameters and network structure that distinguish four dynamic regimes: infection-free, epidemic, endemic, and bistable. For the bistable regime, which is not accounted for in traditional models, we show how there can be a rapid resurgent epidemic after what looks like convergence to an infection-free population. We use the model and its predictive capability to show how control strategies can be designed to mitigate problematic contagious behaviors.

Submitted 11 April, 2020; v1 submitted 20 July, 2019; originally announced July 2019.

Comments: 14 pages, 5 figures

arXiv:1905.08731 [pdf, other]

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy analytically and computationally.

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1808.07842 [pdf, other]

doi 10.1007/978-3-319-03904-6_2

In the Dance Studio: An Art and Engineering Exploration of Human Flocking

Authors: Naomi E. Leonard, George F. Young, Kelsey Hochgraf, Daniel T. Swain, Aaron Trippe, Willa Chen, Katherine Fitch, Susan Marshall

Abstract: Flock Logic was developed as an art and engineering project to explore how the feedback laws used to model flocking translate when applied by dancers. The artistic goal was to create choreographic tools that leverage multi-agent system dynamics with designed feedback and interaction. The engineering goal was to provide insights and design principles for multi-agent systems, such as human crowds, animal groups and robotic networks, by examining what individual dancers do and what emerges at the group level. We describe our methods to create dance and investigate collective motion. We analyze video of an experiment in which dancers moved according to simple rules of cohesion and repulsion with their neighbors. Using the prescribed interaction protocol and tracked trajectories, we estimate the time-varying graph that defines who is responding to whom. We compute status of nodes in the graph and show the emergence of leaders. We discuss results and further directions.

Submitted 22 August, 2018; originally announced August 2018.

Journal ref: Leonard N.E. et al. (2014) In the Dance Studio: An Art and Engineering Exploration of Human Flocking. In: LaViers A., Egerstedt M. (eds) Controls and Art. Springer, Cham

arXiv:1710.00450 [pdf, other]

Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems

Authors: T. W. U. Madhushani, D. H. S. Maithripala, N. E. Leonard

Abstract: This paper presents a class of Dynamic Multi-Armed Bandit problems where the reward can be modeled as the noisy output of a time varying linear stochastic dynamic system that satisfies some boundedness constraints. The class allows many seemingly different problems with time varying option characteristics to be considered in a single framework. It also opens up the possibility of considering many new problems of practical importance. For instance it affords the simultaneous consideration of temporal option unavailabilities and the dependencies between options with time varying option characteristics in a seamless manner. We show that, for this class of problems, the combination of any Upper Confidence Bound type algorithm with any efficient reward estimator for the expected reward ensures the logarithmic bounding of the expected cumulative regret. We demonstrate the versatility of the approach by the explicit consideration of a new example of practical interest.

Submitted 7 October, 2017; v1 submitted 1 October, 2017; originally announced October 2017.

Comments: Pre-print submitted to 2018 American Control Conference

MSC Class: 60-01

arXiv:1606.00911 [pdf, other]

Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian Algorithms

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study distributed cooperative decision-making under the explore-exploit tradeoff in the multiarmed bandit (MAB) problem. We extend the state-of-the-art frequentist and Bayesian algorithms for single-agent MAB problems to cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph. We rely on a running consensus algorithm for each agent's estimation of mean rewards from its own rewards and the estimated rewards of its neighbors. We prove the performance of these algorithms and show that they asymptotically recover the performance of a centralized agent. Further, we rigorously characterize the influence of the communication graph structure on the decision-making performance of the group.

Submitted 17 September, 2019; v1 submitted 2 June, 2016; originally announced June 2016.

Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 IEEE Conference on Decision and Control (CDC). The second statement of Proposition 1 and Theorem 1 are new from arXiv:1512.06888v3 and Lemma 1 is new. These are used to prove regret bounds in Theorems 2 and 3

arXiv:1512.07638 [pdf, other]

Satisficing in multi-armed bandit problems

Authors: Paul Reverdy, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

Submitted 19 December, 2016; v1 submitted 23 December, 2015; originally announced December 2015.

Comments: To appear in IEEE Transactions on Automatic Control

arXiv:1512.06888 [pdf, other]

On Distributed Cooperative Decision-Making in Multiarmed Bandits

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.

Submitted 16 September, 2019; v1 submitted 21 December, 2015; originally announced December 2015.

Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 European Control Conference (ECC). The second statement of Proposition 1, Theorem 1 and their proofs are new. The new Theorem 1 is used to prove the regret bounds in Theorem 2

arXiv:1507.01160 [pdf, other]

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis

Authors: Vaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard

Abstract: We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.

Submitted 7 July, 2015; v1 submitted 4 July, 2015; originally announced July 2015.

arXiv:1502.04635 [pdf, other]

Parameter estimation in softmax decision-making models with linear objective functions

Authors: Paul Reverdy, Naomi E. Leonard

Abstract: With an eye towards human-centered automation, we contribute to the development of a systematic means to infer features of human decision-making from behavioral data. Motivated by the common use of softmax selection in models of human decision-making, we study the maximum likelihood parameter estimation problem for softmax decision-making models with linear objective functions. We present conditions under which the likelihood function is convex. These allow us to provide sufficient conditions for convergence of the resulting maximum likelihood estimator and to construct its asymptotic distribution. In the case of models with nonlinear objective functions, we show how the estimator can be applied by linearizing about a nominal parameter value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit) model of human decision-making to human subject data. We show statistically significant differences in behavior across related, but distinct, tasks.

Submitted 29 August, 2015; v1 submitted 16 February, 2015; originally announced February 2015.

Comments: In press

MSC Class: 93E10

arXiv:1402.3634 [pdf, other]

Collective Decision-Making in Ideal Networks: The Speed-Accuracy Tradeoff

Authors: Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study collective decision-making in a model of human groups, with network interactions, performing two alternative choice tasks. We focus on the speed-accuracy tradeoff, i.e., the tradeoff between a quick decision and a reliable decision, for individuals in the network. We model the evidence aggregation process across the network using a coupled drift diffusion model (DDM) and consider the free response paradigm in which individuals take their time to make the decision. We develop reduced DDMs as decoupled approximations to the coupled DDM and characterize their efficiency. We determine high probability bounds on the error rate and the expected decision time for the reduced DDM. We show the effect of the decision-maker's location in the network on their decision-making performance under several threshold selection criteria. Finally, we extend the coupled DDM to the coupled Ornstein-Uhlenbeck model for decision-making in two alternative choice tasks with recency effects, and to the coupled race model for decision-making in multiple alternative choice tasks.

Submitted 14 February, 2014; originally announced February 2014.

Comments: to appear in IEEE TCNS

arXiv:1307.6134 [pdf, other]

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

Authors: Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard

Abstract: We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sub-logarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples. NB: Appendix G included in this version details minor modifications that correct for an oversight in the previously-published proofs. The remainder of the text reflects the published work.

Submitted 20 December, 2019; v1 submitted 23 July, 2013; originally announced July 2013.

Comments: 25 pages. Appendix G included in this version details minor modifications that correct for an oversight in the previously-published proofs. The remainder of the text reflects the previously-published version

Journal ref: Proceedings of the IEEE, vol. 102, iss. 4, p. 544-571, 2014

arXiv:1303.2242 [pdf, other]

doi 10.1016/j.physd.2013.04.014

Adaptive Network Dynamics and Evolution of Leadership in Collective Migration

Authors: Darren Pais, Naomi Ehrich Leonard

Abstract: The evolution of leadership in migratory populations depends not only on costs and benefits of leadership investments but also on the opportunities for individuals to rely on cues from others through social interactions. We derive an analytically tractable adaptive dynamic network model of collective migration with fast timescale migration dynamics and slow timescale adaptive dynamics of individual leadership investment and social interaction. For large populations, our analysis of bifurcations with respect to investment cost explains the observed hysteretic effect associated with recovery of migration in fragmented environments. Further, we show a minimum connectivity threshold above which there is evolutionary branching into leader and follower populations. For small populations, we show how the topology of the underlying social interaction network influences the emergence and location of leaders in the adaptive system. Our model and analysis can describe other adaptive network dynamics involving collective tracking or collective learning of a noisy, unknown signal, and likewise can inform the design of robotic networks where agents use decentralized strategies that balance direct environmental measurements with agent interactions.

Submitted 9 March, 2013; originally announced March 2013.

Comments: Submitted to Physica D: Nonlinear Phenomena

arXiv:1209.2194 [pdf, ps, other]

Cooperative learning in multi-agent systems from intermittent measurements

Authors: Naomi Ehrich Leonard, Alex Olshevsky

Abstract: Motivated by the problem of tracking a direction in a decentralized way, we consider the general problem of cooperative learning in multi-agent systems with time-varying connectivity and intermittent measurements. We propose a distributed learning protocol capable of learning an unknown vector $μ$ from noisy measurements made independently by autonomous nodes. Our protocol is completely distributed and able to cope with the time-varying, unpredictable, and noisy nature of inter-agent communication, and intermittent noisy measurements of $μ$. Our main result bounds the learning speed of our protocol in terms of the size and combinatorial features of the (time-varying) networks connecting the nodes.

Submitted 15 December, 2014; v1 submitted 10 September, 2012; originally announced September 2012.

Showing 1–37 of 37 results for author: Leonard, N E