Search | arXiv e-print repository

Relational Reasoning On Graphs Using Opinion Dynamics

Authors: Yulong Yang, Bowen Feng, Keqin Wang, Naomi Leonard, Adji Bousso Dieng, Christine Allen-Blanchette

Abstract: From pedestrians to Kuramoto oscillators, interactions between agents govern how a multitude of dynamical systems evolve in space and time. Discovering how these agents relate to each other can improve our understanding of the often complex dynamics that underlie these systems. Recent works learn to categorize relationships between agents based on observations of their physical behavior. These app… ▽ More From pedestrians to Kuramoto oscillators, interactions between agents govern how a multitude of dynamical systems evolve in space and time. Discovering how these agents relate to each other can improve our understanding of the often complex dynamics that underlie these systems. Recent works learn to categorize relationships between agents based on observations of their physical behavior. These approaches are limited in that the relationship categories are modelled as independent and mutually exclusive, when in real world systems categories are often interacting. In this work, we introduce a level of abstraction between the physical behavior of agents and the categories that define their behavior. To do this, we learn a map** from the agents' states to their affinities for each category in a graph neural network. We integrate the physical proximity of agents and their affinities in a nonlinear opinion dynamics model which provides a mechanism to identify mutually exclusive categories, predict an agent's evolution in time, and control an agent's behavior. We demonstrate the utility of our model for learning interpretable categories for mechanical systems, and demonstrate its efficacy on several long-horizon trajectory prediction benchmarks where we consistently out perform existing methods. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 14 pages, 7 figures

arXiv:2406.09810 [pdf, other]

Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

Authors: Haimin Hu, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is… ▽ More Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is yet to be fully addressed. The recently developed nonlinear opinion dynamics (NOD) show promise in enabling fast opinion formation and avoiding safety-critical deadlocks. However, it remains an open challenge to determine the model parameters of NOD automatically and adaptively, accounting for the ever-changing environment of interaction. In this work, we propose for the first time a learning-based, game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. The learned NOD can be used by existing dynamic game solvers to plan decisively while accounting for the predicted change of other agents' intents, thus enabling situational awareness in planning. We demonstrate Neural NOD's ability to make fast and robust decisions in a simulated autonomous racing example, leading to tangible improvements in safety and overtaking performance over state-of-the-art data-driven game-theoretic planning methods. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.20593 [pdf, other]

Excitable crawling

Authors: Juncal Arbelaiz, Alessio Franci, Naomi Ehrich Leonard, Rodolphe Sepulchre, Bassam Bamieh

Abstract: We propose and analyze the suitability of a spiking controller to engineer the locomotion of a soft robotic crawler. Inspired by the FitzHugh-Nagumo model of neural excitability, we design a bistable controller with an electrical flipflop circuit representation capable of generating spikes on-demand when coupled to the passive crawler mechanics. A proprioceptive sensory signal from the crawler mec… ▽ More We propose and analyze the suitability of a spiking controller to engineer the locomotion of a soft robotic crawler. Inspired by the FitzHugh-Nagumo model of neural excitability, we design a bistable controller with an electrical flipflop circuit representation capable of generating spikes on-demand when coupled to the passive crawler mechanics. A proprioceptive sensory signal from the crawler mechanics turns bistability of the controller into a rhythmic spiking. The output voltage, in turn, activates the crawler's actuators to generate movement through peristaltic waves. We show through geometric analysis that this control strategy achieves endogenous crawling. The electro-mechanical sensorimotor interconnection provides embodied negative feedback regulation, facilitating locomotion. Dimensional analysis provides insights on the characteristic scales in the crawler's mechanical and electrical dynamics, and how they determine the crawling gait. Adaptive control of the electrical scales to optimally match the mechanical scales can be envisioned to achieve further efficiency, as in homeostatic regulation of neuronal circuits. Our approach can scale up to multiple sensorimotor loops inspired by biological central pattern generators. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 5 pages, MTNS 2024 extended abstract

arXiv:2404.17756 [pdf, other]

Suppressed self-diffusion of nanoscale constituents of a complex liquid

Authors: Christian P. N. Tanner, Vivian R. K. Wall, Mumtaz Gababa, Joshua Portner, Ahhyun Jeong, Matthew J. Hurley, Nicholas Leonard, Jonathan G. Raybin, James K. Utterback, Ahyoung Kim, Andrei Fluerasu, Yanwen Sun, Johannes Moeller, Alexey Zozulya, Wonhyuk Jo, Anders Madsen, Dmitri V. Talapin, Samuel W. Teitelbaum, Naomi S. Ginsberg

Abstract: The ability to understand and ultimately control the transformations and properties of various nanoscale systems, from proteins to synthetic nanomaterial assemblies, hinges on the ability to directly elucidate their dynamics on their characteristic length and time scales. Here, we use MHz X-ray photon correlation spectroscopy (XPCS) to directly elucidate the characteristic microsecond-dynamics of… ▽ More The ability to understand and ultimately control the transformations and properties of various nanoscale systems, from proteins to synthetic nanomaterial assemblies, hinges on the ability to directly elucidate their dynamics on their characteristic length and time scales. Here, we use MHz X-ray photon correlation spectroscopy (XPCS) to directly elucidate the characteristic microsecond-dynamics of density fluctuations of semiconductor nanocrystals (NCs), not only in a colloidal dispersion but also in a liquid phase consisting of densely packed, yet mobile, NCs with no long-range order. By carefully disentangling X-ray induced effects, we find the wavevector-dependent fluctuation rates in the liquid phase are suppressed relative to those in the colloidal phase and to those in experiments and hydrodynamic theories of densely packed repulsive particles. We show that the suppressed rates are due to a substantial decrease in the self-diffusion of NCs in the liquid phase, which we attribute to explicit attractive interactions. Via comparison with simulations, we find that the extracted strength of the attractions explains the stability of the liquid phase, in contrast to the gelation observed via XPCS in many other charged colloidal systems. This work opens the door to elucidating fast, condensed phase dynamics in a variety of complex fluids and other nanoscale soft matter systems, such as densely packed proteins and non-equilibrium self-assembly processes. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: 15 pages, 4 figures

arXiv:2404.16808 [pdf, other]

Enhancing nanocrystal superlattice self-assembly near a metastable liquid binodal

Authors: Christian P. N. Tanner, Vivian R. K. Wall, Joshua Portner, Ahhyun Jeong, Avishek Das, James K. Utterback, Leo M. Hamerlynck, Jonathan G. Raybin, Matthew J. Hurley, Nicholas Leonard, Rebecca B. Wai, Jenna A. Tan, Mumtaz Gababa, Chenhui Zhu, Eric Schaible, Christopher J. Tassone, David T. Limmer, Samuel W. Teitelbaum, Dmitri V. Talapin, Naomi S. Ginsberg

Abstract: Bottom-up assembly of nanocrystals (NCs) into ordered arrays, or superlattices (SLs), is a promising route to design materials with new functionalities, but the degree of control over assembly into functional structures remains challenging. Using electrostatics, rather than density, to tune the interactions between semiconductor NCs, we watch self-assembly proceeding through a metastable liquid ph… ▽ More Bottom-up assembly of nanocrystals (NCs) into ordered arrays, or superlattices (SLs), is a promising route to design materials with new functionalities, but the degree of control over assembly into functional structures remains challenging. Using electrostatics, rather than density, to tune the interactions between semiconductor NCs, we watch self-assembly proceeding through a metastable liquid phase. We systematically investigate the phase behavior as a function of quench conditions in situ and in real time using small angle X-ray scattering (SAXS). Through quantitative fitting to colloid, liquid, and SL models, we extract the time evolution of each phase and the system phase diagram, which we find to be consistent with short-range attractive interactions. Using the phase diagram's predictive power, we establish control of the self-assembly rate over three orders of magnitude, and identify one- and two-step self-assembly regimes, with only the latter implicating the metastable liquid as an intermediate. Importantly, the presence of the metastable liquid increases SL formation rates relative to the equivalent one-step pathway, and SL order counterintuitively increases with the rate, revealing a highly desirable and generalizable kinetic strategy to promote and enhance ordered assembly. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 16 pages, 4 figures

arXiv:2403.05457 [pdf, other]

Sparse dynamic network reconstruction through L1-regularization of a Lyapunov equation

Authors: Ian Xul Belaustegui, Marcela Ordorica Arango, Román Rossi-Pool, Naomi Ehrich Leonard, Alessio Franci

Abstract: An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct… ▽ More An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct directed networks but the reconstructed network lacks information about interaction strengths. We propose a network reconstruction method that inherits the best of the two approaches by reconstructing a directed weighted network from noisy data under the assumption that the network is sparse and the dynamics are governed by a linear (or weakly-nonlinear) stochastic dynamical system. The two steps of our method are i) constructing an (infinite) family of candidate networks by solving the covariance matrix Lyapunov equation for the state matrix and ii) using L1-regularization to select a sparse solution. We further show how to use prior information on the (non)existence of a few directed edges to drastically improve the quality of the reconstruction. △ Less

Submitted 12 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.14174 [pdf, other]

Blending Data-Driven Priors in Dynamic Games

Authors: Justin Lidard, Haimin Hu, Asher Hancock, Zixu Zhang, Albert Gimó Contreras, Vikash Modi, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, María Santos, Jaime Fernández Fisac

Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, h… ▽ More As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic game with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/. △ Less

Submitted 6 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, 12 figures

arXiv:2312.06395 [pdf, other]

Threshold Decision-Making Dynamics Adaptive to Physical Constraints and Changing Environment

Authors: Giovanna Amorim, María Santos, Shinkyu Park, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a threshold decision-making framework for controlling the physical dynamics of an agent switching between two spatial tasks. Our framework couples a nonlinear opinion dynamics model that represents the evolution of an agent's preference for a particular task with the physical dynamics of the agent. We prove the bifurcation that governs the behavior of the coupled dynamics. We show by me… ▽ More We propose a threshold decision-making framework for controlling the physical dynamics of an agent switching between two spatial tasks. Our framework couples a nonlinear opinion dynamics model that represents the evolution of an agent's preference for a particular task with the physical dynamics of the agent. We prove the bifurcation that governs the behavior of the coupled dynamics. We show by means of the bifurcation behavior how the coupled dynamics are adaptive to the physical constraints of the agent. We also show how the bifurcation can be modulated to allow the agent to switch tasks based on thresholds adaptive to environmental conditions. We illustrate the benefits of the approach through a decentralized multi-robot task allocation application for trash collection. △ Less

Submitted 7 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.02204 [pdf, other]

Active risk aversion in SIS epidemics on networks

Authors: Anastasia Bizyaeva, Marcela Ordorica Arango, Yunxiu Zhou, Simon Levin, Naomi Ehrich Leonard

Abstract: We present and analyze an actively controlled Susceptible-Infected-Susceptible (actSIS) model of interconnected populations to study how risk aversion strategies, such as social distancing, affect network epidemics. A population using a risk aversion strategy reduces its contact rate with other populations when it perceives an increase in infection risk. The network actSIS model relies on two dist… ▽ More We present and analyze an actively controlled Susceptible-Infected-Susceptible (actSIS) model of interconnected populations to study how risk aversion strategies, such as social distancing, affect network epidemics. A population using a risk aversion strategy reduces its contact rate with other populations when it perceives an increase in infection risk. The network actSIS model relies on two distinct networks. One is a physical contact network that defines which populations come into contact with which other populations and thus how infection spreads. The other is a communication network, such as an online social network, that defines which populations observe the infection level of which other populations and thus how information spreads. We prove that the model, with these two networks and populations using risk aversion strategies, exhibits a transcritical bifurcation in which an endemic equilibrium emerges. For regular graphs, we prove that the endemic infection level is uniform across populations and reduced by the risk aversion strategy, relative to the network SIS endemic level. We show that when communication is sufficiently sparse, this initially stable equilibrium loses stability in a secondary bifurcation. Simulations show that a new stable solution emerges with nonuniform infection levels. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2308.14666 [pdf, other]

Learning to Predict 3D Rotational Dynamics from Images of a Rigid Body with Unknown Mass Distribution

Authors: Justice Mason, Christine Allen-Blanchette, Nicholas Zolman, Elizabeth Davison, Naomi Ehrich Leonard

Abstract: In many real-world settings, image observations of freely rotating 3D rigid bodies may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics. The usefulness of standard deep learning methods is also limited, because an image of a rigid body reveals nothing about the distribut… ▽ More In many real-world settings, image observations of freely rotating 3D rigid bodies may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics. The usefulness of standard deep learning methods is also limited, because an image of a rigid body reveals nothing about the distribution of mass inside the body, which, together with initial angular velocity, is what determines how the body will rotate. We present a physics-based neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion. We demonstrate the efficacy of our approach on new rotating rigid-body datasets of sequences of synthetic images of rotating objects, including cubes, prisms and satellites, with unknown uniform and non-uniform mass distributions. Our model outperforms competing baselines on our datasets, producing better qualitative predictions and reducing the error observed for the state-of-the-art Hamiltonian Generative Network by a factor of 2. △ Less

Submitted 10 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Previously appeared as arXiv:2209.11355v2, which was submitted as a replacement by accident. arXiv admin note: text overlap with arXiv:2209.11355

arXiv:2308.02755 [pdf, other]

Multi-topic belief formation through bifurcations over signed social networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose and analyze a nonlinear dynamic model of continuous-time multi-dimensional belief formation over signed social networks. Our model accounts for the effects of a structured belief system, self-appraisal, internal biases, and various sources of cognitive dissonance posited by recent theories in social psychology. We prove that agents become opinionated as a consequence of a bifurcation. W… ▽ More We propose and analyze a nonlinear dynamic model of continuous-time multi-dimensional belief formation over signed social networks. Our model accounts for the effects of a structured belief system, self-appraisal, internal biases, and various sources of cognitive dissonance posited by recent theories in social psychology. We prove that agents become opinionated as a consequence of a bifurcation. We analyze how the balance of social network effects in the model controls the nature of the bifurcation and, therefore, the belief-forming limit-set solutions. Our analysis provides constructive conditions on how multi-stable network belief equilibria and belief oscillations emerging at a belief-forming bifurcation depend on the communication network graph and belief system network graph. Our model and analysis provide new theoretical insights on the dynamics of social systems and a new principled framework for designing decentralized decision-making on engineered networks in the presence of structured relationships among alternatives. △ Less

Submitted 2 July, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 16 pages, 7 figures

arXiv:2306.07535 [pdf, other]

Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization

Authors: Shinkyu Park, Naomi Ehrich Leonard

Abstract: We study a multi-agent decision problem in large population games. Agents from multiple populations select strategies for repeated interactions with one another. At each stage of these interactions, agents use their decision-making model to revise their strategy selections based on payoffs determined by an underlying game. Their goal is to learn the strategies that correspond to the Nash equilibri… ▽ More We study a multi-agent decision problem in large population games. Agents from multiple populations select strategies for repeated interactions with one another. At each stage of these interactions, agents use their decision-making model to revise their strategy selections based on payoffs determined by an underlying game. Their goal is to learn the strategies that correspond to the Nash equilibrium of the game. However, when games are subject to time delays, conventional decision-making models from the population game literature may result in oscillations in the strategy revision process or convergence to an equilibrium other than the Nash. To address this problem, we propose the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model, along with an algorithm that iteratively updates the model's regularization parameter across a network of communicating agents. Using passivity-based convergence analysis techniques, we show that the KLD-RL model achieves convergence to the Nash equilibrium without oscillations, even for a class of population games that are subject to time delays. We demonstrate our main results numerically on a two-population congestion game and a two-population zero-sum game. △ Less

Submitted 3 June, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.17600 [pdf, other]

NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction

Authors: Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

Abstract: Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-… ▽ More Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering $33\%$ more potential interactions versus a baseline model. △ Less

Submitted 11 November, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures

arXiv:2304.02687 [pdf, other]

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to uns… ▽ More We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example. △ Less

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2210.01642 [pdf, other]

Proactive Opinion-Driven Robot Navigation around Human Movers

Authors: Charlotte Cathcart, María Santos, Shinkyu Park, Naomi Ehrich Leonard

Abstract: We propose, analyze, and experimentally verify a new proactive approach for robot social navigation driven by the robot's "opinion" for which way and by how much to pass human movers crossing its path. The robot forms an opinion over time according to nonlinear dynamics that depend on the robot's observations of human movers and its level of attention to these social cues. For these dynamics, it i… ▽ More We propose, analyze, and experimentally verify a new proactive approach for robot social navigation driven by the robot's "opinion" for which way and by how much to pass human movers crossing its path. The robot forms an opinion over time according to nonlinear dynamics that depend on the robot's observations of human movers and its level of attention to these social cues. For these dynamics, it is guaranteed that when the robot's attention is greater than a critical value, deadlock in decision making is broken, and the robot rapidly forms a strong opinion, passing each human mover even if the robot has no bias nor evidence for which way to pass. We enable proactive rapid and reliable social navigation by having the robot grow its attention across the critical value when a human mover approaches. With human-robot experiments we demonstrate the flexibility of our approach and validate our analytical results on deadlock-breaking. We also show that a single design parameter can tune the trade-off between efficiency and reliability in human-robot passing. The new approach has the additional advantage that it does not rely on a predictive model of human behavior. △ Less

Submitted 11 September, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 8 pages, 7 figures

arXiv:2210.00353 [pdf, other]

Sustained oscillations in multi-topic belief dynamics over signed networks

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We study the dynamics of belief formation on multiple interconnected topics in networks of agents with a shared belief system. We establish sufficient conditions and necessary conditions under which sustained oscillations of beliefs arise on the network in a Hopf bifurcation and characterize the role of the communication graph and the belief system graph in sha** the relative phase and amplitude… ▽ More We study the dynamics of belief formation on multiple interconnected topics in networks of agents with a shared belief system. We establish sufficient conditions and necessary conditions under which sustained oscillations of beliefs arise on the network in a Hopf bifurcation and characterize the role of the communication graph and the belief system graph in sha** the relative phase and amplitude patterns of the oscillations. Additionally, we distinguish broad classes of graphs that exhibit such oscillations from those that do not. △ Less

Submitted 22 March, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: 6 pages, 6 figures, accepted for publication in the 2023 American Control Conference proceedings

arXiv:2209.11355 [pdf, other]

Learning Interpretable Dynamics from Images of a Freely Rotating 3D Rigid Body

Authors: Justice Mason, Christine Allen-Blanchette, Nicholas Zolman, Elizabeth Davison, Naomi Leonard

Abstract: In many real-world settings, image observations of freely rotating 3D rigid bodies, such as satellites, may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics and a lack of interpretability reduces the usefulness of standard deep learning methods. In this work, we present… ▽ More In many real-world settings, image observations of freely rotating 3D rigid bodies, such as satellites, may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics and a lack of interpretability reduces the usefulness of standard deep learning methods. In this work, we present a physics-informed neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion with a learned representation of the Hamiltonian. We demonstrate the efficacy of our approach on a new rotating rigid-body dataset with sequences of rotating cubes and rectangular prisms with uniform and non-uniform density. △ Less

Submitted 23 August, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 13 pages, 7 figures

arXiv:2208.01800 [pdf, other]

Decentralized Learning With Limited Communications for Multi-robot Coverage of Unknown Spatial Fields

Authors: Kensuke Nakamura, María Santos, Naomi Ehrich Leonard

Abstract: This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are… ▽ More This paper presents an algorithm for a team of mobile robots to simultaneously learn a spatial field over a domain and spatially distribute themselves to optimally cover it. Drawing from previous approaches that estimate the spatial field through a centralized Gaussian process, this work leverages the spatial structure of the coverage problem and presents a decentralized strategy where samples are aggregated locally by establishing communications through the boundaries of a Voronoi partition. We present an algorithm whereby each robot runs a local Gaussian process calculated from its own measurements and those provided by its Voronoi neighbors, which are incorporated into the individual robot's Gaussian process only if they provide sufficiently novel information. The performance of the algorithm is evaluated in simulation and compared with centralized approaches. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: Accepted IROS 2022

arXiv:2206.14893 [pdf, other]

Breaking indecision in multi-agent, multi-option dynamics

Authors: Alessio Franci, Martin Golubitsky, Ian Stewart, Anastasia Bizyaeva, Naomi Ehrich Leonard

Abstract: How does a group of agents break indecision when deciding about options with qualities that are hard to distinguish? Biological and artificial multi-agent systems, from honeybees and bird flocks to bacteria, robots, and humans, often need to overcome indecision when choosing among options in situations in which the performance or even the survival of the group are at stake. Breaking indecision is… ▽ More How does a group of agents break indecision when deciding about options with qualities that are hard to distinguish? Biological and artificial multi-agent systems, from honeybees and bird flocks to bacteria, robots, and humans, often need to overcome indecision when choosing among options in situations in which the performance or even the survival of the group are at stake. Breaking indecision is also important because in a fully indecisive state agents are not biased toward any specific option and therefore the agent group is maximally sensitive and prone to adapt to inputs and changes in its environment. Here, we develop a mathematical theory to study how decisions arise from the breaking of indecision. Our approach is grounded in both equivariant and network bifurcation theory. We model decision from indecision as synchrony-breaking in influence networks in which each node is the value assigned by an agent to an option. First, we show that three universal decision behaviors, namely, deadlock, consensus, and dissensus, are the generic outcomes of synchrony-breaking bifurcations from a fully synchronous state of indecision in influence networks. Second, we show that all deadlock and consensus value patterns and some dissensus value patterns are predicted by the symmetry of the influence networks. Third, we show that there are also many `exotic' dissensus value patterns. These patterns are predicted by network architecture, but not by network symmetries, through a new synchrony-breaking branching lemma. This is the first example of exotic solutions in an application. Numerical simulations of a novel influence network model illustrate our theoretical results. △ Less

Submitted 29 June, 2022; originally announced June 2022.

Comments: 36 pages

arXiv:2203.11703 [pdf, other]

doi 10.1109/LCSYS.2022.3185981

Switching transformations for decentralized control of opinion patterns in signed networks: application to dynamic task allocation

Authors: Anastasia Bizyaeva, Giovanna Amorim, Maria Santos, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a new decentralized design method to control opinion patterns on signed networks of agents making decisions about two options and to switch the network from any opinion pattern to a new desired one. Our method relies on switching transformations, which switch the sign of an agent's opinion at a stable equilibrium by flip** the sign of its local interactions with its neighbors. The glo… ▽ More We propose a new decentralized design method to control opinion patterns on signed networks of agents making decisions about two options and to switch the network from any opinion pattern to a new desired one. Our method relies on switching transformations, which switch the sign of an agent's opinion at a stable equilibrium by flip** the sign of its local interactions with its neighbors. The global dynamical behavior of the switched network can be predicted rigorously when the original, and thus the witched, networks are structurally balanced. Structural balance ensures that the network dynamics are monotone, which makes the study of the basin of attraction of the various opinion patterns amenable to rigorous analysis through monotone systems theory. We illustrate the utility of the approach through scenarios motivated by multi-robot coordination and dynamic task allocation. △ Less

Submitted 31 May, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2201.13288 [pdf, other]

A Regret Minimization Approach to Multi-Agent Control

Authors: Udaya Ghai, Udari Madhushani, Naomi Leonard, Elad Hazan

Abstract: We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distri… ▽ More We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft. We show that the distributed method is robust to failure and to adversarial perturbations in the dynamics. △ Less

Submitted 25 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:7422-7434, 2022

arXiv:2111.12482 [pdf, other]

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Authors: Udari Madhushani, Abhimanyu Dubey, Naomi Ehrich Leonard, Alex Pentland

Abstract: The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative ban… ▽ More The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret as well. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance. △ Less

Submitted 24 November, 2021; originally announced November 2021.

Journal ref: Conference on Neural Information Processing Systems, 2021

arXiv:2110.07392 [pdf, other]

Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication

Authors: Justin Lidard, Udari Madhushani, Naomi Ehrich Leonard

Abstract: A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transiti… ▽ More A challenge in reinforcement learning (RL) is minimizing the cost of sampling associated with exploration. Distributed exploration reduces sampling complexity in multi-agent RL (MARL). We investigate the benefits to performance in MARL when exploration is fully decentralized. Specifically, we consider a class of online, episodic, tabular $Q$-learning problems under time-varying reward and transition dynamics, in which agents can communicate in a decentralized manner.We show that group performance, as measured by the bound on regret, can be significantly improved through communication when each agent uses a decentralized message-passing protocol, even when limited to sending information up to its $γ$-hop neighbors. We prove regret and sample complexity bounds that depend on the number of agents, communication network structure and $γ.$ We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy. Numerical simulations illustrate our results and validate our theoretical claims. △ Less

Submitted 2 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper to American Control Conference (ACC) 2022

arXiv:2110.05599 [pdf, other]

doi 10.1103/PhysRevD.105.055019

Flavored Gauge-Mediated Supersymmetry Breaking Models with Discrete Non-Abelian Symmetries

Authors: Shu Tian Eu, Lisa L. Everett, Todd Garon, Neil Leonard

Abstract: We investigate flavored gauge mediation models in which the Higgs and messenger doublets are embedded in multiplets of the discrete non-Abelian symmetry S3. In these theories, the S3 symmetry correlates the flavor structure of the quark and lepton Yukawa couplings with the structure of the messenger Yukawa couplings that contribute to the soft supersymmetry breaking mass parameters. We provide a s… ▽ More We investigate flavored gauge mediation models in which the Higgs and messenger doublets are embedded in multiplets of the discrete non-Abelian symmetry S3. In these theories, the S3 symmetry correlates the flavor structure of the quark and lepton Yukawa couplings with the structure of the messenger Yukawa couplings that contribute to the soft supersymmetry breaking mass parameters. We provide a systematic exploration of possible scenarios within this framework that can accommodate hierarchical quark and charged lepton masses, and examine the resulting phenomenological implications in each case. We find a heavier spectrum for the superpartner masses compared to flavored gauge mediation models controlled by Abelian symmetries, which can be directly traced back to the need in our scenarios for two vectorlike pairs of messenger fields for viable electroweak symmetry breaking. △ Less

Submitted 29 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 32 pages, 8 figures

Journal ref: Phys. Rev. D 105, 055019, 2022

arXiv:2110.04396 [pdf, other]

When to Call Your Neighbor? Strategic Communication in Cooperative Stochastic Bandits

Authors: Udari Madhushani, Naomi Leonard

Abstract: In cooperative bandits, a framework that captures essential features of collective sequential decision making, agents can minimize group regret, and thereby improve performance, by leveraging shared information. However, sharing information can be costly, which motivates develo** policies that minimize group regret while also reducing the number of messages communicated by agents. Existing coope… ▽ More In cooperative bandits, a framework that captures essential features of collective sequential decision making, agents can minimize group regret, and thereby improve performance, by leveraging shared information. However, sharing information can be costly, which motivates develo** policies that minimize group regret while also reducing the number of messages communicated by agents. Existing cooperative bandit algorithms obtain optimal performance when agents share information with their neighbors at \textit{every time step}, i.e., full communication. This requires $Θ(T)$ number of messages, where $T$ is the time horizon of the decision making process. We propose \textit{ComEx}, a novel cost-effective communication protocol in which the group achieves the same order of performance as full communication while communicating only $O(\log T)$ number of messages. Our key step is develo** a method to identify and only communicate the information crucial to achieving optimal performance. Further we propose novel algorithms for several benchmark cooperative bandit frameworks and show that our algorithms obtain \textit{state-of-the-art} performance while consistently incurring a significantly smaller communication cost than existing algorithms. △ Less

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2108.00966 [pdf, other]

Tuning Cooperative Behavior in Games with Nonlinear Opinion Dynamics

Authors: Shinkyu Park, Anastasia Bizyaeva, Mari Kawakatsu, Alessio Franci, Naomi Ehrich Leonard

Abstract: We examine the tuning of cooperative behavior in repeated multi-agent games using an analytically tractable, continuous-time, nonlinear model of opinion dynamics. Each modeled agent updates its real-valued opinion about each available strategy in response to payoffs and other agent opinions, as observed over a network. We show how the model provides a principled and systematic means to investigate… ▽ More We examine the tuning of cooperative behavior in repeated multi-agent games using an analytically tractable, continuous-time, nonlinear model of opinion dynamics. Each modeled agent updates its real-valued opinion about each available strategy in response to payoffs and other agent opinions, as observed over a network. We show how the model provides a principled and systematic means to investigate behavior of agents that select strategies using rationality and reciprocity, key features of human decision-making in social dilemmas. For two-strategy games, we use bifurcation analysis to prove conditions for the bistability of two equilibria and conditions for the first (second) equilibrium to reflect all agents favoring the first (second) strategy. We prove how model parameters, e.g., level of attention to opinions of others (reciprocity), network structure, and payoffs, influence dynamics and, notably, the size of the region of attraction to each stable equilibrium. We provide insights by examining the tuning of the bistability of mutual cooperation and mutual defection and their regions of attraction for the repeated prisoner's dilemma and the repeated multi-agent public goods game. Our results generalize to games with more strategies, heterogeneity, and additional feedback dynamics, such as those designed to elicit cooperation. △ Less

Submitted 23 November, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

arXiv:2103.14764 [pdf, ps, other]

doi 10.1109/CDC45484.2021.9683650

Control of Agreement and Disagreement Cascades with Distributed Inputs

Authors: Anastasia Bizyaeva, Timothy Sorochkin, Alessio Franci, Naomi Ehrich Leonard

Abstract: For a group of autonomous communicating agents, the ability to distinguish a meaningful input from disturbance, and come to collective agreement or disagreement in response to that input, is paramount for carrying out coordinated objectives. In this work we study how a cascade of opinion formation spreads through a group of networked decision-makers in response to a distributed input signal. Using… ▽ More For a group of autonomous communicating agents, the ability to distinguish a meaningful input from disturbance, and come to collective agreement or disagreement in response to that input, is paramount for carrying out coordinated objectives. In this work we study how a cascade of opinion formation spreads through a group of networked decision-makers in response to a distributed input signal. Using a nonlinear opinion dynamics model with dynamic feedback modulation of an attention parameter, we show how the triggering of an opinion cascade and the collective decision itself depend on both the distributed input and the node agreement and disagreement centrality, determined by the spectral properties of the network graph. We further show how the attention dynamics introduce an implicit threshold that distinguishes between distributed inputs that trigger cascades and ones that are rejected as disturbance. △ Less

Submitted 26 March, 2021; originally announced March 2021.

Comments: 7 pages, 4 figures

arXiv:2103.12223 [pdf, other]

doi 10.1007/s11721-021-00190-w

Analysis and control of agreement and disagreement opinion cascades

Authors: Alessio Franci, Anastasia Bizyaeva, Shinkyu Park, Naomi Ehrich Leonard

Abstract: We introduce and analyze a continuous time and state-space model of opinion cascades on networks of large numbers of agents that form opinions about two or more options. By leveraging our recent results on the emergence of agreement and disagreement states, we introduce novel tools to analyze and control agreement and disagreement opinion cascades. New notions of agreement and disagreement central… ▽ More We introduce and analyze a continuous time and state-space model of opinion cascades on networks of large numbers of agents that form opinions about two or more options. By leveraging our recent results on the emergence of agreement and disagreement states, we introduce novel tools to analyze and control agreement and disagreement opinion cascades. New notions of agreement and disagreement centrality, which depend only on network structure, are shown to be key to characterizing the nonlinear behavior of agreement and disagreement opinion formation and cascades. Our results are relevant for the analysis and control of opinion cascades in real-world networks, including biological, social and artificial networks, and for the design of opinion-forming behaviors in robotic swarms. We illustrate an application of our model to a multi-robot task-allocation problem and discuss extensions and future directions opened by our modeling framework. △ Less

Submitted 22 March, 2021; originally announced March 2021.

arXiv:2011.07720 [pdf, other]

Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. Afte… ▽ More We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations. △ Less

Submitted 8 October, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

arXiv:2011.05927 [pdf, other]

On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension

Authors: Udari Madhushani, Biswadip Dey, Naomi Ehrich Leonard, Amit Chakraborty

Abstract: Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant cha… ▽ More Value function based reinforcement learning (RL) algorithms, for example, $Q$-learning, learn optimal policies from datasets of actions, rewards, and state transitions. However, when the underlying state transition dynamics are stochastic and evolve on a high-dimensional space, generating independent and identically distributed (IID) data samples for creating these datasets poses a significant challenge due to the intractability of the associated normalizing integral. In these scenarios, Hamiltonian Monte Carlo (HMC) sampling offers a computationally tractable way to generate data for training RL algorithms. In this paper, we introduce a framework, called \textit{Hamiltonian $Q$-Learning}, that demonstrates, both theoretically and empirically, that $Q$ values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Furthermore, to exploit the underlying low-rank structure of the $Q$ function, Hamiltonian $Q$-Learning uses a matrix completion algorithm for reconstructing the updated $Q$ function from $Q$ value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply $Q$-learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications. △ Less

Submitted 28 March, 2022; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2010.12932 [pdf, other]

LagNetViP: A Lagrangian Neural Network for Video Prediction

Authors: Christine Allen-Blanchette, Sushant Veer, Anirudha Majumdar, Naomi Ehrich Leonard

Abstract: The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred. The equations of motion, as defined by Newton's second law, describe the time evolution of a physical system state and can therefore be applied toward the determination of future system states. In this paper, we… ▽ More The dominant paradigms for video prediction rely on opaque transition models where neither the equations of motion nor the underlying physical quantities of the system are easily inferred. The equations of motion, as defined by Newton's second law, describe the time evolution of a physical system state and can therefore be applied toward the determination of future system states. In this paper, we introduce a video prediction model where the equations of motion are explicitly constructed from learned representations of the underlying physical quantities. To achieve this, we simultaneously learn a low-dimensional state representation and system Lagrangian. The kinetic and potential energy terms of the Lagrangian are distinctly modelled and the low-dimensional equations of motion are explicitly constructed using the Euler-Lagrange equations. We demonstrate the efficacy of this approach for video prediction on image sequences rendered in modified OpenAI gym Pendulum-v0 and Acrobot environments. △ Less

Submitted 24 October, 2020; originally announced October 2020.

arXiv:2009.13600 [pdf, other]

doi 10.23919/ACC50511.2021.9482811

Patterns of Nonlinear Opinion Formation on Networks

Authors: Anastasia Bizyaeva, Ayanna Matthews, Alessio Franci, Naomi Ehrich Leonard

Abstract: When communicating agents form opinions about a set of possible options, agreement and disagreement are both possible outcomes. Depending on the context, either can be desirable or undesirable. We show that for nonlinear opinion dynamics on networks, and a variety of network structures, the spectral properties of the underlying adjacency matrix fully characterize the occurrence of either agreement… ▽ More When communicating agents form opinions about a set of possible options, agreement and disagreement are both possible outcomes. Depending on the context, either can be desirable or undesirable. We show that for nonlinear opinion dynamics on networks, and a variety of network structures, the spectral properties of the underlying adjacency matrix fully characterize the occurrence of either agreement or disagreement. We further show how the corresponding eigenvector centrality, as well as any symmetry in the network, informs the resulting patterns of opinion formation and agent sensitivity to input that triggers opinion cascades. △ Less

Submitted 26 March, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 6 pages, 4 figures; accepted to appear in 2021 American Control Conference proceedings

arXiv:2009.04332 [pdf, other]

doi 10.1109/TAC.2022.3159527

Nonlinear Opinion Dynamics with Tunable Sensitivity

Authors: Anastasia Bizyaeva, Alessio Franci, Naomi Ehrich Leonard

Abstract: We propose a continuous-time multi-option nonlinear generalization of classical linear weighted-average opinion dynamics. Nonlinearity is introduced by saturating opinion exchanges, and this is enough to enable a significantly greater range of opinion-forming behaviors with our model as compared to existing linear and nonlinear models. For a group of agents that communicate opinions over a network… ▽ More We propose a continuous-time multi-option nonlinear generalization of classical linear weighted-average opinion dynamics. Nonlinearity is introduced by saturating opinion exchanges, and this is enough to enable a significantly greater range of opinion-forming behaviors with our model as compared to existing linear and nonlinear models. For a group of agents that communicate opinions over a network, these behaviors include multistable agreement and disagreement, tunable sensitivity to input, robustness to disturbance, flexible transition between patterns of opinions, and opinion cascades. We derive network-dependent tuning rules to robustly control the system behavior and we design state-feedback dynamics for the model parameters to make the behavior adaptive to changing external conditions.} The model provides new means for systematic study of dynamics on natural and engineered networks, from information spread and political polarization to collective decision making and dynamic task allocation. △ Less

Submitted 30 July, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2009.01339 [pdf, other]

Heterogeneous Explore-Exploit Strategies on Multi-Star Networks

Authors: Udari Madhushani, Naomi Leonard

Abstract: We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making where the goal of the agents is to maximize cumulative group reward. To do so we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problem… ▽ More We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making where the goal of the agents is to maximize cumulative group reward. To do so we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problems, agents use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e. when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to enable center agents to do more exploring than they would do using the homogeneous strategy, as a means of providing more useful data to the peripheral agents. In the case all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies as compared to under homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds. △ Less

Submitted 1 December, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

arXiv:2008.04383 [pdf, other]

Influence Spread in the Heterogeneous Multiplex Linear Threshold Model

Authors: Yaofeng Desmond Zhong, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish sign… ▽ More The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish signals from different layers: an agent becomes active if a sufficient number of its neighbors in each of any $a$ of the $m$ layers is active. We focus on Protocol OR, when $a=1$, and Protocol AND, when $a=m$, which model agents that are most and least readily activated, respectively. We develop theory and algorithms to compute the size of the spread at steady state for any set of initially active agents and to analyze the role of distinguished sensing modalities, network structure, and heterogeneity. We show how heterogeneity manages the tension in spreading dynamics between sensitivity to inputs and robustness to disturbances. △ Less

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2007.01926 [pdf, other]

Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control

Authors: Yaofeng Desmond Zhong, Naomi Ehrich Leonard

Abstract: Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model tha… ▽ More Recent approaches for modelling dynamics of physical systems with neural networks enforce Lagrangian or Hamiltonian structure to improve prediction and generalization. However, when coordinates are embedded in high-dimensional data such as images, these approaches either lose interpretability or can only be applied to one particular example. We introduce a new unsupervised neural network model that learns Lagrangian dynamics from images, with interpretability that benefits prediction and control. The model infers Lagrangian dynamics on generalized coordinates that are simultaneously learned with a coordinate-aware variational autoencoder (VAE). The VAE is designed to account for the geometry of physical systems composed of multiple rigid bodies in the plane. By inferring interpretable Lagrangian dynamics, the model learns physical system properties, such as kinetic and potential energy, which enables long-term prediction of dynamics in the image space and synthesis of energy-based controllers. △ Less

Submitted 31 August, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: This version corrects an error in Equation (3) of the 2020 NeurIPS Proceedings paper

arXiv:2007.01424 [pdf, ps, other]

Active Control and Sustained Oscillations in actSIS Epidemic Dynamics

Authors: Yunxiu Zhou, Simon A. Levin, Naomi E. Leonard

Abstract: An actively controlled Susceptible-Infected-Susceptible (actSIS) contagion model is presented for studying epidemic dynamics with continuous-time feedback control of infection rates. Our work is inspired by the observation that epidemics can be controlled through decentralized disease-control strategies such as quarantining, sheltering in place, social distancing, etc., where individuals actively… ▽ More An actively controlled Susceptible-Infected-Susceptible (actSIS) contagion model is presented for studying epidemic dynamics with continuous-time feedback control of infection rates. Our work is inspired by the observation that epidemics can be controlled through decentralized disease-control strategies such as quarantining, sheltering in place, social distancing, etc., where individuals actively modify their contact rates with others in response to observations of infection levels in the population. Accounting for a time lag in observations and categorizing individuals into distinct sub-populations based on their risk profiles, we show that the actSIS model manifests qualitatively different features as compared with the SIS model. In a homogeneous population of risk-averters, the endemic equilibrium is always reduced, although the transient infection level can exhibit overshoot or undershoot. In a homogeneous population of risk-tolerating individuals, the system exhibits bistability, which can also lead to reduced infection. For a heterogeneous population comprised of risk-tolerators and risk-averters, we prove conditions on model parameters for the existence of a Hopf bifurcation and sustained oscillations in the infected population. △ Less

Submitted 2 July, 2020; originally announced July 2020.

arXiv:2004.06171 [pdf, other]

Distributed Learning: Sequential Decision Making in Resource-Constrained Environments

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full commu… ▽ More We study cost-effective communication strategies that can be used to improve the performance of distributed learning systems in resource-constrained environments. For distributed learning in sequential decision making, we propose a new cost-effective partial communication protocol. We illustrate that with this protocol the group obtains the same order of performance that it obtains with full communication. Moreover, we prove that under the proposed partial communication protocol the communication cost is $O(\log T)$, where $T$ is the time horizon of the decision-making process. This improves significantly on protocols with full communication, which incur a communication cost that is $O(T)$. We validate our theoretical results using numerical simulations. △ Less

Submitted 13 April, 2020; originally announced April 2020.

arXiv:2004.03793 [pdf, other]

A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it rece… ▽ More We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations. △ Less

Submitted 7 April, 2020; originally announced April 2020.

arXiv:2003.01312 [pdf, other]

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independen… ▽ More We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. And we consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph. △ Less

Submitted 11 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:1909.11852 [pdf, other]

doi 10.1109/CDC40024.2019.9029844

A Continuous Threshold Model of Cascade Dynamics

Authors: Yaofeng Desmond Zhong, Naomi Ehrich Leonard

Abstract: We present a continuous threshold model (CTM) of cascade dynamics for a network of agents with real-valued activity levels that change continuously in time. The model generalizes the linear threshold model (LTM) from the literature, where an agent becomes active (adopts an innovation) if the fraction of its neighbors that are active is above a threshold. With the CTM we study the influence on casc… ▽ More We present a continuous threshold model (CTM) of cascade dynamics for a network of agents with real-valued activity levels that change continuously in time. The model generalizes the linear threshold model (LTM) from the literature, where an agent becomes active (adopts an innovation) if the fraction of its neighbors that are active is above a threshold. With the CTM we study the influence on cascades of heterogeneity in thresholds for a network comprised of a chain of three clusters of agents, each distinguished by a different threshold. The system is most sensitive to change as the dynamics pass through a bifurcation point: if the bifurcation is supercritical the response will be contained, while if the bifurcation is subcritical the response will be a cascade. We show that there is a subcritical bifurcation, thus a cascade, in response to an innovation if there is a large enough disparity between the thresholds of sufficiently large clusters on either end of the chain; otherwise the response will be contained. △ Less

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.05765 [pdf, other]

A model-independent theory of consensus and dissensus decision making

Authors: Alessio Franci, Martin Golubitsky, Anastasia Bizyaeva, Naomi Ehrich Leonard

Abstract: We develop a model-independent framework to study the dynamics of decision-making in opinion networks for an arbitrary number of agents and an arbitrary number of options. Model-independence means that the analysis is not performed on a specific set of equations, in contrast to classical approaches to decision making that fix a specific model and analyze it. Rather, the general features of decisio… ▽ More We develop a model-independent framework to study the dynamics of decision-making in opinion networks for an arbitrary number of agents and an arbitrary number of options. Model-independence means that the analysis is not performed on a specific set of equations, in contrast to classical approaches to decision making that fix a specific model and analyze it. Rather, the general features of decision making in dynamical opinion networks can be derived starting from empirically testable hypotheses about the deciding agents, the available options, and the interactions among them. After translating these empirical hypotheses into algebraic ones, we use the tools of equivariant bifurcation theory to uncover model-independent properties of dynamical opinion networks. The model-independent results are illustrated on a novel analytical model that is constructed by plugging a generic sigmoidal nonlinearity, modeling boundedness of opinions and opinion perception, into the model-independent equivariant structure. Our analysis reveals richer and more flexible opinion-formation behavior as compared to model-dependent approaches. For instance, analysis reveals the possibility of switching between consensus and various forms of dissensus by modulation of the level of agent cooperativity and without requiring any particular ad-hoc interaction topology (e.g., structural balance). From a theoretical viewpoint, we prove new results in equivariant bifurcation theory. We construct an exhaustive list of axial subgroups for the action of $\ES_n \times \ES_3$ on $\R^{n-1}\otimes\R^{2}$. We also generalize this list to the action of $\ES_n \times \ES_k$ on $\R^{n-1}\otimes \R^{k-1}$, i.e., for $n$ agents and $k$ options, although without proving that in this case the list is exhaustive. △ Less

Submitted 8 September, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

arXiv:1907.08829 [pdf, other]

Adaptive Susceptibility and Heterogeneity in Contagion Models on Networks

Authors: Renato Pagliara, Naomi E. Leonard

Abstract: Contagious processes, such as spread of infectious diseases, social behaviors, or computer viruses, affect biological, social, and technological systems. Epidemic models for large populations and finite populations on networks have been used to understand and control both transient and steady-state behaviors. Typically it is assumed that after recovery from an infection, every agent will either re… ▽ More Contagious processes, such as spread of infectious diseases, social behaviors, or computer viruses, affect biological, social, and technological systems. Epidemic models for large populations and finite populations on networks have been used to understand and control both transient and steady-state behaviors. Typically it is assumed that after recovery from an infection, every agent will either return to its original susceptible state or acquire full immunity to reinfection. We study the network SIRI (Susceptible-Infected-Recovered-Infected) model, an epidemic model for the spread of contagious processes on a network of heterogeneous agents that can adapt their susceptibility to reinfection. The model generalizes existing models to accommodate realistic conditions in which agents acquire partial or compromised immunity after first exposure to an infection. We prove necessary and sufficient conditions on model parameters and network structure that distinguish four dynamic regimes: infection-free, epidemic, endemic, and bistable. For the bistable regime, which is not accounted for in traditional models, we show how there can be a rapid resurgent epidemic after what looks like convergence to an infection-free population. We use the model and its predictive capability to show how control strategies can be designed to mitigate problematic contagious behaviors. △ Less

Submitted 11 April, 2020; v1 submitted 20 July, 2019; originally announced July 2019.

Comments: 14 pages, 5 figures

arXiv:1905.08731 [pdf, other]

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

Authors: Udari Madhushani, Naomi Ehrich Leonard

Abstract: We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an… ▽ More We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy analytically and computationally. △ Less

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1812.07117 [pdf, other]

doi 10.1080/03080188.2018.1544806

Social decision-making driven by artistic explore-exploit tension

Authors: Kayhan Ozcimder, Biswadip Dey, Alessio Franci, Rebecca Lazier, Daniel Trueman, Naomi Ehrich Leonard

Abstract: We studied social decision-making in the rule-based improvisational dance $There$ $Might$ $Be$ $Others$, where dancers make in-the-moment compositional choices. Rehearsals provided a natural test-bed with communication restricted to non-verbal cues. We observed a key artistic explore-exploit tension in which the dancers switched between exploitation of existing artistic opportunities and riskier e… ▽ More We studied social decision-making in the rule-based improvisational dance $There$ $Might$ $Be$ $Others$, where dancers make in-the-moment compositional choices. Rehearsals provided a natural test-bed with communication restricted to non-verbal cues. We observed a key artistic explore-exploit tension in which the dancers switched between exploitation of existing artistic opportunities and riskier exploration of new ones. We investigated how the rules influenced the dynamics using rehearsals together with a model generalized from evolutionary dynamics. We tuned the rules to heighten the tension and modeled nonlinear fitness and feedback dynamics for mutation rate to capture the observed temporal phasing of the dancers' exploration-versus-exploitation. Using bifurcation analysis, we identified key controls of the tension and showed how they could shape the decision-making dynamics of the model much like turning a "dial" in the instructions to the dancers could shape the dance. The investigation became an integral part of the development of the dance. △ Less

Submitted 17 December, 2018; originally announced December 2018.

Journal ref: K. Ozcimder, B. Dey, A. Franci, R. Lazier, D. Trueman, and N. E. Leonard (2018): Social decision-making driven by artistic explore-exploit tension, Interdisciplinary Science Reviews

arXiv:1808.07842 [pdf, other]

doi 10.1007/978-3-319-03904-6_2

In the Dance Studio: An Art and Engineering Exploration of Human Flocking

Authors: Naomi E. Leonard, George F. Young, Kelsey Hochgraf, Daniel T. Swain, Aaron Trippe, Willa Chen, Katherine Fitch, Susan Marshall

Abstract: Flock Logic was developed as an art and engineering project to explore how the feedback laws used to model flocking translate when applied by dancers. The artistic goal was to create choreographic tools that leverage multi-agent system dynamics with designed feedback and interaction. The engineering goal was to provide insights and design principles for multi-agent systems, such as human crowds, a… ▽ More Flock Logic was developed as an art and engineering project to explore how the feedback laws used to model flocking translate when applied by dancers. The artistic goal was to create choreographic tools that leverage multi-agent system dynamics with designed feedback and interaction. The engineering goal was to provide insights and design principles for multi-agent systems, such as human crowds, animal groups and robotic networks, by examining what individual dancers do and what emerges at the group level. We describe our methods to create dance and investigate collective motion. We analyze video of an experiment in which dancers moved according to simple rules of cohesion and repulsion with their neighbors. Using the prescribed interaction protocol and tracked trajectories, we estimate the time-varying graph that defines who is responding to whom. We compute status of nodes in the graph and show the emergence of leaders. We discuss results and further directions. △ Less

Submitted 22 August, 2018; originally announced August 2018.

Journal ref: Leonard N.E. et al. (2014) In the Dance Studio: An Art and Engineering Exploration of Human Flocking. In: LaViers A., Egerstedt M. (eds) Controls and Art. Springer, Cham

arXiv:1807.10824 [pdf, other]

doi 10.1063/1.5050178

Mixed mode oscillations and phase locking in coupled FitzHugh-Nagumo model neurons

Authors: Elizabeth N. Davison, Zahra Aminzare, Biswadip Dey, Naomi Ehrich Leonard

Abstract: We study the dynamics of a low-dimensional system of coupled model neurons as a step towards understanding the vastly complex network of neurons in the brain. We analyze the bifurcation structure of a system of two model neurons with unidirectional coupling as a function of two physiologically relevant parameters: the external current input only to the first neuron and the strength of the coupling… ▽ More We study the dynamics of a low-dimensional system of coupled model neurons as a step towards understanding the vastly complex network of neurons in the brain. We analyze the bifurcation structure of a system of two model neurons with unidirectional coupling as a function of two physiologically relevant parameters: the external current input only to the first neuron and the strength of the coupling from the first to the second neuron. Leveraging a timescale separation, we prove necessary conditions for multiple timescale phenomena observed in the coupled system, including canard solutions and mixed mode oscillations. For a larger network of model neurons, we present a sufficient condition for phase locking when external inputs are heterogeneous. Finally, we generalize our results to directed trees of model neurons with heterogeneous inputs. △ Less

Submitted 27 July, 2018; originally announced July 2018.

MSC Class: 34A26; 34C15; 34C60; 34D15; 34E17; 37G05; 37G10; 92B20; 92C20

arXiv:1805.03293 [pdf]

doi 10.1103/PhysRevB.100.144103

Intrinsic interfacial van der Waals monolayers and their effect on the high-temperature superconductor FeSe/SrTiO$_3$

Authors: Hunter Sims, Donovan N. Leonard, Axiel Yaël Birenbaum, Zhuozhi Ge, Tom Berlijn, Lian Li, Valentino R. Cooper, Matthew F. Chisholm, Sokrates T. Pantelides

Abstract: The sensitive dependence of monolayer materials on their environment often gives rise to unexpected properties. It was recently demonstrated that monolayer FeSe on a SrTiO$_3$ substrate exhibits a much higher superconducting critical temperature T$_C$ than the bulk material. Here, we examine the interfacial structure of FeSe / SrTiO$_3$ and the effect of an interfacial Ti$_{1+x}$O$_2$ layer on the… ▽ More The sensitive dependence of monolayer materials on their environment often gives rise to unexpected properties. It was recently demonstrated that monolayer FeSe on a SrTiO$_3$ substrate exhibits a much higher superconducting critical temperature T$_C$ than the bulk material. Here, we examine the interfacial structure of FeSe / SrTiO$_3$ and the effect of an interfacial Ti$_{1+x}$O$_2$ layer on the increased T$_C$ using a combination of scanning transmission electron microscopy and density functional theory. We find Ti$_{1+x}$O$_2$ forms its own quasi-two-dimensional layer, bonding to both the substrate and the FeSe film by van der Waals interactions. The excess Ti in this layer electron-dopes the FeSe monolayer in agreement with experimental observations. Moreover, the interfacial layer introduces symmetry-breaking distortions in the FeSe film that favor a T$_C$ increase. These results suggest that this common substrate may be functionalized to modify the electronic structure of a variety of thin films and monolayers. △ Less

Submitted 8 May, 2018; originally announced May 2018.

Comments: 10 pages, 4 figures, supplemental materials available upon request

Journal ref: Phys. Rev. B 100, 144103 (2019)

arXiv:1711.11578 [pdf, other]

Multi-agent decision-making dynamics inspired by honeybees

Authors: Rebecca Gray, Alessio Franci, Vaibhav Srivastava, Naomi Ehrich Leonard

Abstract: When choosing between candidate nest sites, a honeybee swarm reliably chooses the most valuable site and even when faced with the choice between near-equal value sites, it makes highly efficient decisions. Value-sensitive decision-making is enabled by a distributed social effort among the honeybees, and it leads to decision-making dynamics of the swarm that are remarkably robust to perturbation an… ▽ More When choosing between candidate nest sites, a honeybee swarm reliably chooses the most valuable site and even when faced with the choice between near-equal value sites, it makes highly efficient decisions. Value-sensitive decision-making is enabled by a distributed social effort among the honeybees, and it leads to decision-making dynamics of the swarm that are remarkably robust to perturbation and adaptive to change. To explore and generalize these features to other networks, we design distributed multi-agent network dynamics that exhibit a pitchfork bifurcation, ubiquitous in biological models of decision-making. Using tools of nonlinear dynamics we show how the designed agent-based dynamics recover the high performing value-sensitive decision-making of the honeybees and rigorously connect investigation of mechanisms of animal group decision-making to systematic, bio-inspired control of multi-agent network systems. We further present a distributed adaptive bifurcation control law and prove how it enhances the network decision-making performance beyond that observed in swarms. △ Less

Submitted 22 January, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

arXiv:1710.00450 [pdf, other]

Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems

Authors: T. W. U. Madhushani, D. H. S. Maithripala, N. E. Leonard

Abstract: This paper presents a class of Dynamic Multi-Armed Bandit problems where the reward can be modeled as the noisy output of a time varying linear stochastic dynamic system that satisfies some boundedness constraints. The class allows many seemingly different problems with time varying option characteristics to be considered in a single framework. It also opens up the possibility of considering many… ▽ More This paper presents a class of Dynamic Multi-Armed Bandit problems where the reward can be modeled as the noisy output of a time varying linear stochastic dynamic system that satisfies some boundedness constraints. The class allows many seemingly different problems with time varying option characteristics to be considered in a single framework. It also opens up the possibility of considering many new problems of practical importance. For instance it affords the simultaneous consideration of temporal option unavailabilities and the depen- dencies between options with time varying option characteristics in a seamless manner. We show that, for this class of problems, the combination of any Upper Confidence Bound type algorithm with any efficient reward estimator for the expected reward ensures the logarithmic bounding of the expected cumulative regret. We demonstrate the versatility of the approach by the explicit consideration of a new example of practical interest. △ Less

Submitted 7 October, 2017; v1 submitted 1 October, 2017; originally announced October 2017.

Comments: Pre-print submitted to 2018 American Control Conference

MSC Class: 60-01

Showing 1–50 of 77 results for author: Léonard, N