-
Risk-averse Learning with Non-Stationary Distributions
Authors:
Siyi Wang,
Zifan Wang,
Xinlei Yi,
Michael M. Zavlanos,
Karl H. Johansson,
Sandra Hirche
Abstract:
Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost chan…
▽ More
Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time. We minimize risk-averse objective function using the Conditional Value at Risk (CVaR) as risk measure. Due to the difficulty in obtaining the exact CVaR gradient, we employ a zeroth-order optimization approach that queries the cost function values multiple times at each iteration and estimates the CVaR gradient using the sampled values. To facilitate the regret analysis, we use a variation metric based on Wasserstein distance to capture time-varying distributions. Given that the distribution variation is sub-linear in the total number of episodes, we show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions. Moreover, theoretical results suggest that increasing the number of samples leads to a reduction in the dynamic regret bounds until the sampling number reaches a specific limit. Finally, we provide numerical experiments of dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Policy Stitching: Learning Transferable Robot Policies
Authors:
**cheng Jian,
Easop Lee,
Zachary Bell,
Michael M. Zavlanos,
Boyuan Chen
Abstract:
Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations…
▽ More
Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations and scale to realistic tasks due to complex architecture design or strong regularization that limits the capacity of the learned policy. We propose Policy Stitching, a novel framework that facilitates robot transfer learning for novel combinations of robots and tasks. Our key idea is to apply modular policy design and align the latent representations between the modular interfaces. Our method allows direct stitching of the robot and task modules trained separately to form a new policy for fast adaptation. Our simulated and real-world experiments on various 3D manipulation tasks demonstrate the superior zero-shot and few-shot transfer learning performances of our method. Our project website is at: http://generalroboticslab.com/PolicyStitching/ .
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Max-Plus Synchronization in Decentralized Trading Systems
Authors:
Hans Riess,
Michael Munger,
Michael M. Zavlanos
Abstract:
We introduce a decentralized mechanism for pricing and exchanging alternatives constrained by transaction costs. We characterize the time-invariant solutions of a heat equation involving a (weighted) Tarski Laplacian operator, defined for max-plus matrix-weighted graphs, as approximate equilibria of the trading system. We study algebraic properties of the solution sets as well as convergence behav…
▽ More
We introduce a decentralized mechanism for pricing and exchanging alternatives constrained by transaction costs. We characterize the time-invariant solutions of a heat equation involving a (weighted) Tarski Laplacian operator, defined for max-plus matrix-weighted graphs, as approximate equilibria of the trading system. We study algebraic properties of the solution sets as well as convergence behavior of the dynamical system. We apply these tools to the "economic problem" of allocating scarce resources among competing uses. Our theory suggests differences in competitive equilibrium, bargaining, or cost-benefit analysis, depending on the context, are largely due to differences in the way that transaction costs are incorporated into the decision-making process. We present numerical simulations of the synchronization algorithm (RRAggU), demonstrating our theoretical findings.
△ Less
Submitted 13 September, 2023; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe
Authors:
Panagiotis Vlantis,
Leila J. Bridgeman,
Michael M. Zavlanos
Abstract:
In this work, we consider the problem of learning a feed-forward neural network controller to safely steer an arbitrarily shaped planar robot in a compact and obstacle-occluded workspace. Unlike existing methods that depend strongly on the density of data points close to the boundary of the safe state space to train neural network controllers with closed-loop safety guarantees, here we propose an…
▽ More
In this work, we consider the problem of learning a feed-forward neural network controller to safely steer an arbitrarily shaped planar robot in a compact and obstacle-occluded workspace. Unlike existing methods that depend strongly on the density of data points close to the boundary of the safe state space to train neural network controllers with closed-loop safety guarantees, here we propose an alternative approach that lifts such strong assumptions on the data that are hard to satisfy in practice and instead allows for graceful safety violations, i.e., of a bounded magnitude that can be spatially controlled. To do so, we employ reachability analysis techniques to encapsulate safety constraints in the training process. Specifically, to obtain a computationally efficient over-approximation of the forward reachable set of the closed-loop system, we partition the robot's state space into cells and adaptively subdivide the cells that contain states which may escape the safe set under the trained control law. Then, using the overlap between each cell's forward reachable set and the set of infeasible robot configurations as a measure for safety violations, we introduce appropriate terms into the loss function that penalize this overlap in the training process. As a result, our method can learn a safe vector field for the closed-loop system and, at the same time, provide worst-case bounds on safety violation over the whole configuration space, defined by the overlap between the over-approximation of the forward reachable set of the closed-loop system and the set of unsafe states. Moreover, it can control the tradeoff between computational complexity and tightness of these bounds. Our proposed method is supported by both theoretical results and simulation studies.
△ Less
Submitted 12 December, 2022; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Formal Verification of Stochastic Systems with ReLU Neural Network Controllers
Authors:
Shiqi Sun,
Yan Zhang,
Xusheng Luo,
Panagiotis Vlantis,
Miroslav Pajic,
Michael M. Zavlanos
Abstract:
In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussi…
▽ More
In this work, we address the problem of formal safety verification for stochastic cyber-physical systems (CPS) equipped with ReLU neural network (NN) controllers. Our goal is to find the set of initial states from where, with a predetermined confidence, the system will not reach an unsafe configuration within a specified time horizon. Specifically, we consider discrete-time LTI systems with Gaussian noise, which we abstract by a suitable graph. Then, we formulate a Satisfiability Modulo Convex (SMC) problem to estimate upper bounds on the transition probabilities between nodes in the graph. Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes. Additionally, using the proposed SMC formula, we devise a heuristic method to refine the abstraction of the system in order to further improve the estimated safety bounds. Finally, we corroborate the efficacy of the proposed method with simulation results considering a robot navigation example and comparison against a state-of-the-art verification scheme.
△ Less
Submitted 8 March, 2021;
originally announced March 2021.
-
An Optimal Graph-Search Method for Secure State Estimation
Authors:
Xusheng Luo,
Miroslav Pajic,
Michael M. Zavlanos
Abstract:
The growing complexity of modern Cyber-Physical Systems (CPS) and the frequent communication between their components make them vulnerable to malicious attacks. As a result, secure state estimation is a critical requirement for the control of these systems. Many existing secure state estimation methods suffer from combinatorial complexity which grows with the number of states and sensors in the sy…
▽ More
The growing complexity of modern Cyber-Physical Systems (CPS) and the frequent communication between their components make them vulnerable to malicious attacks. As a result, secure state estimation is a critical requirement for the control of these systems. Many existing secure state estimation methods suffer from combinatorial complexity which grows with the number of states and sensors in the system. This complexity can be mitigated using optimization-based methods that relax the original state estimation problem, although at the cost of optimality as these methods often identify attack-free sensors as attacked. In this paper, we propose a new optimal graph-search algorithm to correctly identify malicious attacks and to securely estimate the states even in large-scale CPS modeled as linear time-invariant systems. The graph consists of layers, each one containing two nodes capturing a truth assignment of any given sensor, and directed edges connecting adjacent layers only. Then, our algorithm searches the layers of this graph incrementally, favoring directions at higher layers with more attack-free assignments, while actively managing a repository of nodes to be expanded at later iterations. The proposed search bias and the ability to revisit nodes in the repository and self-correct, allow our graph-search algorithm to reach the optimal assignment faster and tackle larger problems. We show that our algorithm is complete and optimal provided that process and measurement noises do not dominate the attack signal. Moreover, we provide numerical simulations that demonstrate the ability of our algorithm to correctly identify attacked sensors and securely reconstruct the state. Our simulations show that our method outperforms existing algorithms both in terms of optimality and execution time.
△ Less
Submitted 7 October, 2020; v1 submitted 25 March, 2019;
originally announced March 2019.
-
Simultaneous Intermittent Communication Control and Path Optimization in Networks of Mobile Robots
Authors:
Yiannis Kantaros,
Michael M. Zavlanos
Abstract:
In this paper, we propose an intermittent communication framework for mobile robot networks. Specifically, we consider robots that move along the edges of a connected mobility graph and communicate only when they meet at the nodes of that graph giving rise to a dynamic communication network. Our proposed distributed controllers ensure intermittent connectivity of the network and path optimization,…
▽ More
In this paper, we propose an intermittent communication framework for mobile robot networks. Specifically, we consider robots that move along the edges of a connected mobility graph and communicate only when they meet at the nodes of that graph giving rise to a dynamic communication network. Our proposed distributed controllers ensure intermittent connectivity of the network and path optimization, simultaneously. We show that the intermittent connectivity requirement can be encapsulated by a global Linear Temporal Logic (LTL) formula. Then we approximately decompose it into local LTL expressions which are then assigned to the robots. To avoid conflicting robot behaviors that can occur due to this approximate decomposition, we develop a distributed conflict resolution scheme that generates non-conflicting discrete motion plans for every robot, based on the assigned local LTL expressions, whose composition satisfies the global LTL formula. By appropriately introducing delays in the execution of the generated motion plans we also show that the proposed controllers can be executed asynchronously.
△ Less
Submitted 22 September, 2016;
originally announced September 2016.
-
Spectral Control of Mobile Robot Networks
Authors:
Michael M. Zavlanos,
Victor M. Preciado,
Ali Jadbabaie
Abstract:
The eigenvalue spectrum of the adjacency matrix of a network is closely related to the behavior of many dynamical processes run over the network. In the field of robotics, this spectrum has important implications in many problems that require some form of distributed coordination within a team of robots. In this paper, we propose a continuous-time control scheme that modifies the structure of a po…
▽ More
The eigenvalue spectrum of the adjacency matrix of a network is closely related to the behavior of many dynamical processes run over the network. In the field of robotics, this spectrum has important implications in many problems that require some form of distributed coordination within a team of robots. In this paper, we propose a continuous-time control scheme that modifies the structure of a position-dependent network of mobile robots so that it achieves a desired set of adjacency eigenvalues. For this, we employ a novel abstraction of the eigenvalue spectrum by means of the adjacency matrix spectral moments. Since the eigenvalue spectrum is uniquely determined by its spectral moments, this abstraction provides a way to indirectly control the eigenvalues of the network. Our construction is based on artificial potentials that capture the distance of the network's spectral moments to their desired values. Minimization of these potentials is via a gradient descent closed-loop system that, under certain convexity assumptions, ensures convergence of the network topology to one with the desired set of moments and, therefore, eigenvalues. We illustrate our approach in nontrivial computer simulations.
△ Less
Submitted 30 September, 2010;
originally announced October 2010.