Online Policy Optimization in Unknown Nonlinear Systems

Authors: Yiheng Lin, James A. Preiss, Fengze Xie, Emile Anand, Soon-Jo Chung, Yisong Yue, Adam Wierman

Abstract: We study online policy optimization in nonlinear time-varying dynamical systems where the true dynamical models are unknown to the controller. This problem is challenging because, unlike in linear systems, the controller cannot obtain globally accurate estimations of the ground-truth dynamics using local exploration. We propose a meta-framework that combines a general online policy optimization algorithm ($\texttt{ALG}$) with a general online estimator of the dynamical system's model parameters ($\texttt{EST}$). We show that if the hypothetical joint dynamics induced by $\texttt{ALG}$ with known parameters satisfies several desired properties, the joint dynamics under inexact parameters from $\texttt{EST}$ will be robust to errors. Importantly, the final policy regret only depends on $\texttt{EST}$'s predictions on the visited trajectory, which relaxes a bottleneck on identifying the true parameters globally. To demonstrate our framework, we develop a computationally efficient variant of Gradient-based Adaptive Policy Selection, called Memoryless GAPS (M-GAPS), and use it to instantiate $\texttt{ALG}$. Combining M-GAPS with online gradient descent to instantiate $\texttt{EST}$ yields (to our knowledge) the first local regret bound for online policy optimization in nonlinear time-varying systems with unknown dynamics.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2403.18956 [pdf, other]

Characterizing Controllability and Observability for Systems with Locality, Communication, and Actuation Constraints

Authors: Lauren Conger, Yiheng Lin, Adam Wierman, Eric Mazumdar

Abstract: This paper presents a closed-form notion of controllability and observability for systems with communication delays, actuation delays, and locality constraints. The formulation reduces to classical notions of controllability and observability in the unconstrained setting. As a consequence of our formulation, we show that the addition of locality and communication constraints may not affect the controllability and observability of the system, and we provide an efficient sufficient condition under which this phenomenon occurs. This contrasts with actuation and sensing delays, which cause a gradual loss of controllability and observability as the delays increase. We illustrate our results using linearized swing equations for the power grid, showing how actuation delay and locality constraints affect controllability.

Submitted 4 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2311.00181 [pdf, other]

Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization

Authors: Neelkamal Bhuyan, Debankur Mukherjee, Adam Wierman

Abstract: We study the smoothed online quadratic optimization (SOQO) problem where, at each round $t$, a player plays an action $x_t$ in response to a quadratic hitting cost and an additional squared $\ell_2$-norm cost for switching actions. This problem class has strong connections to a wide range of application domains including smart grid management, adaptive control, and data center management, where switching-efficient algorithms are highly sought after. We study the SOQO problem in both adversarial and stochastic settings, and in this process, perform the first stochastic analysis of this class of problems. We provide the online optimal algorithm when the minimizers of the hitting cost function evolve as a general stochastic process, which, for the case of a martingale process, takes the form of a distribution-agnostic dynamic interpolation algorithm (LAI). Next, we present the stochastic-adversarial trade-off by proving an $Ω(T)$ expected regret for the adversarial optimal algorithm in the literature (ROBD) with respect to LAI and a sub-optimal competitive ratio for LAI in the adversarial setting. Finally, we present a best-of-both-worlds algorithm that obtains a robust adversarial performance while simultaneously achieving a near-optimal stochastic performance.

Submitted 23 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: 48 pages, 9 figures

arXiv:2310.20098 [pdf, other]

Robust Learning for Smoothed Online Convex Optimization with Feedback Delay

Authors: Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

Abstract: We study a challenging form of Smoothed Online Convex Optimization, a.k.a. SOCO, including multi-step nonlinear switching costs and feedback delay. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee $(1+λ)$-competitiveness against any given expert for any $λ>0$, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in both robustness and average performance using battery management for electrifying transportation as a case study.

Submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2309.14648 [pdf, other]

Learning the Uncertainty Sets for Control Dynamics via Set Membership: A Non-Asymptotic Analysis

Authors: Yingying Li, Jing Yu, Lauren Conger, Taylan Kargin, Adam Wierman

Abstract: This paper studies uncertainty set estimation for unknown linear systems. Uncertainty sets are crucial for the quality of robust control since they directly influence the conservativeness of the control design. Departing from the confidence region analysis of least squares estimation, this paper focuses on set membership estimation (SME). Though good numerical performances have attracted applications of SME in the control literature, the non-asymptotic convergence rate of SME for linear systems remains an open question. This paper provides the first convergence rate bounds for SME and discusses variations of SME under relaxed assumptions. We also provide numerical results demonstrating SME's practical promise.

Submitted 9 June, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: ICML 2024

arXiv:2306.16674 [pdf, other]

Online learning for robust voltage control under uncertain grid topology

Authors: Christopher Yeh, Jing Yu, Yuanyuan Shi, Adam Wierman

Abstract: Voltage control generally requires accurate information about the grid's topology in order to guarantee network stability. However, accurate topology identification is challenging for existing methods, especially as the grid is subject to increasingly frequent reconfiguration due to the adoption of renewable energy. In this work, we combine a nested convex body chasing algorithm with a robust predictive controller to achieve provably finite-time convergence to safe voltage limits in the online setting where there is uncertainty in both the network topology as well as load and generation variations. In an online fashion, our algorithm narrows down the set of possible grid models that are consistent with observations and adjusts reactive power generation accordingly to keep voltages within desired safety limits. Our approach can also incorporate existing partial knowledge of the network to improve voltage control performance. We demonstrate the effectiveness of our approach in a case study on a Southern California Edison 56-bus distribution system. Our experiments show that in practical settings, the controller is indeed able to narrow the set of consistent topologies quickly enough to make control decisions that ensure stability in both linearized and realistic non-linear models of the distribution grid.

Submitted 28 March, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Smart Grid. arXiv admin note: substantial text overlap with arXiv:2206.14369

arXiv:2306.10158 [pdf, other]

Learning-Augmented Decentralized Online Convex Optimization in Networks

Authors: Pengfei Li, Jianyi Yang, Adam Wierman, Shaolei Ren

Abstract: This paper studies decentralized online convex optimization in a networked multi-agent system and proposes a novel algorithm, Learning-Augmented Decentralized Online optimization (LADO), for individual agents to select actions only based on local online information. LADO leverages a baseline policy to safeguard online actions for worst-case robustness guarantees, while staying close to the machine learning (ML) policy for average performance improvement. In stark contrast with the existing learning-augmented online algorithms that focus on centralized settings, LADO achieves strong robustness guarantees in a decentralized setting. We also prove the average cost bound for LADO, revealing the tradeoff between average performance and worst-case robustness and demonstrating the advantage of training the ML policy by explicitly considering the robustness requirement.

Submitted 23 September, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2304.02878 [pdf, other]

Online Stabilization of Unknown Linear Time-Varying Systems

Authors: Jing Yu, Varun Gupta, Adam Wierman

Abstract: This paper studies the problem of online stabilization of an unknown discrete-time linear time-varying (LTV) system under bounded non-stochastic (potentially adversarial) disturbances. We propose a novel control algorithm based on convex body chasing (CBC). Under the assumption of infrequently changing or slowly drifting dynamics, the algorithm guarantees bounded-input-bounded-output stability in the closed loop. Our approach avoids system identification and applies, with minimal disturbance assumptions, to a variety of LTV systems of practical importance. We demonstrate the algorithm numerically on examples of LTV systems including Markov linear jump systems with finitely many jumps.

Submitted 14 December, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2301.08445 [pdf, other]

Online switching control with stability and regret guarantees

Authors: Yingying Li, James A. Preiss, Na Li, Yiheng Lin, Adam Wierman, Jeff Shamma

Abstract: This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches.

Submitted 23 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

arXiv:2211.17116 [pdf, other]

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Authors: Yizhou Zhang, Guannan Qu, Pan Xu, Yiheng Lin, Zaiwei Chen, Adam Wierman

Abstract: We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its $κ$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $κ$. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing $κ$. Numerical simulations demonstrate the effectiveness of LPI.

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2210.12320 [pdf, other]

Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations

Authors: Yiheng Lin, James A. Preiss, Emile Anand, Yingying Li, Yisong Yue, Adam Wierman

Abstract: We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.

Submitted 12 June, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.12312 [pdf, ps, other]

Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity

Authors: Yiheng Lin, Yang Hu, Guannan Qu, Tongxin Li, Adam Wierman

Abstract: We study Model Predictive Control (MPC) and propose a general analysis pipeline to bound its dynamic regret. The pipeline first requires deriving a perturbation bound for a finite-time optimal control problem. Then, the perturbation bound is used to bound the per-step error of MPC, which leads to a bound on the dynamic regret. Thus, our pipeline reduces the study of MPC to the well-studied problem of perturbation analysis, enabling the derivation of regret bounds of MPC under a variety of settings. To demonstrate the power of our pipeline, we use it to generalize existing regret bounds on MPC in linear time-varying (LTV) systems to incorporate prediction errors on costs, dynamics, and disturbances. Further, our pipeline leads to regret bounds on MPC in systems with nonlinear dynamics and constraints.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2207.05950 [pdf, other]

Decentralized Online Convex Optimization in Networked Systems

Authors: Yiheng Lin, Judy Gan, Guannan Qu, Yash Kanoria, Adam Wierman

Abstract: We study the problem of networked online convex optimization, where each agent individually decides on an action at every time step and agents cooperatively seek to minimize the total global cost over a finite horizon. The global cost is made up of three types of local costs: convex node costs, temporal interaction costs, and spatial interaction costs. In deciding their individual action at each time, an agent has access to predictions of local cost functions for the next $k$ time steps in an $r$-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio of $1 + \tilde{O}(ρ_T^k) + \tilde{O}(ρ_S^r)$ in an adversarial setting, where $ρ_T$ and $ρ_S$ are constants in $(0, 1)$ that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on $k$ and $r$ in our results is near optimal by lower bounding the competitive ratio of any decentralized online algorithm.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2206.14369 [pdf, other]

doi 10.1145/3538637.3538853

Robust Online Voltage Control with an Unknown Grid Topology

Authors: Christopher Yeh, Jing Yu, Yuanyuan Shi, Adam Wierman

Abstract: Voltage control generally requires accurate information about the grid's topology in order to guarantee network stability. However, accurate topology identification is a challenging problem for existing methods, especially as the grid is subject to increasingly frequent reconfiguration due to the adoption of renewable energy. Further, running existing control mechanisms with incorrect network information may lead to unstable control. In this work, we combine a nested convex body chasing algorithm with a robust predictive controller to achieve provably finite-time convergence to safe voltage limits in the online setting where the network topology is initially unknown. Specifically, the online controller does not know the true network topology and line parameters, but instead must learn them over time by narrowing down the set of network topologies and line parameters that are consistent with its observations and adjusting reactive power generation accordingly to keep voltages within desired safety limits. We demonstrate the effectiveness of the approach using a case study, which shows that in practical settings the controller is indeed able to narrow the set of consistent topologies quickly enough to make control decisions that ensure stability.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Code: https://github.com/chrisyeh96/voltctrl

Journal ref: Proceedings of the Thirteenth ACM International Conference on Future Energy Systems (e-Energy '22), June 2022

arXiv:2206.11780 [pdf, other]

Chasing Convex Bodies and Functions with Black-Box Advice

Authors: Nicolas Christianson, Tinashe Handina, Adam Wierman

Abstract: We consider the problem of convex function chasing with black-box advice, where an online decision-maker aims to minimize the total cost of making and switching between decisions in a normed vector space, aided by black-box advice such as the decisions of a machine-learned algorithm. The decision-maker seeks cost comparable to the advice when it performs well, known as $\textit{consistency}$, while also ensuring worst-case $\textit{robustness}$ even when the advice is adversarial. We first consider the common paradigm of algorithms that switch between the decisions of the advice and a competitive algorithm, showing that no algorithm in this class can improve upon 3-consistency while staying robust. We then propose two novel algorithms that bypass this limitation by exploiting the problem's convexity. The first, INTERP, achieves $(\sqrt{2}+ε)$-consistency and $\mathcal{O}(\frac{C}{ε^2})$-robustness for any $ε > 0$, where $C$ is the competitive ratio of an algorithm for convex function chasing or a subclass thereof. The second, BDINTERP, achieves $(1+ε)$-consistency and $\mathcal{O}(\frac{CD}{ε})$-robustness when the problem has bounded diameter $D$. Further, we show that BDINTERP achieves near-optimal consistency-robustness trade-off for the special case where cost functions are $α$-polyhedral.

Submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted to COLT 2022

arXiv:2206.01704 [pdf, ps, other]

KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems

Authors: Sahin Lale, Yuanyuan Shi, Guannan Qu, Kamyar Azizzadenesheli, Adam Wierman, Anima Anandkumar

Abstract: Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, current reinforcement learning (RL) methods lack stabilization guarantees, which limits their applicability for the control of safety-critical systems. We propose a model-based RL framework with formal stability guarantees, Krasovskii Constrained RL (KCRL), that adopts Krasovskii's family of Lyapunov functions as a stability constraint. The proposed method learns the system dynamics up to a confidence interval using feature representation, e.g. Random Fourier Features. It then solves a constrained policy optimization problem with a stability constraint based on Krasovskii's method using a primal-dual approach to recover a stabilizing policy. We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system. We also derive the sample complexity upper bound for stabilization of unknown nonlinear dynamical systems via the KCRL framework.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2204.05551 [pdf, other]

Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems

Authors: Sungho Shin, Yiheng Lin, Guannan Qu, Adam Wierman, Mihai Anitescu

Abstract: This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $κ$-distributed control, which lets the agents make control decisions based on the state information within distance $κ$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $κ$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $κ$-distributed control and centralized optimal control becomes exponentially small in $κ$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems.

Submitted 11 September, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

arXiv:2203.04503 [pdf, other]

An Energy Sharing Mechanism Considering Network Constraints and Market Power Limitation

Authors: Yue Chen, Changhong Zhao, Steven H. Low, Adam Wierman

Abstract: As the number of prosumers with distributed energy resources (DERs) grows, the conventional centralized operation scheme may suffer from conflicting interests, privacy concerns, and incentive inadequacy. In this paper, we propose an energy sharing mechanism to address the above challenges. It takes into account network constraints and fairness among prosumers. In the proposed energy sharing market, all prosumers play a generalized Nash game. The market equilibrium is proved to have nice features in a large market or when it is a variational equilibrium. To deal with the possible market failure, inefficiency, or instability in general cases, we introduce a price regulation policy to avoid market power exploitation. The improved energy sharing mechanism with price regulation can guarantee existence and uniqueness of a socially near-optimal market equilibrium. Some advantageous properties are proved, such as prosumer's individual rationality, a sharing price structure similar to the locational marginal price, and the tendency towards social optimum with an increasing number of prosumers. For implementation, a practical bidding algorithm is developed with convergence condition. Experimental results validate the theoretical outcomes and show the practicability of our model and method.

Submitted 27 June, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 23 pages, 14 figures

arXiv:2203.02630 [pdf, other]

Online Adversarial Stabilization of Unknown Networked Systems

Authors: Jing Yu, Dimitar Ho, Adam Wierman

Abstract: We investigate the problem of stabilizing an unknown networked linear system under communication constraints and adversarial disturbances. We propose the first provably stabilizing algorithm for the problem. The algorithm uses a distributed version of nested convex body chasing to maintain a consistent estimate of the network dynamics and applies system level synthesis to determine a distributed controller based on this estimated model. Our approach avoids the need for system identification and accommodates a broad class of communication delay while being fully distributed and scaling favorably with the number of subsystems.

Submitted 22 January, 2023; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Proc. ACM Meas. Anal. Comput. Syst. 7, 1, Article 26 (March 2023), 43 pages. https://doi.org/10.1145/3579452

arXiv:2202.07187 [pdf, other]

On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory

Authors: Yang Hu, Adam Wierman, Guannan Qu

Abstract: Stabilizing an unknown dynamical system is one of the central problems in control theory. In this paper, we study the sample complexity of the learn-to-stabilize problem in Linear Time-Invariant (LTI) systems on a single trajectory. Current state-of-the-art approaches require a sample complexity linear in $n$, the state dimension, which incurs a state norm that blows up exponentially in $n$. We propose a novel algorithm based on spectral decomposition that only needs to learn "a small part" of the dynamical matrix acting on its unstable subspace. We show that, under proper assumptions, our algorithm stabilizes an LTI system on a single trajectory with $\tilde{O}(k)$ samples, where $k$ is the instability index of the system. This represents the first sub-linear sample complexity result for the stabilization of LTI systems under the regime when $k = o(n)$.

Submitted 14 February, 2022; originally announced February 2022.

Comments: 40 pages, 2 figures, submitted to COLT 2022

arXiv:2202.07086 [pdf, other]

Price Cycles in Ridesharing Platforms

Authors: Chenkai Yu, Hongyao Ma, Adam Wierman

Abstract: In ridesharing platforms such as Uber and Lyft, it is observed that drivers sometimes collaboratively go offline when the price is low, and then return after the price has risen due to the perceived lack of supply. This collective strategy leads to cyclic fluctuations in prices and available drivers, resulting in poor reliability and social welfare. We study a continuous time, non-atomic model and prove that such online/offline strategies may form a Nash equilibrium among drivers, but lead to a lower total driver payoff if the market is sufficiently dense. Further, we show how to set price floors that effectively mitigate the emergence and impact of price cycles.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2111.00095 [pdf, other]

Online Optimization with Feedback Delay and Nonlinear Switching Cost

Authors: Weici Pan, Guanya Shi, Yiheng Lin, Adam Wierman

Abstract: We study a variant of online optimization in which the learner receives $k$-round $\textit{delayed feedback}$ about hitting cost and there is a multi-step nonlinear switching cost, i.e., costs depend on multiple previous actions in a nonlinear manner. Our main result shows that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio that is $O(L^{2k})$, where $L$ is the Lipschitz constant of the switching cost. Additionally, we provide lower bounds that illustrate the Lipschitz condition is required and the dependencies on $k$ and $L$ are tight. Finally, via reductions, we show that this setting is closely related to online control problems with delay, nonlinear dynamics, and adversarial disturbances, where iROBD directly offers constant-competitive online policies.

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2109.14854 [pdf, other]

Stability Constrained Reinforcement Learning for Real-Time Voltage Control

Authors: Yuanyuan Shi, Guannan Qu, Steven Low, Anima Anandkumar, Adam Wierman

Abstract: Deep reinforcement learning (RL) has been recognized as a promising tool to address the challenges in real-time control of power systems. However, its deployment in real-world power systems has been hindered by a lack of formal stability and safety guarantees. In this paper, we propose a stability constrained reinforcement learning method for real-time voltage control in distribution grids and we prove that the proposed approach provides a formal voltage stability guarantee. The key idea underlying our approach is an explicitly constructed Lyapunov function that certifies stability. We demonstrate the effectiveness of the approach in case studies, where the proposed method can reduce the transient control cost by more than 30\% and shorten the response time by a third compared to a widely used linear policy, while always achieving voltage stability. In contrast, standard RL methods often fail to achieve voltage stability.

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2106.10497 [pdf, ps, other]

Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems

Authors: Yiheng Lin, Yang Hu, Haoyuan Sun, Guanya Shi, Guannan Qu, Adam Wierman

Abstract: We study predictive control in a setting where the dynamics are time-varying and linear, and the costs are time-varying and well-conditioned. At each time step, the controller receives the exact predictions of costs, dynamics, and disturbances for the future $k$ time steps. We show that when the prediction window $k$ is sufficiently large, predictive control is input-to-state stable and achieves a dynamic regret of $O(λ^k T)$, where $λ< 1$ is a positive constant. This is the first dynamic regret bound on the predictive control of linear time-varying systems. Under more assumptions on the terminal costs, we also show that predictive control obtains the first competitive bound for the control of linear time-varying systems: $1 + O(λ^k)$. Our results are derived using a novel proof framework based on a perturbation bound that characterizes how a small change to the system parameters impacts the optimal trajectory.

Submitted 19 June, 2021; originally announced June 2021.

arXiv:2106.09659 [pdf, other]

doi 10.1145/3508038

Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions

Authors: Tongxin Li, Ruixiao Yang, Guannan Qu, Guanya Shi, Chenkai Yu, Adam Wierman, Steven H. Low

Abstract: We study the problem of learning-augmented predictive linear quadratic control. Our goal is to design a controller that balances \textit{"consistency"}, which measures the competitive ratio when predictions are accurate, and \textit{"robustness"}, which bounds the competitive ratio when predictions are inaccurate. We propose a novel $λ$-confident policy and provide a competitive ratio upper bound that depends on a trust parameter $λ\in [0,1]$ set based on the confidence in the predictions and some prediction error $\varepsilon$. Motivated by online learning methods, we design a self-tuning policy that adaptively learns the trust parameter $λ$ with a competitive ratio that depends on $\varepsilon$ and the variation of system perturbations and predictions. We show that its competitive ratio is bounded from above by $1+{O(\varepsilon)}/({Θ(1)+Θ(\varepsilon)})+O(μ_{\mathsf{Var}})$ where $μ_\mathsf{Var}$ measures the variation of perturbations and predictions. It implies that when the variations of perturbations and predictions are small, by automatically adjusting the trust parameter online, the self-tuning scheme ensures a competitive ratio that does not scale up with the prediction error $\varepsilon$.

Submitted 5 July, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 34 pages, 8 figures, ACM SIGMETRICS 2022

arXiv:2105.05234 [pdf, other]

A Spectral Representation of Power Systems with Applications to Adaptive Grid Partitioning and Cascading Failure Localization

Authors: Alessandro Zocca, Chen Liang, Linqi Guo, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate and cascade non-locally. This well-known yet counter-intuitive feature makes it even more challenging to optimally and reliably operate these complex networks. In this work we present a comprehensive framework based on spectral graph theory that fully and rigorously captures how multiple simultaneous line failures propagate, distinguishing between non-cut and cut set outages. Using this spectral representation of power systems, we identify the crucial graph sub-structure that ensures line failure localization -- the network bridge-block decomposition. Leveraging this theory, we propose an adaptive network topology reconfiguration paradigm that uses a two-stage algorithm where the first stage aims to identify optimal clusters using the notion of network modularity and the second stage refines the clusters by means of optimal line switching actions. Our proposed methodology is illustrated using extensive numerical examples on standard IEEE networks and we discuss several extensions and variants of the proposed algorithm.

Submitted 11 May, 2021; originally announced May 2021.

Comments: 45 pages, 7 figures

arXiv:2104.14134 [pdf, other]

Stable Online Control of Linear Time-Varying Systems

Authors: Guannan Qu, Yuanyuan Shi, Sahin Lale, Anima Anandkumar, Adam Wierman

Abstract: Linear time-varying (LTV) systems are widely used for modeling real-world dynamical systems due to their generality and simplicity. Providing stability guarantees for LTV systems is one of the central problems in control theory. However, existing approaches that guarantee stability typically lead to significantly sub-optimal cumulative control cost in online settings where only current or short-term system information is available. In this work, we propose an efficient online control algorithm, COvariance Constrained Online Linear Quadratic (COCO-LQ) control, that guarantees input-to-state stability for a large class of LTV systems while also minimizing the control cost. The proposed method incorporates a state covariance constraint into the semi-definite programming (SDP) formulation of the LQ optimal controller. We empirically demonstrate the performance of COCO-LQ in both synthetic experiments and a power system frequency control example.

Submitted 29 April, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: 3rd Annual Learning for Dynamics & Control Conference (L4DC)

arXiv:2012.11261 [pdf, other]

doi 10.1109/TSG.2021.3094719

Learning-Based Predictive Control via Real-Time Aggregate Flexibility

Authors: Tongxin Li, Bo Sun, Yue Chen, Zixin Ye, Steven H. Low, Adam Wierman

Abstract: Aggregators have emerged as crucial tools for the coordination of distributed, controllable loads. To be used effectively, an aggregator must be able to communicate the available flexibility of the loads it controls, also known as the aggregate flexibility, to a system operator. However, most existing aggregate flexibility measures are slow-timescale estimations, and much less attention has been paid to real-time coordination between an aggregator and an operator. In this paper, we consider solving an online optimization in a closed-loop system and present a design of real-time aggregate flexibility feedback, termed the maximum entropy feedback (MEF). In addition to deriving analytic properties of the MEF, combining learning and control, we show that it can be approximated using reinforcement learning and used as a penalty term in a novel control algorithm -- the penalized predictive control (PPC), which modifies vanilla model predictive control (MPC). The benefits of our scheme are (1) Efficient Communication: an operator running PPC does not need to know the exact states and constraints of the loads, but only the MEF; (2) Fast Computation: the PPC often has far fewer variables than an MPC formulation; and (3) Lower Costs: we show that under certain regularity assumptions, the PPC is optimal. We illustrate the efficacy of the PPC using a dataset from an adaptive electric vehicle charging network and show that PPC outperforms classical MPC.

Submitted 31 May, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: 13 pages, 5 figures, extension of arXiv:2006.13814

arXiv:2012.05361 [pdf, ps, other]

Data-driven Competitive Algorithms for Online Knapsack and Set Cover

Authors: Ali Zeynali, Bo Sun, Mohammad Hajiesmaili, Adam Wierman

Abstract: The design of online algorithms has tended to focus on algorithms with worst-case guarantees, e.g., bounds on the competitive ratio. However, it is well-known that such algorithms are often overly pessimistic, performing sub-optimally on non-worst-case inputs. In this paper, we develop an approach for data-driven design of online algorithms that maintain near-optimal worst-case guarantees while also performing learning in order to perform well for typical inputs. Our approach is to identify policy classes that admit global worst-case guarantees, and then perform learning using historical data within the policy classes. We demonstrate the approach in the context of two classical problems, online knapsack and online set cover, proving competitive bounds for rich policy classes in each case. Additionally, we illustrate the practical implications via a case study on electric vehicle charging.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.11637 [pdf, other]

Competitive Control with Delayed Imperfect Information

Authors: Chenkai Yu, Guanya Shi, Soon-Jo Chung, Yisong Yue, Adam Wierman

Abstract: This paper studies the impact of imperfect information in online control with adversarial disturbances. In particular, we consider both delayed state feedback and inexact predictions of future disturbances. We introduce a greedy, myopic policy that yields a constant competitive ratio against the offline optimal policy. We also analyze the fundamental limits of online control with limited information by showing that our competitive ratio bounds for the greedy, myopic policy in the adversarial setting match (up to lower-order terms) lower bounds in the stochastic setting.

Submitted 25 March, 2022; v1 submitted 22 October, 2020; originally announced October 2020.

arXiv:2006.13814 [pdf, other]

Real-time Flexibility Feedback for Closed-loop Aggregator and System Operator Coordination

Authors: Tongxin Li, Steven H. Low, Adam Wierman

Abstract: Aggregators have emerged as crucial tools for the coordination of distributed, controllable loads. However, to be used effectively, aggregators must be able to communicate the available flexibility of the loads they control to the system operator in a manner that is both (i) concise enough to be scalable to aggregators governing hundreds or even thousands of loads and (ii) informative enough to allow the system operator to send control signals to the aggregator that lead to optimization of system-level objectives, such as cost minimization, and do not violate private constraints of the loads, such as satisfying specific load demands. In this paper, we present the design of a real-time flexibility feedback signal based on maximization of entropy. The design provides a concise and informative signal that can be used by the system operator to perform online cost minimization and real-time capacity estimation, while provably satisfying the private constraints of the loads. In addition to deriving analytic properties of the design, we illustrate the effectiveness of the design using a dataset from an adaptive electric vehicle charging network.

Submitted 23 June, 2020; originally announced June 2020.

Comments: The Eleventh ACM International Conference on Future Energy Systems (e-Energy'20)

arXiv:2006.07569 [pdf, other]

The Power of Predictions in Online Control

Authors: Chenkai Yu, Guanya Shi, Soon-Jo Chung, Yisong Yue, Adam Wierman

Abstract: We study the impact of predictions in online Linear Quadratic Regulator control with both stochastic and adversarial disturbances in the dynamics. In both settings, we characterize the optimal policy and derive tight bounds on the minimum cost and dynamic regret. Perhaps surprisingly, our analysis shows that the conventional greedy MPC approach is a near-optimal policy in both stochastic and adversarial settings. Specifically, for length-$T$ problems, MPC requires only $O(\log T)$ predictions to reach $O(1)$ dynamic regret, which matches (up to lower-order terms) our lower bound on the required prediction horizon for constant regret.

Submitted 8 January, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:2006.07476 [pdf, other]

Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach

Authors: Guannan Qu, Chenkai Yu, Steven Low, Adam Wierman

Abstract: Model-free learning-based control methods have seen great success recently. However, such methods typically suffer from poor sample complexity and limited convergence guarantees. This is in sharp contrast to classical model-based control, which has a rich theory but typically requires strong modeling assumptions. In this paper, we combine the two approaches to achieve the best of both worlds. We consider a dynamical system with both linear and non-linear components and develop a novel approach to use the linear model to define a warm start for a model-free, policy gradient method. We show this hybrid approach outperforms the model-based controller while avoiding the convergence issues associated with model-free approaches via both numerical experiments and theoretical analyses, in which we derive sufficient conditions on the non-linear component such that our approach is guaranteed to converge to the (nearly) global optimal controller.

Submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.06626 [pdf, other]

Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Authors: Guannan Qu, Yiheng Lin, Adam Wierman, Na Li

Abstract: It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues due to the fact that the size of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network. Our result centers around identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance.

Submitted 11 June, 2020; originally announced June 2020.

arXiv:2005.11320 [pdf, ps, other]

doi 10.1109/TPWRS.2021.3068048

Line Failure Localization of Power Networks Part II: Cut Set Outages

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In Part II of this paper, we continue the study of line failure localizability in transmission networks and characterize the impact of cut set outages. We establish a Simple Path Criterion, showing that the propagation pattern due to bridge outages, a special case of cut set failures, is fully determined by the positions in the network of the buses that participate in load balancing. We then extend our results to general cut set outages. In contrast to non-cut outages discussed in Part I whose subsequent line failures are contained within the original blocks, cut set outages typically impact the whole network, affecting the power flows on all remaining lines. We corroborate our analytical results in both parts using the IEEE 118-bus test system, in which the failure propagation patterns exhibit a clear block-diagonal structure predicted by our theory, even when using full AC power flow equations.

Submitted 23 April, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: arXiv admin note: text overlap with arXiv:1803.08551

arXiv:2005.11319 [pdf, other]

Adaptive Network Response to Line Failures in Power Systems

Authors: Chen Liang, Linqi Guo, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate and cascade non-locally. In this work, we propose an adaptive control strategy that offers strong guarantees in both the mitigation and localization of line failures. Specifically, we leverage the properties of network bridge-block decomposition and a frequency regulation method called the unified control. If the balancing areas over which the unified control operates coincide with the bridge-blocks of the network, the proposed strategy drives the post-contingency system to a steady state where the impact of initial line outages is localized within the areas where they occurred whenever possible, stopping the cascading process. When the initial line outages cannot be localized, the proposed control strategy provides a configurable design that progressively involves and coordinates more balancing areas. We compare the proposed control strategy with the classical Automatic Generation Control (AGC) on the IEEE 118-bus and 2736-bus test networks. Simulation results show that our strategy greatly improves overall reliability in terms of the N-k security standard, and localizes the impact of initial failures in the majority of the simulated contingencies. Moreover, the proposed framework incurs significantly less load loss, if any, compared to AGC, in all our case studies.

Submitted 12 May, 2022; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: Accepted to IEEE Transactions on Control of Network Systems. arXiv admin note: text overlap with arXiv:1904.05461

arXiv:2005.10199 [pdf, ps, other]

doi 10.1109/TPWRS.2021.3066336

Line Failure Localization of Power Networks Part I: Non-cut Outages

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In this work, we establish a mathematical theory that characterizes the patterns of line failure propagation and localization in terms of network graph structure. It provides a novel perspective on distribution factors that precisely captures Kirchhoff's Law in terms of topological structures. Our results show that the distribution of specific collections of subtrees of the transmission network plays a critical role on the patterns of power redistribution, and motivates the block decomposition of the transmission network as a structure to understand long-distance propagation of disturbances. In Part I of this paper, we present the case when the post-contingency network remains connected after an initial set of lines are disconnected simultaneously. In Part II, we present the case when an outage separates the network into multiple islands.

Submitted 23 April, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.12280 [pdf, other]

Minimal-Variance Distributed Deadline Scheduling

Authors: Yorie Nakahira, Andres Ferragut, Adam Wierman

Abstract: Many modern schedulers can dynamically adjust their service capacity to match the incoming workload. At the same time, however, unpredictability and instability in service capacity often incur operational and infrastructure costs. In this paper, we seek to characterize optimal distributed algorithms that maximize the predictability, stability, or both when scheduling jobs with deadlines. Specifically, we show that Exact Scheduling minimizes both the stationary mean and variance of the service capacity subject to strict demand and deadline requirements. For more general settings, we characterize the minimal-variance distributed policies with soft demand requirements, soft deadline requirements, or both. The performance of the optimal distributed policies is compared to that of the optimal centralized policy by deriving closed-form bounds and by testing centralized and distributed algorithms using real data from the Caltech electric vehicle charging facility and many pieces of synthetic data from different arrival distributions. Moreover, we derive the Pareto-optimality condition for distributed policies that balance the variance and mean square of the service capacity. Finally, we discuss a scalable partially-centralized algorithm that uses centralized information to boost performance and a method to deal with missing information on service requirements.

Submitted 10 May, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

arXiv:2004.10401 [pdf, other]

An Integrated Approach for Failure Mitigation & Localization in Power Systems

Authors: Chen Liang, Linqi Guo, Alessandro Zocca, Shuyue Yu, Steven H. Low, Adam Wierman

Abstract: The transmission grid is often composed of several control areas that are connected by multiple tie lines in a mesh structure for reliability. It is also well known that line failures can propagate non-locally and redundancy can exacerbate cascading. In this paper, we propose an integrated approach to grid reliability that (i) judiciously switches off a small number of tie lines so that the control areas are connected in a tree structure; and (ii) leverages a unified frequency control paradigm to provide congestion management in real time. Even though the proposed topology reduces redundancy, the integration of tree structure at regional level and real-time congestion management can provide stronger guarantees on failure localization and mitigation. We illustrate our approach on the IEEE 39-bus network and evaluate its performance on the IEEE 118-bus, 179-bus, 200-bus and 240-bus networks with various network congestion conditions. Simulations show that, compared with the traditional approach, our approach not only prevents load shedding in more failure scenarios, but also incurs smaller amounts of load loss in scenarios where load shedding is inevitable. Moreover, generators under our approach adjust their operations more actively and efficiently in a local manner.

Submitted 22 April, 2020; originally announced April 2020.

Comments: Accepted to the 21st Power Systems Computation Conference (PSCC 2020)

arXiv:2002.08908 [pdf, ps, other]

Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers

Authors: Xingyu Zhou, Ness Shroff, Adam Wierman

Abstract: We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay-optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.

Submitted 20 February, 2020; originally announced February 2020.

Comments: 2 figures

arXiv:2002.05318 [pdf, other]

Online Optimization with Memory and Competitive Control

Authors: Guanya Shi, Yiheng Lin, Soon-Jo Chung, Yisong Yue, Adam Wierman

Abstract: This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous $p$ decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems.

Submitted 8 January, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Neural Information Processing Systems (NeurIPS 2020)

arXiv:2002.00260 [pdf, ps, other]

Finite-Time Analysis of Asynchronous Stochastic Approximation and $Q$-Learning

Authors: Guannan Qu, Adam Wierman

Abstract: We consider a general asynchronous Stochastic Approximation (SA) scheme featuring a weighted infinity-norm contractive operator, and prove a bound on its finite-time convergence rate on a single trajectory. Additionally, we specialize the result to asynchronous $Q$-learning. The resulting bound matches the sharpest available bound for synchronous $Q$-learning, and improves over previously known bounds for asynchronous $Q$-learning.

Submitted 1 February, 2020; originally announced February 2020.

arXiv:1912.02906 [pdf, other]

Scalable Reinforcement Learning for Multi-Agent Networked Systems

Authors: Guannan Qu, Adam Wierman, Na Li

Abstract: We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(ρ^κ)$-approximation of a stationary point of the objective for some $ρ\in(0,1)$, with complexity that scales with the local state-action space size of the largest $κ$-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics and traffic.

Submitted 31 October, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: Accepted to Operations Research. Conference version appeared in 2nd Learning for Dynamics and Control Conference with title "Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems". This journal version includes more examples, discussions and simulations

arXiv:1907.00814 [pdf, other]

doi 10.1007/s12532-020-00193-4

Signomial and Polynomial Optimization via Relative Entropy and Partial Dualization

Authors: Riley Murray, Venkat Chandrasekaran, Adam Wierman

Abstract: We describe a generalization of the Sums-of-AM/GM Exponential (SAGE) relaxation methodology for obtaining bounds on constrained signomial and polynomial optimization problems. Our approach leverages the fact that relative entropy based SAGE certificates conveniently and transparently blend with convex duality, in a manner that Sums-of-Squares certificates do not. This more general approach not only retains key properties of ordinary SAGE relaxations (e.g. sparsity preservation), but also inspires a novel perspective-based method of solution recovery. We illustrate the utility of our methodology with a range of examples from the global optimization literature, along with a publicly available software package.

Submitted 21 July, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: Software at https://rileyjmurray.github.io/sageopt/. Nine tables, one figure. Forty pages (with large margins). Ten pages of computational experiments; print pages 1-25 and 36-40 to skip the computational experiments. Version 2: minor simplification to section 4.2.1

MSC Class: 90C26 ACM Class: G.1.7; G.4

arXiv:1906.09891 [pdf, other]

Buy or Sell? Energy Sharing of Prosumers on Constrained Networks

Authors: Yue Chen, Shengwei Mei, Wei Wei, Steven H. Low, Adam Wierman, Feng Liu

Abstract: The advent of intelligent agents who produce and consume energy by themselves has led the smart grid into the era of the "prosumer", offering the energy system and customers a unique opportunity to revaluate/trade their spot energy via a sharing initiative. To this end, designing an appropriate sharing mechanism is an issue of crucial importance and has captured great attention. This paper addresses the prosumers' demand response problem via energy sharing. Under a general supply-demand function bidding scheme, a sharing market clearing procedure considering network constraints is proposed, which gives rise to a generalized Nash game. The existence and uniqueness of market equilibrium are proved in non-congested cases. When congestion occurs, infinitely many equilibria may exist because the strategy spaces of prosumers are correlated. A price-regulation procedure is introduced in the sharing mechanism, which yields a unique equilibrium that is fair to all participants. Properties of the improved sharing mechanism, including the individual rational behaviors of prosumers and the components of sharing price, are revealed. When the number of prosumers increases, the proposed sharing mechanism approaches the social optimum. Even with a fixed number of resources, introducing competition can result in a decreasing social cost. Illustrative examples validate the theoretical results and provide more insights for the energy sharing research.

Submitted 24 June, 2019; originally announced June 2019.

Comments: 13 pages, 7 figures

arXiv:1905.12776 [pdf, other]

Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization

Authors: Gautam Goel, Yiheng Lin, Haoyuan Sun, Adam Wierman

Abstract: We study online convex optimization in a setting where the learner seeks to minimize the sum of a per-round hitting cost and a movement cost which is incurred when changing decisions between rounds. We prove a new lower bound on the competitive ratio of any online algorithm in the setting where the costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. This lower bound shows that no algorithm can achieve a competitive ratio that is $o(m^{-1/2})$ as $m$ tends to zero. No existing algorithms have competitive ratios matching this bound, and we show that the state-of-the-art algorithm, Online Balanced Descent (OBD), has a competitive ratio that is $Ω(m^{-2/3})$. We additionally propose two new algorithms, Greedy OBD (G-OBD) and Regularized OBD (R-OBD) and prove that both algorithms have an $O(m^{-1/2})$ competitive ratio. The result for G-OBD holds when the hitting costs are quasiconvex and the movement costs are the squared $\ell_2$ norm, while the result for R-OBD holds when the hitting costs are $m$-strongly convex and the movement costs are Bregman Divergences. Further, we show that R-OBD simultaneously achieves constant, dimension-free competitive ratio and sublinear regret when hitting costs are strongly convex.

Submitted 21 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1901.09161 [pdf, ps, other]

Competitive Online Optimization under Inventory Constraints

Authors: Qiulin Lin, Hanling Yi, John Pang, Minghua Chen, Adam Wierman, Michael Honig, Yuanzhang Xiao

Abstract: This paper studies online optimization under inventory (budget) constraints. While online optimization is a well-studied topic, versions with inventory constraints have proven difficult. We consider a formulation of inventory-constrained optimization that is a generalization of the classic one-way trading problem and has a wide range of applications. We present a new algorithmic framework, \textsf{CR-Pursuit}, and prove that it achieves the minimal competitive ratio among all deterministic algorithms (up to a problem-dependent constant factor) for inventory-constrained online optimization. Our algorithm and its analysis not only simplify and unify the state-of-the-art results for the standard one-way trading problem, but they also establish novel bounds for generalizations including concave revenue functions. For example, for one-way trading with price elasticity, the \textsf{CR-Pursuit} algorithm achieves a competitive ratio that is within a small additive constant (i.e., 1/3) of the lower bound of $\ln θ+1$, where $θ$ is the ratio between the maximum and minimum base prices.

Submitted 25 January, 2019; originally announced January 2019.

Comments: The first two authors contribute to the work equally. Manuscript submitted October 22, 2018; accepted December 17, 2018; to appear in ACM SIGMETRICS 2019

Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems (for publishing papers of ACM SIGMETRICS), 2019

arXiv:1901.04372 [pdf, other]

Online Inventory Management with Application to Energy Procurement in Data Centers

Authors: Lin Yang, Mohammad H. Hajiesmaili, Ramesh Sitaraman, Enrique Mallada, Wing S. Wong, Adam Wierman

Abstract: Motivated by the application of energy storage management in electricity markets, this paper considers the problem of online linear programming with inventory management constraints. Specifically, a decision maker should satisfy some units of an asset as her demand, either from a market with time-varying price or from her own inventory. The decision maker is presented a price in a slot-by-slot manner, and must immediately decide the purchased amount with the current price to cover the demand or to store in inventory for covering the future demand. The inventory has a limited capacity and its critical role is to buy and store assets at low price and use the stored assets to cover the demand at high price. The ultimate goal of the decision maker is to cover the demands while minimizing the cost of buying assets from the market. We propose BatMan, an online algorithm for simple inventory models, and BatManRate, an extended version for the case with rate constraints. Both BatMan and BatManRate achieve optimal competitive ratios, meaning that no other online algorithm can achieve a better theoretical guarantee. To illustrate the results, we use the proposed algorithms to design and evaluate energy procurement and storage management strategies for data centers with a portfolio of energy sources including the electric grid, local renewable generation, and energy storage systems.

Submitted 14 January, 2019; originally announced January 2019.

arXiv:1810.10132 [pdf, other]

Smoothed Online Optimization for Regression and Control

Authors: Gautam Goel, Adam Wierman

Abstract: We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds. We show that the recently proposed Online Balanced Descent (OBD) algorithm is constant competitive in this setting, with competitive ratio $3 + O(1/m)$, irrespective of the ambient dimension. Additionally, we show that when the sequence of cost functions is $ε$-smooth, OBD has near-optimal dynamic regret and maintains strong per-round accuracy. We demonstrate the generality of our approach by showing that the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.

Submitted 4 April, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

arXiv:1810.01614 [pdf, other]

doi 10.1007/s10208-021-09497-w

Newton Polytopes and Relative Entropy Optimization

Authors: Riley Murray, Venkat Chandrasekaran, Adam Wierman

Abstract: Certifying function nonnegativity is a ubiquitous problem in computational mathematics, with especially notable applications in optimization. We study the question of certifying nonnegativity of signomials based on the recently proposed approach of Sums-of-AM/GM-Exponentials (SAGE) decomposition due to the second author and Shah. The existence of a SAGE decomposition is a sufficient condition for nonnegativity of a signomial, and it can be verified by solving a tractable convex relative entropy program. We present new structural properties of SAGE certificates such as a characterization of the extreme rays of the cones associated to these decompositions as well as an appealing form of sparsity preservation. These lead to a number of important consequences such as conditions under which signomial nonnegativity is equivalent to the existence of a SAGE decomposition; our results represent the broadest-known class of nonconvex signomial optimization problems that can be solved efficiently via convex relaxation. The analysis in this paper proceeds by leveraging the interaction between the convex duality underlying SAGE certificates and the face structure of Newton polytopes. While our primary focus is on signomials, we also discuss how our results provide efficient methods for certifying polynomial nonnegativity, with complexity independent of the degree of a polynomial.

Submitted 14 May, 2020; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: Body shortened from 29 to 24 pages. Additional consideration to related work. Some claims made in Section 5 have been formalized. Revised within 2 months of first-round reviews

MSC Class: 52A40; 90C30; 14P15

Showing 1–50 of 59 results for author: Wierman, A