A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Authors: Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin

Abstract: Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations estimated by off-policy stationary distribution corrections. Furthermore, we enable the learning rates to be adapted for every task and extend our approach to settings with a competing dynamically changing oracle. Finally, experiments are conducted to demonstrate the effectiveness of our approach.

Submitted 26 May, 2024; originally announced May 2024.

Journal ref: ICLR 2023

arXiv:2405.16053 [pdf, other]

Pausing Policy Learning in Non-stationary Reinforcement Learning

Authors: Hyunin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi

Abstract: Real-time inference is a challenge of real-world reinforcement learning due to temporal differences in time-varying environments: the system collects data from the past, updates the decision model in the present, and deploys it in the future. We tackle a common belief that continually updating the decision is optimal to minimize the temporal gap. We propose forecasting an online reinforcement learning framework and show that strategically pausing decision updates yields better overall performance by effectively managing aleatoric uncertainty. Theoretically, we compute an optimal ratio between policy update and hold duration, and show that a non-zero policy hold duration provides a sharper upper bound on the dynamic regret. Our experimental evaluations on three different environments also reveal that a non-zero policy hold duration yields higher rewards compared to continuous decision updates.

Submitted 25 May, 2024; originally announced May 2024.

Comments: conference

arXiv:2403.15099 [pdf, other]

Optimal Contract Design for End-of-Life Care Payments

Authors: Muyan Jiang, Ying Chen, Xin Chen, Javad Lavaei, Anil Aswani

Abstract: A large fraction of total healthcare expenditure occurs due to end-of-life (EOL) care, which means it is important to study the problem of more carefully incentivizing necessary versus unnecessary EOL care because this has the potential to reduce overall healthcare spending. This paper introduces a principal-agent model that integrates a mixed payment system of fee-for-service and pay-for-performance in order to analyze whether it is possible to better align healthcare provider incentives with patient outcomes and cost-efficiency in EOL care. The primary contributions are to derive optimal contracts for EOL care payments using a principal-agent framework under three separate models for the healthcare provider, where each model considers a different level of risk tolerance for the provider. We derive these optimal contracts by converting the underlying principal-agent models from a bilevel optimization problem into a single-level optimization problem that can be analytically solved. Our results are demonstrated using a simulation where an optimal contract is used to price intracranial pressure monitoring for traumatic brain injuries.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.06056 [pdf, other]

Absence of spurious solutions far from ground truth: A low-rank analysis with high-order losses

Authors: Ziye Ma, Ying Chen, Javad Lavaei, Somayeh Sojoudi

Abstract: Matrix sensing problems exhibit pervasive non-convexity, plaguing optimization with a proliferation of suboptimal spurious solutions. Avoiding convergence to these critical points poses a major challenge. This work provides new theoretical insights that help demystify the intricacies of the non-convex landscape. In this work, we prove that under certain conditions, critical points sufficiently distant from the ground truth matrix exhibit favorable geometry by being strict saddle points rather than troublesome local minima. Moreover, we introduce the notion of higher-order losses for the matrix sensing problem and show that the incorporation of such losses into the objective function amplifies the negative curvature around those distant critical points. This implies that increasing the complexity of the objective function via high-order losses accelerates the escape from such critical points and acts as a desirable alternative to increasing the complexity of the optimization problem via over-parametrization. By elucidating key characteristics of the non-convex optimization landscape, this work makes progress towards a comprehensive framework for tackling broader machine learning objectives plagued by non-convexity.

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted by AISTATS 2024

arXiv:2310.15549 [pdf, other]

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

Authors: Ziye Ma, Javad Lavaei, Somayeh Sojoudi

Abstract: Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.

Submitted 24 October, 2023; originally announced October 2023.

Comments: NeurIPS23 Poster

arXiv:2309.14989 [pdf, other]

Tempo Adaptation in Non-stationary Reinforcement Learning

Authors: Hyunin Lee, Yuhao Ding, Jongmin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi

Abstract: We first raise and tackle a ``time synchronization'' issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time signifies the actual elapsed time within the fixed duration $t \in [0, T]$. In existing works, at episode $k$, the agent rolls a trajectory and trains a policy before transitioning to episode $k+1$. In the context of the time-desynchronized environment, however, the agent at time $t_{k}$ allocates $Δt$ for trajectory generation and training, and subsequently moves to the next episode at $t_{k+1}=t_{k}+Δt$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories influenced by the choice of interaction times ($t_1,t_2,...,t_K$), significantly impacting the suboptimality gap of the policy. We propose a Proactively Synchronizing Tempo ($\texttt{ProST}$) framework that computes a suboptimal sequence {$t_1,t_2,...,t_K$} (= {$t_{1:K}$}) by minimizing an upper bound on its performance measure, i.e., the dynamic regret. Our main contribution is that we show that a suboptimal {$t_{1:K}$} trades off between the policy training time (agent tempo) and how fast the environment changes (environment tempo). Theoretically, this work develops a suboptimal {$t_{1:K}$} as a function of the degree of the environment's non-stationarity while also achieving a sublinear dynamic regret. Our experimental evaluation on various high-dimensional non-stationary environments shows that the $\texttt{ProST}$ framework achieves a higher online return at suboptimal {$t_{1:K}$} than the existing methods.

Submitted 27 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: 53 pages. To be published in Neural Information Processing Systems (NeurIPS), 2023

arXiv:2305.17568 [pdf, other]

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Authors: Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei

Abstract: We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by {\it general utilities}, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploration, or imitations. The exponential growth of the state-action space size with the number of agents presents challenges for global observability, further exacerbated by the global coupling arising from agents' safety constraints. To tackle this issue, we propose a primal-dual method utilizing shadow reward and $κ$-hop neighbor truncation under a form of correlation decay property, where $κ$ is the communication radius. In the exact setting, our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$. In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(ε^{-3.5}\right)$ samples to achieve an $ε$-FOSP with an approximation error of $\mathcal{O}(φ_0^{2κ})$, where $φ_0\in (0,1)$. Finally, we demonstrate the effectiveness of our model through extensive numerical experiments.

Submitted 27 May, 2023; originally announced May 2023.

Comments: 50 pages

arXiv:2305.17567 [pdf, other]

No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

Authors: Mengzi Amy Guo, Donghao Ying, Javad Lavaei, Zuo-Jun Max Shen

Abstract: This work is dedicated to the algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider the dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. The demand follows the multinomial logit (MNL) choice model, which depends on the consumers' observed price and their reference price, and consecutive periods in the repeated games are connected by reference price updates. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy for the single-period game, to simultaneously capture the long-run market equilibrium and stability. We propose the online projected gradient ascent algorithm (OPGA), where the firms adjust prices using the first-order derivatives of their log-revenues that can be obtained from the market feedback mechanism. Despite the absence of typical properties required for the convergence of online games, such as strong monotonicity and variational stability, we demonstrate that under diminishing step-sizes, the price and reference price paths generated by OPGA converge to the unique SNE, thereby achieving the no-regret learning and a stable market. Moreover, with appropriate step-sizes, we prove that this convergence exhibits a rate of $\mathcal{O}(1/t)$.

Submitted 27 May, 2023; originally announced May 2023.
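
The price-update rule described in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear utility form `a - b*p + c*(r - p)`, the exponential-smoothing reference update with weight `alpha`, the step size `eta0 / t`, the price bounds, and all numeric values. Under that assumed utility, the derivative of firm i's log-revenue is `1/p_i - (b_i + c_i) * (1 - d_i)`, which depends only on the firm's own price and observed demand.

```python
import math

def mnl_demand(prices, refs, a, b, c):
    """MNL market shares with a no-purchase option (assumed utility form)."""
    utils = [a[i] - b[i] * prices[i] + c[i] * (refs[i] - prices[i])
             for i in range(len(prices))]
    exp_u = [math.exp(u) for u in utils]
    denom = 1.0 + sum(exp_u)  # the "1" is the outside (no-purchase) option
    return [e / denom for e in exp_u]

def opga_sketch(T=2000, eta0=1.0, alpha=0.8, p_lo=0.1, p_hi=10.0):
    """Projected gradient ascent on log-revenue with diminishing steps."""
    # Hypothetical model parameters for two firms.
    a, b, c = [2.0, 2.0], [1.0, 1.2], [0.5, 0.5]
    prices, refs = [1.0, 5.0], [3.0, 3.0]
    for t in range(1, T + 1):
        d = mnl_demand(prices, refs, a, b, c)
        # d/dp_i log(p_i * d_i) = 1/p_i - (b_i + c_i) * (1 - d_i)
        grads = [1.0 / prices[i] - (b[i] + c[i]) * (1.0 - d[i])
                 for i in range(2)]
        eta = eta0 / t  # diminishing step size
        new_prices = [min(p_hi, max(p_lo, prices[i] + eta * grads[i]))
                      for i in range(2)]
        # The reference price tracks past posted prices (assumed smoothing rule).
        refs = [alpha * refs[i] + (1.0 - alpha) * prices[i] for i in range(2)]
        prices = new_prices
    return prices, refs
```

Each firm uses only its own price and realized demand, which matches the opaque-market setting in spirit; whether these iterates converge for a particular parameter choice is something to check empirically rather than assume from this sketch.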

arXiv:2305.10506 [pdf, other]

Exact Recovery for System Identification with More Corrupt Data than Clean Data

Authors: Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Murat Arcak

Abstract: This paper investigates the system identification problem for linear discrete-time systems under adversaries and analyzes two lasso-type estimators. We examine both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We prove that when the system is stable and attacks are injected periodically, the sample complexity for exact recovery of the system dynamics is linear in terms of the dimension of the states. When adversarial attacks occur at each time instance with probability p, the required sample complexity for exact recovery scales polynomially in the dimension of the states and the probability p. This result implies almost sure convergence to the true system dynamics under the asymptotic regime. As a by-product, our estimators still learn the system correctly even when more than half of the data is compromised. We highlight that the attack vectors are allowed to be correlated with each other in this work, whereas we make some assumptions about the times at which the attacks happen. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.

Submitted 24 April, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

MSC Class: 62; 90; 93

arXiv:2302.11190 [pdf, other]

A Hitting Time Analysis for Stochastic Time-Varying Functions with Applications to Adversarial Attacks on Computation of Markov Decision Processes

Authors: Ali Yekkehkhany, Han Feng, Donghao Ying, Javad Lavaei

Abstract: Stochastic time-varying optimization is an integral part of learning in which the shape of the function changes over time in a non-deterministic manner. This paper considers multiple models of stochastic time variation and analyzes the corresponding notion of hitting time for each model, i.e., the period after which optimizing the stochastic time-varying function reveals informative statistics on the optimization of the target function. The studied models of time variation are motivated by adversarial attacks on the computation of value iteration in Markov decision processes. In this application, the hitting time quantifies the extent to which the computation is robust to adversarial disturbance. We develop upper bounds on the hitting time by analyzing the contraction-expansion transformation that appears in the time-variation models. We prove that the hitting time of the value function in the value iteration with a probabilistic contraction-expansion transformation is logarithmic in terms of the inverse of a desired precision. In addition, the hitting time is analyzed for optimization of unknown continuous or discrete time-varying functions whose noisy evaluations are revealed over time. The upper bound for a continuous function is super-quadratic (but sub-cubic) in terms of the inverse of a desired precision and the upper bound for a discrete function is logarithmic in terms of the cardinality of the function domain. Improved bounds for convex functions are obtained and we show that such functions are learned faster than non-convex functions. Finally, we study a time-varying linear model with additive noise, where hitting time is bounded with the notion of shape dominance.

Submitted 22 February, 2023; originally announced February 2023.
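
As background for the logarithmic dependence on precision mentioned in the abstract above, the following sketch runs plain value iteration with no adversarial disturbance and counts sweeps until successive iterates differ by less than `eps`. This is the standard unperturbed baseline, not the paper's contraction-expansion model; the MDP encoding (`P[s][a]` as a list of `(next_state, probability)` pairs) and the toy instance are assumptions made for illustration. Because the Bellman operator is a `gamma`-contraction in the sup norm, the sweep count starting from `V = 0` is at most `log(R_max / eps) / log(1 / gamma) + 2`.

```python
import math

def value_iteration(P, R, gamma, eps):
    """Run value iteration from V = 0 and return (V, number_of_sweeps).

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward; both encodings are illustrative assumptions.
    """
    n = len(P)
    V = [0.0] * n
    sweeps = 0
    while True:
        new_V = [
            max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ]
        sweeps += 1
        diff = max(abs(new_V[s] - V[s]) for s in range(n))
        V = new_V
        if diff < eps:  # stop once a sweep changes V by less than eps
            return V, sweeps

# Hypothetical two-state, two-action MDP.
P = [
    [[(0, 0.9), (1, 0.1)], [(1, 1.0)]],
    [[(0, 1.0)], [(0, 0.5), (1, 0.5)]],
]
R = [[0.0, 1.0], [0.5, 0.2]]
```

Tightening `eps` by a factor of ten adds roughly `log(10) / log(1 / gamma)` sweeps, which is the sense in which the iteration count is logarithmic in the inverse precision.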

arXiv:2302.07938 [pdf, ps, other]

Scalable Multi-Agent Reinforcement Learning with General Utilities

Authors: Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei

Abstract: We study the scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team. By exploiting the spatial correlation decay property of the network structure, we propose a scalable distributed policy gradient algorithm with shadow reward and localized policy that consists of three steps: (1) shadow reward estimation, (2) truncated shadow Q-function estimation, and (3) truncated policy gradient estimation and policy update. Our algorithm converges, with high probability, to $ε$-stationarity with $\widetilde{\mathcal{O}}(ε^{-2})$ samples up to some approximation error that decreases exponentially in the communication radius. This is the first result in the literature on multi-agent RL with general utilities that does not require the full observability.

Submitted 26 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

Comments: Supplementary material for the contribution to American Control Conference 2023 under the same title

arXiv:2302.07828 [pdf, other]

Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points

Authors: Ziye Ma, Igor Molybog, Javad Lavaei, Somayeh Sojoudi

Abstract: This paper studies the role of over-parametrization in solving non-convex optimization problems. The focus is on the important class of low-rank matrix sensing, where we propose an infinite hierarchy of non-convex problems via the lifting technique and the Burer-Monteiro factorization. This contrasts with the existing over-parametrization technique where the search rank is limited by the dimension of the matrix and it does not allow a rich over-parametrization of an arbitrary degree. We show that although the spurious solutions of the problem remain stationary points through the hierarchy, they will be transformed into strict saddle points (under some technical conditions) and can be escaped via local search methods. This is the first result in the literature showing that over-parametrization creates a negative curvature for escaping spurious solutions. We also derive a bound on how much over-parametrization is required to enable the elimination of spurious solutions.

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2211.10815 [pdf, other]

Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design

Authors: Yuhao Ding, Ming Jin, Javad Lavaei

Abstract: We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time with a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect the non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that the risk control and the handling of the non-stationarity can be separately designed in the algorithm if the variation budget is known a priori, while the non-stationary detection mechanism in the adaptive algorithm depends on the risk parameter. This work offers the first non-asymptotic theoretical analyses for the non-stationary risk-sensitive RL in the literature.

Submitted 19 November, 2022; originally announced November 2022.

Comments: 33 pages, 3 figures, AAAI 2023. arXiv admin note: text overlap with arXiv:2111.03947, arXiv:2102.05406 by other authors
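
The entropic risk measure named in the abstract above has the standard form $\frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]$ for a risk parameter $\beta \neq 0$, and it reduces to the mean as $\beta \to 0$. The sketch below estimates it from a list of sampled returns; the function name, the sample values, and the use of an empirical average are illustrative assumptions and are not part of the paper's algorithms.

```python
import math

def entropic_risk(samples, beta):
    """Empirical entropic risk (1/beta) * log(mean(exp(beta * x))).

    beta < 0 is risk-averse, beta > 0 is risk-seeking, and beta == 0 is
    treated as the risk-neutral limit (the plain mean).
    """
    n = len(samples)
    if beta == 0:
        return sum(samples) / n
    scaled = [beta * x for x in samples]
    m = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / n)
    return log_mean_exp / beta

# Hypothetical episode returns with one large outlier.
returns = [0.0, 1.0, 2.0, 10.0]
```

By Jensen's inequality the value sits below the mean for negative `beta` and above it for positive `beta`, so the sign of the risk parameter controls whether the outlier is discounted or emphasized.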

arXiv:2210.01421 [pdf, other]

Learning of Dynamical Systems under Adversarial Attacks -- Null Space Property Perspective

Authors: Han Feng, Baturalp Yalcin, Javad Lavaei

Abstract: We study the identification of a linear time-invariant dynamical system affected by large-and-sparse disturbances modeling adversarial attacks or faults. Under the assumption that the states are measurable, we develop necessary and sufficient conditions for the recovery of the system matrices by solving a constrained lasso-type optimization problem. In addition, we provide an upper bound on the estimation error whenever the disturbance sequence is a combination of small noise values and large adversarial values. Our results depend on the null space property that has been widely used in the lasso literature, and we investigate under what conditions this property holds for linear time-invariant dynamical systems. Lastly, we further study the conditions for a specific probabilistic model and support the results with numerical experiments.

Submitted 5 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 8 pages, 2 figures

MSC Class: 93

arXiv:2208.07469 [pdf, ps, other]

Semidefinite Programming versus Burer-Monteiro Factorization for Matrix Sensing

Authors: Baturalp Yalcin, Ziye Ma, Javad Lavaei, Somayeh Sojoudi

Abstract: Many fundamental low-rank optimization problems, such as matrix completion, phase synchronization/retrieval, power system state estimation, and robust PCA, can be formulated as the matrix sensing problem. Two main approaches for solving matrix sensing are based on semidefinite programming (SDP) and Burer-Monteiro (B-M) factorization. The SDP method suffers from high computational and space complexities, whereas the B-M method may return a spurious solution due to the non-convexity of the problem. The existing theoretical guarantees for the success of these methods have led to similar conservative conditions, which may wrongly imply that these methods have comparable performances. In this paper, we shed light on some major differences between these two methods. First, we present a class of structured matrix completion problems for which the B-M methods fail with an overwhelming probability, while the SDP method works correctly. Second, we identify a class of highly sparse matrix completion problems for which the B-M method works and the SDP method fails. Third, we prove that although the B-M method exhibits the same performance independent of the rank of the unknown solution, the success of the SDP method is correlated to the rank of the solution and improves as the rank increases. Unlike the existing literature that has mainly focused on those instances of matrix sensing for which both SDP and B-M work, this paper offers the first result on the unique merit of each method over the alternative approach.

Submitted 15 August, 2022; originally announced August 2022.

Comments: 21 pages

MSC Class: 90C22; 90C26

arXiv:2205.10715 [pdf, other]

Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Authors: Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen

Abstract: We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an $\widetilde{O}(ε^{-4})$ sample complexity for $ε$-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain a zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate the effectiveness of our methods through numerical experiments.

Submitted 26 May, 2024; v1 submitted 21 May, 2022; originally announced May 2022.
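
The primal-dual pattern described in the abstract above (gradient ascent on the policy, projected sub-gradient descent on the multiplier) can be sketched on a deliberately tiny problem. This is not VR-PDPG: the sketch uses a single-state problem with a softmax policy, linear rather than general concave utilities, exact gradients, and no variance reduction. The constraint form `E[utility] >= threshold`, the multiplier cap `lam_max`, and all numeric values are assumptions made for illustration.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]

def primal_dual_sketch(reward, utility, threshold, steps=5000,
                       eta_theta=0.1, eta_lam=0.05, lam_max=10.0):
    """Maximize E[reward] subject to E[utility] >= threshold.

    Lagrangian: L(theta, lam) = E[reward] + lam * (E[utility] - threshold),
    maximized over theta and minimized over lam in [0, lam_max].
    """
    n = len(reward)
    theta = [0.0] * n
    lam = 0.0
    for _ in range(steps):
        pi = softmax(theta)
        payoff = [reward[a] + lam * utility[a] for a in range(n)]
        avg = sum(pi[a] * payoff[a] for a in range(n))
        # Exact softmax policy gradient: dL/dtheta_a = pi_a * (payoff_a - avg).
        theta = [theta[a] + eta_theta * pi[a] * (payoff[a] - avg)
                 for a in range(n)]
        # Dual step: dL/dlam is the constraint slack; descend and project.
        slack = sum(pi[a] * utility[a] for a in range(n)) - threshold
        lam = min(lam_max, max(0.0, lam - eta_lam * slack))
    return softmax(theta), lam

# Hypothetical instance: action 0 pays more but has low utility.
policy, lam = primal_dual_sketch(reward=[1.0, 0.2], utility=[0.0, 1.0],
                                 threshold=0.5)
```

When the constraint is violated the slack is negative, so the multiplier grows and shifts the payoff toward the high-utility action. Last iterates of such schemes can oscillate, which is one reason analyses of this kind are usually stated for averaged quantities.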

arXiv:2204.02364 [pdf, other]

A New Complexity Metric for Nonconvex Rank-one Generalized Matrix Completion

Authors: Haixiang Zhang, Baturalp Yalcin, Javad Lavaei, Somayeh Sojoudi

Abstract: In this work, we develop a new complexity metric for an important class of low-rank matrix optimization problems in both symmetric and asymmetric cases, where the metric aims to quantify the complexity of the nonconvex optimization landscape of each problem and the success of local search methods in solving the problem. The existing literature has focused on two complexity bounds. The RIP constant is commonly used to characterize the complexity of matrix sensing problems. On the other hand, the incoherence and the sampling rate are used when analyzing matrix completion problems. The proposed complexity metric has the potential to generalize these two notions and also applies to a much larger class of problems. To mathematically study the properties of this metric, we focus on the rank-$1$ generalized matrix completion problem and illustrate the usefulness of the new complexity metric on three types of instances, namely, instances with the RIP condition, instances obeying the Bernoulli sampling model, and a synthetic example. We show that the complexity metric exhibits a consistent behavior in the three cases, even when other existing conditions fail to provide theoretical guarantees. These observations provide a strong implication that the new complexity metric has the potential to generalize various conditions of optimization complexity proposed for different applications. Furthermore, we establish theoretical results to provide sufficient and necessary conditions for the existence of spurious solutions in terms of the proposed complexity metric. This contrasts with the RIP and incoherence conditions that fail to provide any necessary condition.

Submitted 21 July, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2201.11965 [pdf, ps, other]

Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

Authors: Yuhao Ding, Javad Lavaei

Abstract: We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because of the need to integrate the constraint violation reduction, safe exploration, and adaptation to the non-stationarity. To this end, we identify two alternative conditions on the time-varying constraints under which we can guarantee the safety in the long run. We also propose the \underline{P}eriodically \underline{R}estarted \underline{O}ptimistic \underline{P}rimal-\underline{D}ual \underline{P}roximal \underline{P}olicy \underline{O}ptimization (PROPD-PPO) algorithm that can coordinate with both conditions. Furthermore, a dynamic regret bound and a constraint violation bound are established for the proposed algorithm in both the linear kernel CMDP function approximation setting and the tabular CMDP setting under two alternative conditions. This paper provides the first provably efficient algorithm for non-stationary CMDPs with safe exploration.

Submitted 19 November, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: 32 pages, AAAI 2023

arXiv:2110.10279 [pdf, other]

Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods

Authors: Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Somayeh Sojoudi

Abstract: It is well-known that the Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the RIP condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problems with a low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. In this work, we provide a negative answer to the above question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. Based on those results, we define a new complexity metric that potentially measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements of the ground truth matrix can deteriorate the landscape, which further reveals the unfavorable behavior of the B-M factorization on general low-rank matrix optimization problems.

Submitted 19 October, 2021; originally announced October 2021.

Comments: 21 pages, 1 figure

arXiv:2110.10117 [pdf, other]

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization

Authors: Yuhao Ding, Junzi Zhang, Javad Lavaei

Abstract: Entropy regularization is an efficient technique for encouraging exploration and preventing a premature convergence of (vanilla) policy gradient methods in reinforcement learning (RL). However, the theoretical understanding of entropy regularized RL algorithms has been limited. In this paper, we revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization, whose convergence has so far only been established assuming access to exact gradient oracles. To go beyond this scenario, we propose the first set of (nearly) unbiased stochastic policy gradient estimators with trajectory-level entropy regularization, with one being an unbiased visitation measure-based estimator and the other one being a nearly unbiased yet more practical trajectory-based estimator. We prove that although the estimators themselves are unbounded in general due to the additional logarithmic policy rewards introduced by the entropy term, the variances are uniformly bounded. We then propose a two-phase stochastic policy gradient (PG) algorithm that uses a large batch size in the first phase to overcome the challenge of the stochastic approximation due to the non-coercive landscape, and uses a small batch size in the second phase by leveraging the curvature information around the optimal policy. We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(\frac{1}{ε^2})$ for the proposed algorithm. Our results are the first global convergence and sample complexity results for the stochastic entropy-regularized vanilla PG method.

Submitted 10 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

arXiv:2110.10116 [pdf, ps, other]

On the Global Optimum Convergence of Momentum-based Policy Gradient

Authors: Yuhao Ding, Junzi Zhang, Javad Lavaei

Abstract: Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum improves the global optimality sample complexity of vanilla PG methods by $\tilde{\mathcal{O}}(ε^{-1.5})$ and $\tilde{\mathcal{O}}(ε^{-1})$, respectively, where $ε>0$ is the target tolerance. Our work is the first one that obtains global convergence results for the momentum-based PG methods. For the generic Fisher-non-degenerate policy parametrizations, our result is the first single-loop and finite-batch PG algorithm achieving $\tilde{O}(ε^{-3})$ global optimality sample complexity. Finally, as a by-product, our methods also provide a general framework for analyzing the global convergence rates of stochastic PG methods, which can be easily applied and extended to different PG estimators.

Submitted 22 May, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: AISTATS 2022

arXiv:2110.08923 [pdf, ps, other]

A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

Authors: Donghao Ying, Yuhao Ding, Javad Lavaei

Abstract: We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided.

Submitted 7 April, 2023; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: 24 pages, AISTATS22

arXiv:2105.08232 [pdf, other]

Sharp Restricted Isometry Property Bounds for Low-rank Matrix Recovery Problems with Corrupted Measurements

Authors: Ziye Ma, Yingjie Bi, Javad Lavaei, Somayeh Sojoudi

Abstract: In this paper, we study a general low-rank matrix recovery problem with linear measurements corrupted by some noise. The objective is to understand under what conditions on the restricted isometry property (RIP) of the problem local search methods can find the ground truth with a small error. By analyzing the landscape of the non-convex problem, we first propose a global guarantee on the maximum distance between an arbitrary local minimizer and the ground truth under the assumption that the RIP constant is smaller than $1/2$. We show that this distance shrinks to zero as the intensity of the noise reduces. Our new guarantee is sharp in terms of the RIP constant and is much stronger than the existing results. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. Next, we prove the strict saddle property, which guarantees the global convergence of the perturbed gradient descent method in polynomial time. The developed results demonstrate how the noise intensity and the RIP constant of the problem affect the landscape of the problem.

Submitted 25 July, 2023; v1 submitted 17 May, 2021; originally announced May 2021.

arXiv:2104.13348 [pdf, other]

Local and Global Linear Convergence of General Low-rank Matrix Recovery Problems

Authors: Yingjie Bi, Haixiang Zhang, Javad Lavaei

Abstract: We study the convergence rate of gradient-based local search methods for solving low-rank matrix recovery problems with general objectives in both symmetric and asymmetric cases, under the assumption of the restricted isometry property. First, we develop a new technique to verify the Polyak-Lojasiewicz inequality in a neighborhood of the global minimizers, which leads to a local linear convergence region for the gradient descent method. Second, based on the local convergence result and a sharp strict saddle property proven in this paper, we present two new conditions that guarantee the global linear convergence of the perturbed gradient descent method. The developed local and global convergence results provide much stronger theoretical guarantees than the existing results. As a by-product, this work significantly improves the existing bounds on the RIP constant required to guarantee the non-existence of spurious solutions.

Submitted 8 March, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

arXiv:2104.10356 [pdf, ps, other]

General Low-rank Matrix Optimization: Geometric Analysis and Sharper Bounds

Authors: Haixiang Zhang, Yingjie Bi, Javad Lavaei

Abstract: This paper considers the global geometry of general low-rank minimization problems via the Burer-Monteiro factorization approach. For the rank-$1$ case, we prove that there is no spurious second-order critical point for both symmetric and asymmetric problems if the rank-$2$ RIP constant $δ$ is less than $1/2$. Combining with a counterexample with $δ=1/2$, we show that the derived bound is the sharpest possible. For the arbitrary rank-$r$ case, the same property is established when the rank-$2r$ RIP constant $δ$ is at most $1/3$. We design a counterexample to show that the non-existence of spurious second-order critical points may not hold if $δ$ is at least $1/2$. In addition, for any problem with $δ$ between $1/3$ and $1/2$, we prove that all second-order critical points have a positive correlation to the ground truth. Finally, the strict saddle property, which can lead to the polynomial-time global convergence of various algorithms, is established for both the symmetric and asymmetric problems when the rank-$2r$ RIP constant $δ$ is less than $1/3$. The results of this paper significantly extend several existing bounds in the literature.

Submitted 21 April, 2021; originally announced April 2021.

arXiv:2012.02427 [pdf, other]

Stochastic Localization Methods for Convex Discrete Optimization via Simulation

Authors: Haixiang Zhang, Zeyu Zheng, Javad Lavaei

Abstract: We develop and analyze a set of new sequential simulation-optimization algorithms for large-scale multi-dimensional discrete optimization via simulation problems with a convexity structure. The "large-scale" notion means that the decision variable has a large number of values to choose from on each dimension. The proposed algorithms are targeted to identify a solution that is close to the optimal solution given any precision level with any given probability. To achieve this target, utilizing the convexity structure, our algorithm design does not need to scan all the choices of the decision variable, but instead sequentially draws a subset of choices of the decision variable and uses them to "localize" potentially near-optimal solutions to an adaptively shrinking region. To show the power of the localization operation, we first consider one-dimensional large-scale problems. We propose the shrinking uniform sampling algorithm, which is proved to achieve the target with an optimal expected simulation cost under an asymptotic criterion. For multi-dimensional problems, we combine the idea of localization with subgradient information and propose a framework to design stochastic cutting-plane methods and the dimension reduction algorithm, whose expected simulation costs have a low dependence on the scale and the dimension of the problems. The proposed algorithms do not require prior information about the Lipschitz constant of the objective function and the simulation costs are upper bounded by a value that is independent of the Lipschitz constant. Finally, we propose an adaptive algorithm to deal with the unknown noise variance case under the assumption that the randomness of the system is Gaussian. We implement the proposed algorithms on both synthetic and queueing simulation optimization problems, and demonstrate better performances compared to benchmark methods.

Submitted 18 January, 2022; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2010.16250 [pdf, other]

Gradient-based Algorithms for Convex Discrete Optimization via Simulation

Authors: Haixiang Zhang, Zeyu Zheng, Javad Lavaei

Abstract: We propose new sequential simulation-optimization algorithms for general convex optimization via simulation problems with high-dimensional discrete decision space. The performance of each choice of discrete decision variables is evaluated via stochastic simulation replications. If an upper bound on the overall level of uncertainties is known, our proposed simulation-optimization algorithms utilize the discrete convex structure and are guaranteed with high probability to find a solution that is close to the best within any given user-specified precision level. The proposed algorithms work for any general convex problem and the efficiency is demonstrated by proven upper bounds on simulation costs. The upper bounds demonstrate a polynomial dependence on the dimension and scale of the decision space. For some discrete optimization via simulation problems, a gradient estimator may be available at low costs along with a single simulation replication. By integrating gradient estimators, which are possibly biased, we propose simulation-optimization algorithms to achieve optimality guarantees with a reduced dependence on the dimension under moderate assumptions on the bias.

Submitted 11 February, 2022; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: Accepted by Operations Research. Title changed from "Discrete Convex Simulation Optimization" to "Gradient-based Algorithms for Convex Discrete Optimization via Simulation"

arXiv:2010.04349 [pdf, other]

Global and Local Analyses of Nonlinear Low-Rank Matrix Recovery Problems

Authors: Yingjie Bi, Javad Lavaei

Abstract: The restricted isometry property (RIP) is a well-known condition that guarantees the absence of spurious local minima in low-rank matrix recovery problems with linear measurements. In this paper, we introduce a novel property named bound difference property (BDP) to study low-rank matrix recovery problems with nonlinear measurements. Using RIP and BDP jointly, we first focus on the rank-1 matrix recovery problem, for which we propose a new criterion to certify the nonexistence of spurious local minima over the entire space. We then analyze the general case with an arbitrary rank and derive a condition to rule out the possibility of having a spurious solution in a ball around the true solution. The developed conditions lead to much stronger theoretical guarantees than the existing bounds on RIP.

Submitted 10 December, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

arXiv:2006.00453 [pdf, ps, other]

When Does MAML Objective Have Benign Landscape?

Authors: Igor Molybog, Javad Lavaei

Abstract: The paper studies the complexity of the optimization problem behind the Model-Agnostic Meta-Learning (MAML) algorithm. The goal of the study is to determine the global convergence of MAML on sequential decision-making tasks possessing a common structure. We are curious to know when, if at all, the benign landscape of the underlying tasks results in a benign landscape of the corresponding MAML objective. For illustration, we analyze the landscape of the MAML objective on LQR tasks to determine what types of similarities in their structures enable the algorithm to converge to the globally optimal solution.

Submitted 10 December, 2020; v1 submitted 31 May, 2020; originally announced June 2020.

Comments: 12 pages, 3 figures

arXiv:2004.14328 [pdf, other]

Penalized Semidefinite Programming for Quadratically-Constrained Quadratic Optimization

Authors: Ramtin Madani, Mohsen Kheirandishfard, Javad Lavaei, Alper Atamturk

Abstract: In this paper, we give a new penalized semidefinite programming approach for non-convex quadratically-constrained quadratic programs (QCQPs). We incorporate penalty terms into the objective of convex relaxations in order to retrieve feasible and near-optimal solutions for non-convex QCQPs. We introduce a generalized linear independence constraint qualification (GLICQ) criterion and prove that any GLICQ regular point that is sufficiently close to the feasible set can be used to construct an appropriate penalty term and recover a feasible solution. Inspired by these results, we develop a heuristic sequential procedure that preserves feasibility and aims to improve the objective value at each iteration. Numerical experiments on large-scale system identification problems as well as benchmark instances from the library of quadratic programming (QPLIB) demonstrate the ability of the proposed penalized semidefinite programs in finding near-optimal solutions for non-convex QCQP.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:1912.00561 [pdf, other]

Escaping spurious local minimum trajectories in online time-varying nonconvex optimization

Authors: Yuhao Ding, Javad Lavaei, Murat Arcak

Abstract: A major limitation of online algorithms that track the optimizers of time-varying nonconvex optimization problems is that they focus on a specific local minimum trajectory, which may lead to poor spurious local solutions. In this paper, we show that the natural temporal variation may help simple online tracking methods find and track time-varying global minima. To this end, we investigate the properties of a time-varying projected gradient flow system with inertia, which can be regarded as the continuous-time limit of (1) the optimality conditions for a discretized sequential optimization problem with a proximal regularization and (2) the online tracking scheme. We introduce the notion of the dominant trajectory and show that the inherent temporal variation could re-shape the landscape of the Lagrange functional and help a proximal algorithm escape the spurious local minimum trajectories if the global minimum trajectory is dominant. For a problem with twice continuously differentiable objective function and constraints, sufficient conditions are derived to guarantee that no matter how a local search method is initialized, it will track a time-varying global solution after some time. The results are illustrated on a benchmark example with many local minima.

Submitted 25 January, 2021; v1 submitted 1 December, 2019; originally announced December 2019.

arXiv:1911.08368 [pdf, other]

doi 10.1109/TCNS.2020.2966588

Large-Scale Traffic Signal Offset Optimization

Authors: Yi Ouyang, Richard Y. Zhang, Javad Lavaei, Pravin Varaiya

Abstract: The offset optimization problem seeks to coordinate and synchronize the timing of traffic signals throughout a network in order to enhance traffic flow and reduce stops and delays. Recently, offset optimization was formulated into a continuous optimization problem without integer variables by modeling traffic flow as sinusoidal. In this paper, we present a novel algorithm to solve this new formulation to near-global optimality on a large scale. Specifically, we solve a convex relaxation of the nonconvex problem using a tree decomposition reduction, and use randomized rounding to recover a near-global solution. We prove that the algorithm always delivers solutions of expected value at least 0.785 times the globally optimal value. Moreover, assuming that the topology of the traffic network is "tree-like", we prove that the algorithm has near-linear time complexity with respect to the number of intersections. These theoretical guarantees are experimentally validated on the Berkeley, Manhattan, and Los Angeles traffic networks. In our numerical results, the empirical time complexity of the algorithm is linear, and the solutions have objectives within 0.99 times the globally optimal value.

Submitted 19 November, 2019; originally announced November 2019.

Journal ref: IEEE Transactions on Control of Network Systems 2020

arXiv:1908.10315 [pdf, other]

Boundary Defense against Cyber Threat for Power System Operation

Authors: Ming Jin, Javad Lavaei, Somayeh Sojoudi, Ross Baldick

Abstract: The operation of power grids is becoming increasingly data-centric. While the abundance of data could improve the efficiency of the system, it poses major reliability challenges. In particular, state estimation aims to learn the behavior of the network from data but an undetected attack on this problem could lead to a large-scale blackout. Nevertheless, understanding vulnerability of state estimation against cyber attacks has been hindered by the lack of tools studying the topological and data-analytic aspects of the network. Algorithmic robustness is of critical need to extract reliable information from abundant but untrusted grid data. We propose a robust state estimation framework that leverages network sparsity and data abundance. For a large-scale power grid, we quantify, analyze, and visualize the regions of the network prone to cyber attacks. We also propose an optimization-based graphical boundary defense mechanism to identify the border of the geographical area whose data has been manipulated. The proposed method does not allow a local attack to have a global effect on the data analysis of the entire network, which enhances the situational awareness of the grid especially in the face of adversity. The developed mathematical framework reveals key geometric and algebraic factors that can affect algorithmic robustness and is used to study the vulnerability of the U.S. power grid in this paper.

Submitted 4 August, 2019; originally announced August 2019.

arXiv:1905.09937 [pdf, other]

On the Absence of Spurious Local Trajectories in Time-varying Nonconvex Optimization

Authors: S. Fattahi, C. Josz, Y. Ding, R. Mohammadi, J. Lavaei, S. Sojoudi

Abstract: In this paper, we study the landscape of an online nonconvex optimization problem, for which the input data vary over time and the solution is a trajectory rather than a single point. To understand the complexity of finding a global solution of this problem, we introduce the notion of \textit{spurious (i.e., non-global) local trajectory} as a generalization to the notion of spurious local solution in nonconvex (time-invariant) optimization. We develop an ordinary differential equation (ODE) associated with a time-varying nonlinear dynamical system which, at limit, characterizes the spurious local solutions of the time-varying optimization problem. We prove that the absence of spurious local trajectory is closely related to the transient behavior of the developed system. In particular, we show that if the problem is time-varying, the data variation may force all of the ODE trajectories initialized at arbitrary local minima at the initial time to gradually converge to the global solution trajectory. We study the Jacobian of the dynamical system along a local minimum trajectory and show how its eigenvalues are manipulated by the natural data variation in the problem, which may consequently trigger escaping poor local minima over time.

Submitted 30 October, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

arXiv:1905.09915 [pdf, other]

Escaping Locally Optimal Decentralized Control Polices via Damping

Authors: Han Feng, Javad Lavaei

Abstract: We study the evolution of locally optimal decentralized controllers with the damping of the control system. Empirically it is shown that even for instances with an exponential number of connected components, damping merges all local solutions to the one global solution. We characterize the evolution of locally optimal solutions with the notion of hemi-continuity and further derive asymptotic properties of the objective function and of the locally optimal controllers as the damping becomes large. Especially, we prove that with enough damping, there is no spurious locally optimal controller with favorable control structures. The convoluted behavior of the locally optimal trajectory is illustrated with numerical examples.

Submitted 23 May, 2019; originally announced May 2019.

Comments: 20 pages, 9 figures

arXiv:1903.08634 [pdf, other]

Aggressive Local Search for Constrained Optimal Control Problems with Many Local Minima

Authors: Yuhao Ding, Han Feng, Javad Lavaei

Abstract: This paper is concerned with numerically finding a global solution of constrained optimal control problems with many local minima. The focus is on the optimal decentralized control (ODC) problem, whose feasible set is recently shown to have an exponential number of connected components and consequently an exponential number of local minima. The rich literature of numerical algorithms for nonlinear optimization suggests that if a local search algorithm is initialized in an arbitrary connected component of the feasible set, it would search only within that component and find a stationary point there. This is based on the fact that numerical algorithms are designed to generate a sequence of points (via searching for descent directions and adjusting the step size), whose corresponding continuous path is trapped in a single connected component. In contrast with this perception rooted in convex optimization, we numerically illustrate that local search methods for non-convex constrained optimization can obliviously jump between different connected components to converge to a global minimum, via an aggressive step size adjustment using backtracking and the Armijo rule. To support the observations, we prove that from almost every arbitrary point in any connected component of the feasible set, it is possible to generate a sequence of points using local search to jump to different components and converge to a global solution. However, due to the NP-hardness of the problem, such fine-tuning of the parameters of a local search algorithm may need prior knowledge or be time consuming. This paper offers the first result on escaping non-global local solutions of constrained optimal control problems with complicated feasible sets.

Submitted 20 March, 2019; originally announced March 2019.

arXiv:1901.01631 [pdf, other]

Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery

Authors: Richard Y. Zhang, Somayeh Sojoudi, Javad Lavaei

Abstract: Nonconvex matrix recovery is known to contain no spurious local minima under a restricted isometry property (RIP) with a sufficiently small RIP constant $δ$. If $δ$ is too large, however, then counterexamples containing spurious local minima are known to exist. In this paper, we introduce a proof technique that is capable of establishing sharp thresholds on $δ$ to guarantee the inexistence of spurious local minima. Using the technique, we prove that in the case of a rank-1 ground truth, an RIP constant of $δ<1/2$ is both necessary and sufficient for exact recovery from any arbitrary initial point (such as a random point). We also prove a local recovery result: given an initial point $x_{0}$ satisfying $f(x_{0})\le(1-δ)^{2}f(0)$, any descent algorithm that converges to second-order optimality guarantees exact recovery.

Submitted 13 June, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

Comments: v2: fixed several typos; v3: accepted at JMLR

Journal ref: Journal of Machine Learning Research 20 (114): 1-34, 2019

arXiv:1810.11505 [pdf, other]

Stability-certified reinforcement learning: A control-theoretic perspective

Authors: Ming Jin, Javad Lavaei

Abstract: We investigate the important problem of certifying stability of reinforcement learning policies when interconnected with nonlinear dynamical systems. We show that by regulating the input-output gradients of policies, strong guarantees of robust stability can be obtained based on a proposed semidefinite programming feasibility problem. The method is able to certify a large set of stabilizing controllers by exploiting problem-specific structures; furthermore, we analyze and establish its (non)conservatism. Empirical evaluations on two decentralized control tasks, namely multi-flight formation and power system frequency regulation, demonstrate that the reinforcement learning agents can have high performance within the stability-certified parameter space, and also exhibit stable learning behaviors in the long run.

Submitted 26 October, 2018; originally announced October 2018.

arXiv:1805.10251 [pdf, other]

How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?

Authors: Richard Y. Zhang, Cédric Josz, Somayeh Sojoudi, Javad Lavaei

Abstract: When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP)---i.e. they are approximately norm-preserving---the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every x is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $δ=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.

Submitted 30 October, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: 32nd Conference on Neural Information Processing Systems (NIPS 2018)

arXiv:1805.08204 [pdf, other]

A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization

Authors: Cedric Josz, Yi Ouyang, Richard Y. Zhang, Javad Lavaei, Somayeh Sojoudi

Abstract: We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term \textit{global functions}. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of nonconvex and nonsmooth optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.

Submitted 31 October, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: 22 pages, 13 figures

MSC Class: 90C26

arXiv:1711.10428 [pdf, other]

A Bound Strengthening Method for Optimal Transmission Switching in Power Systems

Authors: Salar Fattahi, Javad Lavaei, Alper Atamturk

Abstract: This paper studies the optimal transmission switching (OTS) problem for power systems, where certain lines are fixed (uncontrollable) and the remaining ones are controllable via on/off switches. The goal is to identify a topology of the power grid that minimizes the cost of the system operation while satisfying the physical and operational constraints. Most of the existing methods for the problem are based on first converting the OTS into a mixed-integer linear program (MILP) or mixed-integer quadratic program (MIQP), and then iteratively solving a series of its convex relaxations. The performance of these methods depends heavily on the strength of the MILP or MIQP formulations. In this paper, it is shown that finding the strongest variable upper and lower bounds to be used in an MILP or MIQP formulation of the OTS based on the big-$M$ or McCormick inequalities is NP-hard. Furthermore, it is proven that unless P=NP, there is no constant-factor approximation algorithm for constructing these variable bounds. Despite the inherent difficulty of obtaining the strongest bounds in general, a simple bound strengthening method is presented to strengthen the convex relaxation of the problem when there exists a connected spanning subnetwork of the system with fixed lines. The proposed method can be treated as a preprocessing step that is independent of the solver to be later used for numerical calculations and can be carried out offline before initiating the solver. A remarkable speedup in the runtime of the mixed-integer solvers is obtained using the proposed bound strengthening method for medium- and large-scale real-world systems.

Submitted 28 November, 2017; originally announced November 2017.

Report number: BCOL Research Report 17.06, IEOR, University of California-Berkeley

arXiv:1710.03475 [pdf, other]

doi 10.1007/s10107-020-01516-y

Sparse Semidefinite Programs with Guaranteed Near-Linear Time Complexity via Dualized Clique Tree Conversion

Authors: Richard Y. Zhang, Javad Lavaei

Abstract: Clique tree conversion solves large-scale semidefinite programs by splitting an $n\times n$ matrix variable into up to $n$ smaller matrix variables, each representing a principal submatrix of up to $ω\timesω$. Its fundamental weakness is the need to introduce overlap constraints that enforce agreement between different matrix variables, because these can result in dense coupling. In this paper, we show that by dualizing the clique tree conversion, the coupling due to the overlap constraints is guaranteed to be sparse over dense blocks, with a block sparsity pattern that coincides with the adjacency matrix of a tree. We consider two classes of semidefinite programs with favorable sparsity patterns that encompass the MAXCUT and MAX $k$-CUT relaxations, the Lovasz Theta problem, and the AC optimal power flow relaxation. Assuming that $ω\ll n$, we prove that the per-iteration cost of an interior-point method is linear $O(n)$ time and memory, so an $ε$-accurate and $ε$-feasible iterate is obtained after $O(\sqrt{n}\log(1/ε))$ iterations in near-linear $O(n^{1.5}\log(1/ε))$ time. We confirm our theoretical insights with numerical results on semidefinite programs as large as $n=13659$. (Supporting code at https://github.com/ryz-codes/dual_ctc )

Submitted 26 April, 2020; v1 submitted 10 October, 2017; originally announced October 2017.

Comments: [v1] appeared in IEEE CDC 2018; [v2+] To appear in Mathematical Programming

Journal ref: Mathematical Programming 2020

arXiv:1704.00133 [pdf, ps, other]

Conic Relaxations for Power System State Estimation with Line Measurements

Authors: Yu Zhang, Ramtin Madani, Javad Lavaei

Abstract: This paper deals with the non-convex power system state estimation (PSSE) problem, which plays a central role in the monitoring and operation of electric power networks. Given a set of noisy measurements, PSSE aims at estimating the vector of complex voltages at all buses of the network. This is a challenging task due to the inherent nonlinearity of power flows, for which existing methods lack guaranteed convergence and theoretical analysis. Motivated by these limitations, we propose a novel convexification framework for the PSSE using semidefinite programming (SDP) and second-order cone programming (SOCP) relaxations. We first study a related power flow (PF) problem as the noiseless counterpart, which is cast as a constrained minimization program by adding a suitably designed objective function. We study the performance of the proposed framework in the case where the set of measurements includes: (i) nodal voltage magnitudes, and (ii) branch active power flows over a spanning tree of the network. It is shown that the SDP and SOCP relaxations both recover the true PF solution as long as the voltage angle difference across each line of the network is not too large (e.g., less than 90 degrees for lossless networks). By capitalizing on this result, penalized SDP and SOCP problems are designed to solve the PSSE, where a penalty based on the weighted least absolute value is incorporated for fitting noisy measurements with possible bad data. Strong theoretical results are derived to quantify the optimal solution of the penalized SDP problem, which is shown to possess a dominant rank-one component formed by lifting the true voltage vector. An upper bound on the estimation error is also derived as a function of the noise power, which decreases exponentially fast as the number of measurements increases.

Submitted 1 April, 2017; originally announced April 2017.

Comments: Technical report: 14 pages, 5 figures

arXiv:1703.10973 [pdf, other]

doi 10.1109/CDC.2017.8264510

Modified Interior-Point Method for Large-and-Sparse Low-Rank Semidefinite Programs

Authors: Richard Y. Zhang, Javad Lavaei

Abstract: Semidefinite programs (SDPs) are powerful theoretical tools that have been studied for over two decades, but their practical use remains limited due to computational difficulties in solving large-scale, realistic-sized problems. In this paper, we describe a modified interior-point method for the efficient solution of large-and-sparse low-rank SDPs, which finds applications in graph theory, approximation theory, control theory, sum-of-squares, etc. Given that the problem data is large-and-sparse, conjugate gradients (CG) can be used to avoid forming, storing, and factoring the large and fully-dense interior-point Hessian matrix, but the resulting convergence rate is usually slow due to ill-conditioning. Our central insight is that, for a rank-$k$, size-$n$ SDP, the Hessian matrix is ill-conditioned only due to a rank-$nk$ perturbation, which can be explicitly computed using a size-$n$ eigendecomposition. We construct a preconditioner to "correct" the low-rank perturbation, thereby allowing preconditioned CG to solve the Hessian equation in a few tens of iterations. This modification is incorporated within SeDuMi, and used to reduce the solution time and memory requirements of large-scale matrix-completion problems by several orders of magnitude.

Submitted 5 September, 2017; v1 submitted 31 March, 2017; originally announced March 2017.

Comments: 8 pages, 2 figures

arXiv:1204.4419 [pdf, ps, other]

doi 10.1109/TPWRS.2013.2282086

Geometry of Power Flows and Optimization in Distribution Networks

Authors: Javad Lavaei, David Tse, Baosen Zhang

Abstract: We investigate the geometry of injection regions and its relationship to optimization of power flows in tree networks. The injection region is the set of all vectors of bus power injections that satisfy the network and operation constraints. The geometrical object of interest is the set of Pareto-optimal points of the injection region. If the voltage magnitudes are fixed, the injection region of a tree network can be written as a linear transformation of the product of two-bus injection regions, one for each line in the network. Using this decomposition, we show that under the practical condition that the angle difference across each line is not too large, the set of Pareto-optimal points of the injection region remains unchanged by taking the convex hull. Moreover, the resulting convexified optimal power flow problem can be efficiently solved via semi-definite programming or second order cone relaxations. These results improve upon earlier works by removing the assumptions on active power lower bounds. It is also shown that our practical angle assumption guarantees two other properties: (i) the uniqueness of the solution of the power flow problem, and (ii) the non-negativity of the locational marginal prices. Partial results are presented for the case when the voltage magnitudes are not fixed but can lie within certain bounds.

Submitted 19 August, 2013; v1 submitted 19 April, 2012; originally announced April 2012.

Comments: To Appear in IEEE Transaction on Power Systems

arXiv:1204.1106 [pdf, ps, other]

Message Passing for Dynamic Network Energy Management

Authors: Matt Kraning, Eric Chu, Javad Lavaei, Stephen Boyd

Abstract: We consider a network of devices, such as generators, fixed loads, deferrable loads, and storage devices, each with its own dynamic constraints and objective, connected by lossy capacitated lines. The problem is to minimize the total network objective subject to the device and line constraints, over a given time horizon. This is a large optimization problem, with variables for consumption or generation in each time period for each device. In this paper we develop a decentralized method for solving this problem. The method is iterative: At each step, each device exchanges simple messages with its neighbors in the network and then solves its own optimization problem, minimizing its own objective function, augmented by a term determined by the messages it has received. We show that this message passing method converges to a solution when the device objective and constraints are convex. The method is completely decentralized, and needs no global coordination other than synchronizing iterations; the problems to be solved by each device can typically be solved extremely efficiently and in parallel. The method is fast enough that even a serial implementation can solve substantial problems in reasonable time frames. We report results for several numerical experiments, demonstrating the method's speed and scaling, including the solution of a problem instance with over 30 million variables in 52 minutes for a serial implementation; with decentralized computing, the solve time would be less than one second.

Submitted 4 April, 2012; originally announced April 2012.

Comments: Submitted to IEEE Transactions on Smart Grid

Showing 1–46 of 46 results for author: Lavaei, J