-
Approximately Gaussian Replicator Flows: Nonconvex Optimization as a Nash-Convergent Evolutionary Game
Authors:
Brendon G. Anderson,
Samuel Pfrommer,
Somayeh Sojoudi
Abstract:
This work leverages tools from evolutionary game theory to solve unconstrained nonconvex optimization problems. Specifically, we lift such a problem to an optimization over probability measures, whose minimizers exactly correspond to the Nash equilibria of a particular population game. To algorithmically solve for such Nash equilibria, we introduce approximately Gaussian replicator flows (AGRFs) as a tractable alternative to simulating the corresponding infinite-dimensional replicator dynamics. Our proposed AGRF dynamics can be integrated using off-the-shelf ODE solvers when considering objectives with closed-form integrals against a Gaussian measure. We theoretically analyze AGRF dynamics by explicitly characterizing their trajectories and stability on quadratic objective functions, in addition to analyzing their descent properties. Our methods are supported by illustrative experiments on a range of canonical nonconvex optimization benchmark functions.
Submitted 27 June, 2024;
originally announced June 2024.
-
Absence of spurious solutions far from ground truth: A low-rank analysis with high-order losses
Authors:
Ziye Ma,
Ying Chen,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
Matrix sensing problems exhibit pervasive non-convexity, plaguing optimization with a proliferation of suboptimal spurious solutions. Avoiding convergence to these critical points poses a major challenge. This work provides new theoretical insights that help demystify the intricacies of the non-convex landscape. In this work, we prove that under certain conditions, critical points sufficiently distant from the ground truth matrix exhibit favorable geometry by being strict saddle points rather than troublesome local minima. Moreover, we introduce the notion of higher-order losses for the matrix sensing problem and show that the incorporation of such losses into the objective function amplifies the negative curvature around those distant critical points. This implies that increasing the complexity of the objective function via high-order losses accelerates the escape from such critical points and acts as a desirable alternative to increasing the complexity of the optimization problem via over-parametrization. By elucidating key characteristics of the non-convex optimization landscape, this work makes progress towards a comprehensive framework for tackling broader machine learning objectives plagued by non-convexity.
Submitted 9 March, 2024;
originally announced March 2024.
-
The computation of approximate feedback Stackelberg equilibria in multi-player nonlinear constrained dynamic games
Authors:
Jingqi Li,
Somayeh Sojoudi,
Claire Tomlin,
David Fridovich-Keil
Abstract:
Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing local feedback Stackelberg policies in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solutions that are primarily designed for unconstrained problems, our approach involves reformulating a feedback Stackelberg dynamic game into a sequence of nested optimization problems, enabling the derivation of Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order sufficient condition for local feedback Stackelberg policies. We propose a Newton-style primal-dual interior point method for solving constrained linear quadratic (LQ) feedback Stackelberg games, offering provable convergence guarantees. Our method is further extended to compute local feedback Stackelberg policies for more general nonlinear games by iteratively approximating them using LQ games, ensuring that their KKT conditions are locally aligned with those of the original nonlinear games. We prove the exponential convergence of our algorithm in constrained nonlinear games. In a feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs and constraints, our experimental results reveal the algorithm's ability to handle infeasible initial conditions and achieve exponential convergence towards an approximate local feedback Stackelberg equilibrium.
Submitted 15 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Evolutionary Games on Infinite Strategy Sets: Convergence to Nash Equilibria via Dissipativity
Authors:
Brendon G. Anderson,
Somayeh Sojoudi,
Murat Arcak
Abstract:
We consider evolutionary dynamics for population games in which players have a continuum of strategies at their disposal. Models in this setting amount to infinite-dimensional differential equations evolving on the manifold of probability measures. We generalize dissipativity theory for evolutionary games from finite to infinite strategy sets that are compact metric spaces, and derive sufficient conditions for the stability of Nash equilibria under the infinite-dimensional dynamics. The resulting analysis is applicable to a broad class of evolutionary games, and is modular in the sense that the pertinent conditions on the dynamics and the game's payoff structure can be verified independently. By specializing our theory to the class of monotone games, we recover as special cases existing stability results for the Brown-von Neumann-Nash and impartial pairwise comparison dynamics. We also extend our theory to models with dynamic payoffs, further broadening the applicability of our framework. We illustrate our theory using a variety of case studies, including a novel, continuous variant of the war of attrition game.
Submitted 22 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing
Authors:
Ziye Ma,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.
Submitted 24 October, 2023;
originally announced October 2023.
-
Tight Certified Robustness via Min-Max Representations of ReLU Neural Networks
Authors:
Brendon G. Anderson,
Samuel Pfrommer,
Somayeh Sojoudi
Abstract:
The reliable deployment of neural networks in control systems requires rigorous robustness guarantees. In this paper, we obtain tight robustness certificates over convex attack sets for min-max representations of ReLU neural networks by developing a convex reformulation of the nonconvex certification problem. This is done by "lifting" the problem to an infinite-dimensional optimization over probability measures, leveraging recent results in distributionally robust optimization to solve for an optimal discrete distribution, and proving that solutions of the original nonconvex problem are generated by the discrete distribution under mild boundedness, nonredundancy, and Slater conditions. As a consequence, optimal (worst-case) attacks against the model may be solved for exactly. This contrasts prior state-of-the-art that either requires expensive branch-and-bound schemes or loose relaxation techniques. Experiments on robust control and MNIST image classification examples highlight the benefits of our approach.
Submitted 7 October, 2023;
originally announced October 2023.
-
Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization
Authors:
Tanmay Gautam,
Reid Pryzant,
Ziyi Yang,
Chenguang Zhu,
Somayeh Sojoudi
Abstract:
Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech generation. VQ operates as a parametric K-means algorithm that quantizes inputs using a single codebook vector in the forward pass. While powerful, this technique faces practical challenges including codebook collapse, non-differentiability and lossy compression. To mitigate the aforementioned issues, we propose Soft Convex Quantization (SCQ) as a direct substitute for VQ. SCQ works like a differentiable convex optimization (DCO) layer: in the forward pass, we solve for the optimal convex combination of codebook vectors that quantize the inputs. In the backward pass, we leverage differentiability through the optimality conditions of the forward solution. We then introduce a scalable relaxation of the SCQ optimization and demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets. We train powerful SCQ autoencoder models that significantly outperform matched VQ-based architectures, observing an order of magnitude better image reconstruction and codebook usage with comparable quantization runtime.
Submitted 4 October, 2023;
originally announced October 2023.
-
Meta-Learning Parameterized First-Order Optimizers using Differentiable Convex Optimization
Authors:
Tanmay Gautam,
Samuel Pfrommer,
Somayeh Sojoudi
Abstract:
Conventional optimization methods in machine learning and controls rely heavily on first-order update rules. Selecting the right method and hyperparameters for a particular task often involves trial-and-error or practitioner intuition, motivating the field of meta-learning. We generalize a broad family of preexisting update rules by proposing a meta-learning framework in which the inner loop optimization step involves solving a differentiable convex optimization (DCO). We illustrate the theoretical appeal of this approach by showing that it enables one-step optimization of a family of linear least squares problems, given that the meta-learner has sufficient exposure to similar tasks. Various instantiations of the DCO update rule are compared to conventional optimizers on a range of illustrative experimental settings.
Submitted 29 March, 2023;
originally announced March 2023.
-
Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points
Authors:
Ziye Ma,
Igor Molybog,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
This paper studies the role of over-parametrization in solving non-convex optimization problems. The focus is on the important class of low-rank matrix sensing, where we propose an infinite hierarchy of non-convex problems via the lifting technique and the Burer-Monteiro factorization. This contrasts with the existing over-parametrization technique where the search rank is limited by the dimension of the matrix and it does not allow a rich over-parametrization of an arbitrary degree. We show that although the spurious solutions of the problem remain stationary points through the hierarchy, they will be transformed into strict saddle points (under some technical conditions) and can be escaped via local search methods. This is the first result in the literature showing that over-parametrization creates a negative curvature for escaping spurious solutions. We also derive a bound on how much over-parametrization is required to enable the elimination of spurious solutions.
Submitted 15 February, 2023;
originally announced February 2023.
-
Semidefinite Programming versus Burer-Monteiro Factorization for Matrix Sensing
Authors:
Baturalp Yalcin,
Ziye Ma,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
Many fundamental low-rank optimization problems, such as matrix completion, phase synchronization/retrieval, power system state estimation, and robust PCA, can be formulated as the matrix sensing problem. Two main approaches for solving matrix sensing are based on semidefinite programming (SDP) and Burer-Monteiro (B-M) factorization. The SDP method suffers from high computational and space complexities, whereas the B-M method may return a spurious solution due to the non-convexity of the problem. The existing theoretical guarantees for the success of these methods have led to similar conservative conditions, which may wrongly imply that these methods have comparable performances. In this paper, we shed light on some major differences between these two methods. First, we present a class of structured matrix completion problems for which the B-M methods fail with an overwhelming probability, while the SDP method works correctly. Second, we identify a class of highly sparse matrix completion problems for which the B-M method works and the SDP method fails. Third, we prove that although the B-M method exhibits the same performance independent of the rank of the unknown solution, the success of the SDP method is correlated to the rank of the solution and improves as the rank increases. Unlike the existing literature that has mainly focused on those instances of matrix sensing for which both SDP and B-M work, this paper offers the first result on the unique merit of each method over the alternative approach.
Submitted 15 August, 2022;
originally announced August 2022.
-
An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models
Authors:
Brendon G. Anderson,
Tanmay Gautam,
Somayeh Sojoudi
Abstract:
In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly more popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.
Submitted 27 September, 2022; v1 submitted 15 August, 2022;
originally announced August 2022.
-
A New Complexity Metric for Nonconvex Rank-one Generalized Matrix Completion
Authors:
Haixiang Zhang,
Baturalp Yalcin,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
In this work, we develop a new complexity metric for an important class of low-rank matrix optimization problems in both symmetric and asymmetric cases, where the metric aims to quantify the complexity of the nonconvex optimization landscape of each problem and the success of local search methods in solving the problem. The existing literature has focused on two complexity bounds. The RIP constant is commonly used to characterize the complexity of matrix sensing problems. On the other hand, the incoherence and the sampling rate are used when analyzing matrix completion problems. The proposed complexity metric has the potential to generalize these two notions and also applies to a much larger class of problems. To mathematically study the properties of this metric, we focus on the rank-$1$ generalized matrix completion problem and illustrate the usefulness of the new complexity metric on three types of instances, namely, instances with the RIP condition, instances obeying the Bernoulli sampling model, and a synthetic example. We show that the complexity metric exhibits a consistent behavior in the three cases, even when other existing conditions fail to provide theoretical guarantees. These observations provide a strong implication that the new complexity metric has the potential to generalize various conditions of optimization complexity proposed for different applications. Furthermore, we establish theoretical results to provide sufficient and necessary conditions for the existence of spurious solutions in terms of the proposed complexity metric. This contrasts with the RIP and incoherence conditions that fail to provide any necessary condition.
Submitted 21 July, 2023; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning
Authors:
Jingqi Li,
Donggun Lee,
Somayeh Sojoudi,
Claire J. Tomlin
Abstract:
In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system starting at a state therein could be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, where the super-zero level set, i.e., the set of states where the value function is evaluated to be non-negative, recovers the reach-avoid set. Building upon this, we prove that the proposed method can be adapted to compute the viability kernel, or the set of states which could be controlled to satisfy given constraints, and the backward reachable set, or the set of states that could be driven towards a given target set. Finally, we propose to alleviate the curse of dimensionality issue in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function such that the super-zero level set of the learned value function serves as a (conservative) approximation to the reach-avoid set. Our theoretical and empirical results suggest that the proposed method could learn reliably the reach-avoid set and the optimal control policy even with neural network approximation.
Submitted 18 March, 2022;
originally announced March 2022.
-
Noisy Low-rank Matrix Optimization: Geometry of Local Minima and Convergence Rate
Authors:
Ziye Ma,
Somayeh Sojoudi
Abstract:
This paper is concerned with low-rank matrix optimization, which has found a wide range of applications in machine learning. This problem in the special case of matrix sensing has been studied extensively through the notion of Restricted Isometry Property (RIP), leading to a wealth of results on the geometric landscape of the problem and the convergence rate of common algorithms. However, the existing results can handle the problem in the case with a general objective function subject to noisy data only when the RIP constant is close to 0. In this paper, we develop a new mathematical framework to solve the above-mentioned problem with a far less restrictive RIP constant. We prove that as long as the RIP constant of the noiseless objective is less than $1/3$, any spurious local solution of the noisy optimization problem must be close to the ground truth solution. By working through the strict saddle property, we also show that an approximate solution can be found in polynomial time. We characterize the geometry of the spurious local minima of the problem in a local region around the ground truth in the case when the RIP constant is greater than $1/3$. Compared to the existing results in the literature, this paper offers the strongest RIP bound and provides a complete theoretical analysis on the global and local optimization landscapes of general low-rank optimization problems under random corruptions from any finite-variance family.
Submitted 15 March, 2023; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods
Authors:
Baturalp Yalcin,
Haixiang Zhang,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
It is well-known that the Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the RIP condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problems with a low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. In this work, we provide a negative answer to the above question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. Based on those results, we define a new complexity metric that potentially measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements of the ground truth matrix can deteriorate the landscape, which further reveals the unfavorable behavior of the B-M factorization on general low-rank matrix optimization problems.
Submitted 19 October, 2021;
originally announced October 2021.
-
Sharp Restricted Isometry Property Bounds for Low-rank Matrix Recovery Problems with Corrupted Measurements
Authors:
Ziye Ma,
Yingjie Bi,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
In this paper, we study a general low-rank matrix recovery problem with linear measurements corrupted by some noise. The objective is to understand under what conditions on the restricted isometry property (RIP) of the problem local search methods can find the ground truth with a small error. By analyzing the landscape of the non-convex problem, we first propose a global guarantee on the maximum distance between an arbitrary local minimizer and the ground truth under the assumption that the RIP constant is smaller than $1/2$. We show that this distance shrinks to zero as the intensity of the noise reduces. Our new guarantee is sharp in terms of the RIP constant and is much stronger than the existing results. We then present a local guarantee for problems with an arbitrary RIP constant, which states that any local minimizer is either considerably close to the ground truth or far away from it. Next, we prove the strict saddle property, which guarantees the global convergence of the perturbed gradient descent method in polynomial time. The developed results demonstrate how the noise intensity and the RIP constant of the problem affect the landscape of the problem.
Submitted 25 July, 2023; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Internally Hankel $k$-positive systems
Authors:
Christian Grussler,
Thiago B. Burghi,
Somayeh Sojoudi
Abstract:
The classes of externally Hankel $k$-positive LTI systems and autonomous $k$-positive systems have recently been defined, and their properties and applications began to be explored using the framework of total positivity and variation diminishing operators. In this work, these two system classes are subsumed under a new class of internally Hankel $k$-positive systems, which we define as state-space LTI systems with $k$-positive controllability and observability operators. We show that internal Hankel $k$-positivity is a natural extension of the celebrated property of internal positivity ($k=1$), and we derive tractable conditions for verifying the cases $k> 1$ in the form of internal positivity of the first $k$ compound systems. As these conditions define a new positive realization problem, we also discuss geometric conditions for when a minimal internally Hankel $k$-positive realization exists. Finally, we use our results to establish a new framework for bounding the number of over- and undershoots in the step response of general LTI systems.
Submitted 11 March, 2021;
originally announced March 2021.
-
Towards Optimal Branching of Linear and Semidefinite Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we study certifying the robustness of ReLU neural networks against adversarial input perturbations. To diminish the relaxation error suffered by the popular linear programming (LP) and semidefinite programming (SDP) certification methods, we take a branch-and-bound approach to propose partitioning the input uncertainty set and solving the relaxations on each part separately. We show that this approach reduces relaxation error, and that the error is eliminated entirely upon performing an LP relaxation with a partition intelligently designed to exploit the nature of the ReLU activations. To scale this approach to large networks, we consider using a coarser partition whereby the number of parts in the partition is reduced. We prove that computing such a coarse partition that directly minimizes the LP relaxation error is NP-hard. By instead minimizing the worst-case LP relaxation error, we develop a closed-form branching scheme. We extend the analysis to the SDP, where the feasible set geometry is exploited to design a branching scheme that minimizes the worst-case SDP relaxation error. Experiments on MNIST, CIFAR-10, and Wisconsin breast cancer diagnosis classifiers demonstrate significant increases in the percentages of test samples certified. By independently increasing the input size and the number of layers, we empirically illustrate under which regimes the branched LP and branched SDP are best applied.
Submitted 2 February, 2023; v1 submitted 22 January, 2021;
originally announced January 2021.
-
A Sequential Framework Towards an Exact SDP Verification of Neural Networks
Authors:
Ziye Ma,
Somayeh Sojoudi
Abstract:
Although neural networks have been applied to several systems in recent years, they still cannot be used in safety-critical systems due to the lack of efficient techniques to certify their robustness. A number of techniques based on convex optimization have been proposed in the literature to study the robustness of neural networks, and the semidefinite programming (SDP) approach has emerged as a leading contender for the robust certification of neural networks. The major challenge to the SDP approach is that it is prone to a large relaxation gap. In this work, we address this issue by developing a sequential framework to shrink this gap to zero by adding non-convex cuts to the optimization problem via disjunctive programming. We analyze the performance of this sequential SDP method both theoretically and empirically, and show that it bridges the gap as the number of cuts increases.
Submitted 27 September, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Certifying Neural Network Robustness to Random Input Noise from Samples
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial input uncertainty, but researchers have recently shown a need for methods that consider random uncertainty. In this paper, we propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to replace the optimization constraints. The resulting optimization reduces to a linear program with an analytical solution. Furthermore, we develop a sufficient condition on the number of samples needed to make the misclassification bound hold with overwhelming probability. Our case studies on MNIST classifiers show that this method is able to certify a uniform infinity-norm uncertainty region with a radius nearly 50 times larger than what the current state-of-the-art method can certify.
Submitted 25 January, 2023; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Data-Driven Certification of Neural Networks with Random Input Noise
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. Most certification methods in the literature are designed for adversarial or worst-case inputs, but researchers have recently shown a need for methods that consider random input noise. In this paper, we examine the setting where inputs are subject to random noise coming from an arbitrary probability distribution. We propose a robustness certification method that lower-bounds the probability that network outputs are safe. This bound is cast as a chance-constrained optimization problem, which is then reformulated using input-output samples to make the optimization constraints tractable. We develop sufficient conditions for the resulting optimization to be convex, as well as on the number of samples needed to make the robustness bound hold with overwhelming probability. We show for a special case that the proposed optimization reduces to an intuitive closed-form solution. Case studies on synthetic, MNIST, and CIFAR-10 networks experimentally demonstrate that this method is able to certify robustness against various input noise regimes over larger uncertainty regions than prior state-of-the-art techniques.
Submitted 25 January, 2023; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Tightened Convex Relaxations for Neural Network Robustness Certification
Authors:
Brendon G. Anderson,
Ziye Ma,
Jingqi Li,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of certifying the robustness of neural networks to perturbed and adversarial input data. Such certification is imperative for the application of neural networks in safety-critical decision-making and control systems. Certification techniques using convex optimization have been proposed, but they often suffer from relaxation errors that void the certificate. Our work exploits the structure of ReLU networks to improve relaxation errors through a novel partition-based certification procedure. The proposed method is proven to tighten existing linear programming relaxations, and asymptotically achieves zero relaxation error as the partition is made finer. We develop a finite partition that attains zero relaxation error and use the result to derive a tractable partitioning scheme that minimizes the worst-case relaxation error. Experiments using real data show that the partitioning procedure is able to issue robustness certificates in cases where prior methods fail. Consequently, partition-based certification procedures are found to provide an intuitive, effective, and theoretically justified method for tightening existing convex relaxation techniques.
Submitted 17 September, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Efficient Learning of Distributed Linear-Quadratic Controllers
Authors:
Salar Fattahi,
Nikolai Matni,
Somayeh Sojoudi
Abstract:
In this work, we propose a robust approach to design distributed controllers for unknown-but-sparse linear and time-invariant systems. By leveraging modern techniques in distributed controller synthesis and structured linear inverse problems as applied to system identification, we show that near-optimal distributed controllers can be learned with sub-linear sample complexity and computed with near-linear time complexity, both measured with respect to the dimension of the system. In particular, we provide sharp end-to-end guarantees on the stability and the performance of the designed distributed controller and prove that for sparse systems, the number of samples needed to guarantee robust and near optimal performance of the designed controller can be significantly smaller than the dimension of the system. Finally, we show that the proposed optimization problem can be solved to global optimality with near-linear time complexity by iteratively solving a series of small quadratic programs.
Submitted 10 October, 2019; v1 submitted 21 September, 2019;
originally announced September 2019.
-
Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation
Authors:
Brendon G. Anderson,
Somayeh Sojoudi
Abstract:
In this paper, we consider the problem of unsupervised video object segmentation via background subtraction. Specifically, we pose the nonsemantic extraction of a video's moving objects as a nonconvex optimization problem via a sum of sparse and low-rank matrices. The resulting formulation, a nonnegative variant of robust principal component analysis, is more computationally tractable than its commonly employed convex relaxation, although not generally solvable to global optimality. In spite of this limitation, we derive intuitive and interpretable conditions on the video data under which the uniqueness and global optimality of the object segmentation are guaranteed using local search methods. We illustrate these novel optimality criteria through example segmentations using real video data.
Submitted 22 February, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
On the Absence of Spurious Local Trajectories in Time-varying Nonconvex Optimization
Authors:
S. Fattahi,
C. Josz,
Y. Ding,
R. Mohammadi,
J. Lavaei,
S. Sojoudi
Abstract:
In this paper, we study the landscape of an online nonconvex optimization problem, for which the input data vary over time and the solution is a trajectory rather than a single point. To understand the complexity of finding a global solution of this problem, we introduce the notion of \textit{spurious (i.e., non-global) local trajectory} as a generalization to the notion of spurious local solution in nonconvex (time-invariant) optimization. We develop an ordinary differential equation (ODE) associated with a time-varying nonlinear dynamical system which, at limit, characterizes the spurious local solutions of the time-varying optimization problem. We prove that the absence of spurious local trajectory is closely related to the transient behavior of the developed system. In particular, we show that if the problem is time-varying, the data variation may force all of the ODE trajectories initialized at arbitrary local minima at the initial time to gradually converge to the global solution trajectory. We study the Jacobian of the dynamical system along a local minimum trajectory and show how its eigenvalues are manipulated by the natural data variation in the problem, which may consequently trigger escaping poor local minima over time.
Submitted 30 October, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Learning Sparse Dynamical Systems from a Single Sample Trajectory
Authors:
Salar Fattahi,
Nikolai Matni,
Somayeh Sojoudi
Abstract:
This paper addresses the problem of identifying sparse linear time-invariant (LTI) systems from a single sample trajectory generated by the system dynamics. We introduce a Lasso-like estimator for the parameters of the system, taking into account their sparse nature. Assuming that the system is stable, or that it is equipped with an initial stabilizing controller, we provide sharp finite-time guarantees on the accurate recovery of both the sparsity structure and the parameter values of the system. In particular, we show that the proposed estimator can correctly identify the sparsity pattern of the system matrices with high probability, provided that the length of the sample trajectory exceeds a threshold. Furthermore, we show that this threshold scales polynomially in the number of nonzero elements in the system matrices, but logarithmically in the system dimensions --- this improves on existing sample complexity bounds for the sparse system identification problem. We further extend these results to obtain sharp bounds on the $\ell_{\infty}$-norm of the estimation error and show how different properties of the system---such as its stability level and \textit{mutual incoherency}---affect this bound. Finally, an extensive case study on power systems is presented to illustrate the performance of the proposed estimation method.
Submitted 19 April, 2019;
originally announced April 2019.
-
Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery
Authors:
Richard Y. Zhang,
Somayeh Sojoudi,
Javad Lavaei
Abstract:
Nonconvex matrix recovery is known to contain no spurious local minima under a restricted isometry property (RIP) with a sufficiently small RIP constant $δ$. If $δ$ is too large, however, then counterexamples containing spurious local minima are known to exist. In this paper, we introduce a proof technique that is capable of establishing sharp thresholds on $δ$ to guarantee the inexistence of spurious local minima. Using the technique, we prove that in the case of a rank-1 ground truth, an RIP constant of $δ<1/2$ is both necessary and sufficient for exact recovery from any arbitrary initial point (such as a random point). We also prove a local recovery result: given an initial point $x_{0}$ satisfying $f(x_{0})\le(1-δ)^{2}f(0)$, any descent algorithm that converges to second-order optimality guarantees exact recovery.
Submitted 13 June, 2019; v1 submitted 6 January, 2019;
originally announced January 2019.
-
Transient Stability Analysis of Power Systems via Occupation Measures
Authors:
Cedric Josz,
Daniel K. Molzahn,
Matteo Tacchi,
Somayeh Sojoudi
Abstract:
We propose the application of occupation measure theory to the classical problem of transient stability analysis for power systems. This enables the computation of certified inner and outer approximations for the region of attraction of a nominal operating point. In order to determine whether a post-disturbance point requires corrective actions to ensure stability, one would then simply need to check the sign of a polynomial evaluated at that point. Thus, computationally expensive dynamical simulations are only required for post-disturbance points in the region between the inner and outer approximations. We focus on the nonlinear swing equations but voltage dynamics could also be included. The proposed approach is formulated as a hierarchy of semidefinite programs stemming from an infinite-dimensional linear program in a measure space, with a natural dual sum-of-squares perspective. On the theoretical side, this paper lays the groundwork for exploiting the oscillatory structure of power systems by using Hermitian (instead of real) sums-of-squares and connects the proposed approach to recent results from algebraic geometry.
Submitted 4 November, 2018;
originally announced November 2018.
-
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
Authors:
Richard Y. Zhang,
Cédric Josz,
Somayeh Sojoudi,
Javad Lavaei
Abstract:
When the linear measurements of an instance of low-rank matrix recovery satisfy a restricted isometry property (RIP)---i.e. they are approximately norm-preserving---the problem is known to contain no spurious local minima, so exact recovery is guaranteed. In this paper, we show that moderate RIP is not enough to eliminate spurious local minima, so existing results can only hold for near-perfect RIP. In fact, counterexamples are ubiquitous: we prove that every x is the spurious local minimum of a rank-1 instance of matrix recovery that satisfies RIP. One specific counterexample has RIP constant $δ=1/2$, but causes randomly initialized stochastic gradient descent (SGD) to fail 12% of the time. SGD is frequently able to avoid and escape spurious local minima, but this empirical result shows that it can occasionally be defeated by their existence. Hence, while exact recovery guarantees will likely require a proof of no spurious local minima, arguments based solely on norm preservation will only be applicable to a narrow set of nearly-isotropic instances.
Submitted 30 October, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
Authors:
Cedric Josz,
Yi Ouyang,
Richard Y. Zhang,
Javad Lavaei,
Somayeh Sojoudi
Abstract:
We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term \textit{global functions}. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of nonconvex and nonsmooth optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.
Submitted 31 October, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Large-Scale Sparse Inverse Covariance Estimation via Thresholding and Max-Det Matrix Completion
Authors:
Richard Y. Zhang,
Salar Fattahi,
Somayeh Sojoudi
Abstract:
The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed--under mild assumptions--that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result, and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $ε$-accurate solution in $O(n\log(1/ε))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
Submitted 6 June, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Conic Optimization Theory: Convexification Techniques and Numerical Algorithms
Authors:
Richard Y. Zhang,
Cédric Josz,
Somayeh Sojoudi
Abstract:
Optimization is at the core of control theory and appears in several areas of this field, such as optimal control, distributed control, system identification, robust control, state estimation, model predictive control and dynamic programming. The recent advances in various topics of modern optimization have also been revamping the area of machine learning. Motivated by the crucial role of optimization theory in the design, analysis, control and operation of real-world systems, this tutorial paper offers a detailed overview of some major advances in this area, namely conic optimization and its emerging applications. First, we discuss the importance of conic optimization in different areas. Then, we explain seminal results on the design of hierarchies of convex relaxations for a wide range of nonconvex problems. Finally, we study different numerical algorithms for large-scale conic optimization problems.
Submitted 26 September, 2017;
originally announced September 2017.