-
Quantum Algorithms and Lower Bounds for Finite-Sum Optimization
Authors:
Yexin Zhang,
Chenyi Zhang,
Cong Fang,
Liwei Wang,
Tongyang Li
Abstract:
Finite-sum optimization has wide applications in machine learning, covering important problems such as support vector machines, regression, etc. In this paper, we initiate the study of solving finite-sum optimization problems by quantum computing. Specifically, let $f_1,\ldots,f_n\colon\mathbb{R}^d\to\mathbb{R}$ be $\ell$-smooth convex functions and $ψ\colon\mathbb{R}^d\to\mathbb{R}$ be a $μ$-strongly convex proximal function. The goal is to find an $ε$-optimal point for $F(\mathbf{x})=\frac{1}{n}\sum_{i=1}^n f_i(\mathbf{x})+ψ(\mathbf{x})$. We give a quantum algorithm with complexity $\tilde{O}\big(n+\sqrt{d}+\sqrt{\ell/μ}\big(n^{1/3}d^{1/3}+n^{-2/3}d^{5/6}\big)\big)$, improving the classical tight bound $\tilde{Θ}\big(n+\sqrt{n\ell/μ}\big)$. We also prove a quantum lower bound $\tilde{Ω}(n+n^{3/4}(\ell/μ)^{1/4})$ when $d$ is large enough. Both our quantum upper and lower bounds extend to the cases where $ψ$ is not necessarily strongly convex, or each $f_i$ is Lipschitz but not necessarily smooth. In addition, when $F$ is nonconvex, our quantum algorithm can find an $ε$-critical point using $\tilde{O}(n+\ell(d^{1/3}n^{1/3}+\sqrt{d})/ε^2)$ queries.
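As a rough illustration of when the quantum upper bound beats the classical tight bound, both expressions can be evaluated directly. This minimal sketch drops all constants and polylogarithmic factors hidden by $\tilde{O}$ and $\tilde{Θ}$ and writes `kappa` for $\ell/μ$; the function names are hypothetical and nothing here comes from the paper's algorithm.

```python
import math


def classical_bound(n: float, kappa: float) -> float:
    """Classical tight bound n + sqrt(n * kappa), constants and logs dropped."""
    return n + math.sqrt(n * kappa)


def quantum_bound(n: float, d: float, kappa: float) -> float:
    """Quantum upper bound n + sqrt(d) + sqrt(kappa) * (n^(1/3) d^(1/3) + n^(-2/3) d^(5/6)),
    constants and logs dropped."""
    return n + math.sqrt(d) + math.sqrt(kappa) * (
        n ** (1 / 3) * d ** (1 / 3) + n ** (-2 / 3) * d ** (5 / 6)
    )


# kappa plays the role of ell / mu
for n, d, kappa in [(1e6, 1e2, 1e8), (1e4, 1e2, 1e8)]:
    c, q = classical_bound(n, kappa), quantum_bound(n, d, kappa)
    print(f"n={n:.0e}, d={d:.0e}, kappa={kappa:.0e}: classical={c:.3e}, quantum={q:.3e}")
```

Comparing only the leading $\ell/μ$-dependent terms, $\sqrt{\ell/μ}\,n^{1/3}d^{1/3} < \sqrt{n\ell/μ}$ exactly when $d < \sqrt{n}$, which is why the first setting favors the quantum bound while the second (where $d = \sqrt{n}$) does not.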
Submitted 5 June, 2024;
originally announced June 2024.
-
Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning
Authors:
Yihong Gu,
Cong Fang,
Peter Bühlmann,
Jianqing Fan
Abstract:
Pursuing causality from data is a fundamental problem in scientific discovery, treatment intervention, and transfer learning. This paper introduces a novel algorithmic method for addressing nonparametric invariance and causality learning in regression models across multiple environments, where the joint distribution of response variables and covariates varies, but the conditional expectations of the outcome given an unknown set of quasi-causal variables are invariant. The challenge of finding such an unknown set of quasi-causal or invariant variables is compounded by the presence of endogenous variables that have heterogeneous effects across different environments; including even one of them in the regression would make the estimation inconsistent. The proposed Focused Adversarial Invariant Regularization (FAIR) framework utilizes an innovative minimax optimization approach that breaks down these barriers, driving regression models toward prediction-invariant solutions through adversarial testing. Leveraging the representation power of neural networks, FAIR neural networks (FAIR-NN) are introduced for causality pursuit. It is shown that FAIR-NN can find the invariant variables and quasi-causal variables under a minimal identification condition and that the resulting procedure is adaptive to low-dimensional composition structures in a non-asymptotic analysis. Under a structural causal model, variables identified by FAIR-NN represent pragmatic causality and provably align with exact causal mechanisms under conditions of sufficient heterogeneity. Computationally, FAIR-NN employs a novel Gumbel approximation with decreasing temperature and a stochastic gradient descent ascent algorithm. The procedures are convincingly demonstrated using simulated and real-data examples.
Submitted 30 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Closed Form for Half-Area Overlap Offset of 2 Unit Disks
Authors:
Max Chicky Fang
Abstract:
The separation between the centers of two unit circles such that their overlapping area is exactly half of each circle's area is known to be around $0.8079455\dots$ (OEIS A133741). However, no closed form of this number was previously known. Here, we determine its closed-form representation in terms of the inverse regularized beta function.
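The constant itself is easy to reproduce numerically. For two unit disks whose centers are a distance $d$ apart, the overlap (lens) area is $2\arccos(d/2) - (d/2)\sqrt{4-d^2}$, which decreases from $π$ at $d=0$ to $0$ at $d=2$, so bisection on $[0,2]$ finds the offset whose overlap is $π/2$. This is only a numerical check under that standard lens-area formula, not the closed form via the inverse regularized beta function derived in the paper; the function names are illustrative.

```python
import math


def lens_area(d: float) -> float:
    """Overlap area of two unit disks whose centers are a distance d apart (0 <= d <= 2)."""
    return 2 * math.acos(d / 2) - (d / 2) * math.sqrt(4 - d * d)


def half_area_offset(tol: float = 1e-15) -> float:
    """Bisection for the d in [0, 2] with lens_area(d) == pi / 2.

    lens_area is decreasing on [0, 2], from pi at d = 0 to 0 at d = 2.
    """
    lo, hi = 0.0, 2.0
    target = math.pi / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(mid) > target:
            lo = mid  # too much overlap: the centers must be farther apart
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{half_area_offset():.10f}")
```

The printed value can be compared with the $0.8079455\dots$ quoted above.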
Submitted 15 January, 2024;
originally announced March 2024.
-
The Implicit Bias of Heterogeneity towards Invariance and Causality
Authors:
Yang Xu,
Yihong Gu,
Cong Fang
Abstract:
It is observed empirically that large language models (LLMs), trained with a variant of regression loss on vast corpora from the Internet, can unveil causal associations to some extent. This is contrary to the traditional wisdom that ``association is not causation'' and to the paradigm of traditional causal inference, in which prior causal knowledge should be carefully incorporated into the design of methods. It is a mystery why causality, at a higher level of understanding, can emerge from a regression task that pursues associations. In this paper, we claim that the emergence of causality from association-oriented training can be attributed to the coupling effects of the heterogeneity of the source data, the stochasticity of training algorithms, and the over-parameterization of the learning models. We illustrate this intuition using a simple but insightful model that learns invariance, a quasi-causality, using regression loss. Specifically, we consider multi-environment low-rank matrix sensing problems where the unknown rank-$r$ ground-truth $d\times d$ matrices diverge across the environments but contain a lower-rank invariant, causal part. In this case, running pooled gradient descent generally results in biased solutions that only learn associations. We show that running large-batch Stochastic Gradient Descent, where each batch consists of linear measurement samples randomly selected from a single environment, can successfully drive the solution towards the invariant, causal solution under certain conditions. This behavior depends on the relatively strong heterogeneity of the environments, the large step size and noise in the optimization algorithm, and the over-parameterization of the model. In summary, we unveil another implicit bias that results from the symbiosis between the heterogeneity of data and modern algorithms, which is, to the best of our knowledge, the first such result in the literature.
Submitted 3 March, 2024;
originally announced March 2024.
-
Heuristic Learning for Co-Design Scheme of Optimal Sequential Attack
Authors:
Xiaoyu Luo,
Haoxuan Pan,
Chongrong Fang,
Chengcheng Zhao,
Peng Cheng,
Jianping He
Abstract:
This paper considers a novel co-design problem of the optimal \textit{sequential} attack, whose attack strategy changes over time, and in which the \textit{sequential} attack selection strategy and \textit{sequential} attack signal are designed simultaneously. Unlike existing attack design works, which focus separately on attack subsets or attack signals, the joint design of the attack strategy poses a huge challenge due to the deep coupling between the \textit{sequential} attack selection strategy and the \textit{sequential} attack signal. In this manuscript, we decompose the sequential co-design problem into two equivalent sub-problems. Specifically, we first derive an analytical closed-form expression relating the optimal attack signal to the sequential attack selection strategy. Furthermore, we prove the finite-time inverse convergence of the critical parameters in the injected optimal attack signal by discrete-time Lyapunov analysis, which enables efficient off-line design of the attack signal and saves computing resources. Finally, we exploit this relationship to design a heuristic two-stage learning-based joint attack algorithm (HTL-JA), which reaches the attack target faster than the one-stage proximal-policy-optimization-based (PPO) algorithm. Extensive simulations are conducted to show the effectiveness of the injected optimal sequential attack.
Submitted 16 November, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Zeroth-order Optimization with Weak Dimension Dependency
Authors:
Pengyun Yue,
Long Yang,
Cong Fang,
Zhouchen Lin
Abstract:
Zeroth-order optimization is a fundamental research topic that has been a focus of various learning tasks, such as black-box adversarial attacks, bandits, and reinforcement learning. However, in theory, most complexity results assert a linear dependency on the dimension of the optimization variable; such results suggest that zeroth-order algorithms break down on high-dimensional problems and cannot explain their effectiveness in practice. In this paper, we present a novel zeroth-order optimization theory characterized by complexities that exhibit weak dependencies on dimensionality. The key contribution lies in the introduction of a new factor, denoted as $\mathrm{ED}_α=\sup_{x\in \mathbb{R}^d}\sum_{i=1}^dσ_i^α(\nabla^2 f(x))$ ($α>0$, $σ_i(\cdot)$ is the $i$-th singular value in non-increasing order), which effectively functions as a measure of dimensionality. The algorithms we propose demonstrate significantly reduced complexities when measured in terms of the factor $\mathrm{ED}_α$. Specifically, we first study a well-known zeroth-order algorithm from Nesterov and Spokoiny (2017) on quadratic objectives and show a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_1}{σ_d}\log(1/ε)\right)$ for the strongly convex setting. Furthermore, we introduce novel algorithms that leverage the Heavy-ball mechanism. Our proposed algorithm exhibits a complexity of $\mathcal{O}\left(\frac{\mathrm{ED}_{1/2}}{\sqrt{σ_d}}\cdot\log{\frac{L}μ}\cdot\log(1/ε)\right)$. We further expand the scope of the method to encompass generic smooth optimization problems under an additional Hessian-smooth condition. The resulting algorithms achieve complexities that improve by an order in $d$ under appropriate conditions. Our analysis lays the foundation for zeroth-order optimization methods for smooth functions within high-dimensional settings.
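For a quadratic $f(x)=\tfrac12 x^\top A x$ the Hessian is constant, so the supremum in $\mathrm{ED}_α$ drops out and $\mathrm{ED}_α$ is just the sum of the singular values of $A$ raised to the power $α$. The sketch below computes it and runs a plain Gaussian-smoothing random-search method in the spirit of Nesterov and Spokoiny (2017), which only queries function values. The forward-difference estimator, the smoothing parameter, the step size $1/(4(d+4)L)$ and all helper names are assumptions for illustration; this is not the Heavy-ball algorithm proposed in the paper.

```python
import numpy as np


def effective_dimension(hessian: np.ndarray, alpha: float) -> float:
    """ED_alpha for a quadratic: sum of the Hessian's singular values raised to alpha.
    (For a quadratic the Hessian is constant, so the sup over x is not needed.)"""
    singular_values = np.linalg.svd(hessian, compute_uv=False)
    return float(np.sum(singular_values ** alpha))


def random_search(f, x0, step, smoothing, iters, seed=0):
    """Gaussian-smoothing random search using only function values:
    g = (f(x + smoothing * u) - f(x)) / smoothing * u with u ~ N(0, I), then x <- x - step * g."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        u = rng.standard_normal(x.shape)
        g = (f(x + smoothing * u) - f(x)) / smoothing * u
        x = x - step * g
    return x


A = np.diag([3.0, 2.0, 1.0])  # Hessian of the test quadratic


def f(x: np.ndarray) -> float:
    return 0.5 * x @ A @ x


d, L = A.shape[0], 3.0  # dimension and smoothness constant (largest singular value of A)
# Step size 1/(4(d+4)L) is an assumed choice for this sketch.
x = random_search(f, np.ones(d), step=1 / (4 * (d + 4) * L), smoothing=1e-6, iters=5000)
print(effective_dimension(A, 1.0), effective_dimension(A, 0.5), f(x))
```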
Submitted 2 August, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Reservoir Computing with Error Correction: Long-term Behaviors of Stochastic Dynamical Systems
Authors:
Cheng Fang,
Yubin Lu,
Ting Gao,
Jinqiao Duan
Abstract:
The prediction of stochastic dynamical systems and the capture of dynamical behaviors are profound problems. In this article, we propose a data-driven framework combining Reservoir Computing and Normalizing Flow to study this issue, which mimics error modeling to improve traditional Reservoir Computing performance and integrates the virtues of both approaches. With few assumptions about the underlying stochastic dynamical systems, this model-free method successfully predicts the long-term evolution of stochastic dynamical systems and replicates dynamical behaviors. We verify the effectiveness of the proposed framework in several experiments, including the stochastic Van der Pol oscillator, the El Niño-Southern Oscillation simplified model, and the stochastic Lorenz system. These experiments consist of Markov/non-Markov and stationary/non-stationary stochastic processes which are defined by linear/nonlinear stochastic differential equations or stochastic delay differential equations. Additionally, we explore the noise-induced tipping phenomenon, relaxation oscillation, stochastic mixed-mode oscillation, and replication of the strange attractor.
Submitted 30 July, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Environment Invariant Linear Least Squares
Authors:
Jianqing Fan,
Cong Fang,
Yihong Gu,
Tong Zhang
Abstract:
This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across different environments, yet the conditional expectations of $y$ given the unknown set of important variables are invariant. Such a statistical model is related to the problems of endogeneity, causal inference, and transfer learning. The motivation behind it is illustrated by how the goals of prediction and attribution are inherent in estimating the true parameter and the important variable set. We construct a novel environment invariant linear least squares (EILLS) objective function, a multi-environment version of linear least-squares regression that leverages the above conditional expectation invariance structure and heterogeneity among different environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition. We establish non-asymptotic $\ell_2$ error bounds on the estimation error for the EILLS estimator in the presence of spurious variables. Moreover, we show that the $\ell_0$ penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These non-asymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner without any prior structural knowledge. To the best of our knowledge, this paper is the first to realize statistically efficient invariance learning in the general linear model.
Submitted 25 November, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium
Authors:
Shihong Ding,
Hanze Dong,
Cong Fang,
Zhouchen Lin,
Tong Zhang
Abstract:
We consider the non-convex non-concave objective function in two-player zero-sum continuous games. The existence of pure Nash equilibrium requires stringent conditions, posing a major challenge for this problem. To circumvent this difficulty, we examine the problem of identifying a mixed Nash equilibrium, where strategies are randomized and characterized by probability distributions over continuous domains. To this end, we propose the PArticle-based Primal-dual ALgorithm (PAPAL), tailored for weakly entropy-regularized min-max optimization over probability distributions. This algorithm employs the stochastic movements of particles to represent the updates of random strategies for the $ε$-mixed Nash equilibrium. We offer a comprehensive convergence analysis of the proposed algorithm, demonstrating its effectiveness. In contrast to prior research that attempted to update particle importance without movements, PAPAL is the first implementable particle-based algorithm accompanied by non-asymptotic quantitative convergence results, running time, and sample complexity guarantees. Our framework contributes novel insights into the particle-based algorithms for continuous min-max optimization in the general non-convex non-concave setting.
Submitted 20 November, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
On local Turán density problems of hypergraphs
Authors:
Chunqiu Fang,
Guorong Gao,
Jie Ma,
Ge Song
Abstract:
For integers $q\ge p\ge r\ge2$, we say that an $r$-uniform hypergraph $H$ has property $(q,p)$ if for any $q$-vertex subset $Q$ of $V(H)$, there exists a $p$-vertex subset $P$ of $Q$ spanning a clique in $H$. Let $T_{r}(n,q,p)=\min\{ e(H): H\subset \binom{[n]}{r}, H \text{~has property~} (q,p)\}$. The local Turán density with respect to property $(q,p)$ in $r$-uniform hypergraphs is defined as $t_{r}(q,p)=\lim_{n\to \infty}T_{r}(n,q,p)/\binom{n}{r}$. Frankl, Huang and Rödl [J. Comb. Theory, Ser. A, 177 (2021)] showed that $\lim_{p\to\infty}t_{r}(ap+1,p+1)=\frac{1}{a^{r-1}}$ for positive integers $a$ and $t_{3}(2p+1,p+1)=\frac{1}{4}$ for all $p\ge 3$, and asked for the value of $\lim_{p\to\infty}t_{r}(γp+1,p+1)$, where $γ\ge 1$ is a real number. Based on the study of hypergraph Turán densities, we determine some exact values of local Turán densities and partially answer their question; in particular, our results imply that the equality in their question about exact values does not hold in general.
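Property $(q,p)$ and $T_r(n,q,p)$ can be checked by brute force for very small $n$, which helps build intuition for the definitions. The helper names below are hypothetical and the search is exponential, so it is only usable for toy sizes. A sanity check: for $r=2$, $q=3$, $p=2$, a graph has property $(3,2)$ exactly when its complement is triangle-free, so Mantel's theorem gives $T_2(n,3,2)=\binom{n}{2}-\lfloor n^2/4\rfloor$, e.g. $2$ for $n=4$ and $4$ for $n=5$.

```python
from itertools import combinations


def has_property(vertices, edges, r, q, p):
    """Property (q, p): every q-subset Q of the vertices contains a p-subset P
    all of whose r-subsets are edges, i.e. P spans a clique in the r-graph."""
    edge_set = {frozenset(e) for e in edges}

    def is_clique(P):
        return all(frozenset(S) in edge_set for S in combinations(P, r))

    return all(
        any(is_clique(P) for P in combinations(Q, p))
        for Q in combinations(vertices, q)
    )


def T(n, r, q, p):
    """Brute-force T_r(n, q, p): fewest edges of an r-graph on {0, ..., n-1}
    with property (q, p). Exponential; only for very small n."""
    vertices = range(n)
    all_edges = list(combinations(vertices, r))
    for m in range(len(all_edges) + 1):
        if any(has_property(vertices, H, r, q, p) for H in combinations(all_edges, m)):
            return m
    return None  # unreachable when n >= q >= p >= r: the complete r-graph qualifies


print(T(4, 2, 3, 2), T(5, 2, 3, 2))
```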
Submitted 1 March, 2023;
originally announced March 2023.
-
On the Lower Bound of Minimizing Polyak-Łojasiewicz Functions
Authors:
Pengyun Yue,
Cong Fang,
Zhouchen Lin
Abstract:
The Polyak-Łojasiewicz (PL) condition [Polyak, 1963] is weaker than strong convexity but suffices to ensure global convergence of the Gradient Descent algorithm. In this paper, we study lower bounds for algorithms that use first-order oracles to find an approximately optimal solution. We show that any first-order algorithm requires at least $Ω\left(\frac{L}μ\log\frac{1}{\varepsilon}\right)$ gradient evaluations to find an $\varepsilon$-approximate optimal solution for a general $L$-smooth function that satisfies the PL condition with constant $μ$. This result demonstrates the optimality of the Gradient Descent algorithm for minimizing smooth PL functions, in the sense that there exists a ``hard'' PL function such that no first-order algorithm can be faster than Gradient Descent up to a numerical constant. In contrast, it is well known that the momentum technique, e.g. [Nesterov, 2003, chap. 2], can provably accelerate Gradient Descent to ${O}\left(\sqrt{\frac{L}{\hatμ}}\log\frac{1}{\varepsilon}\right)$ gradient evaluations for functions that are $L$-smooth and $\hatμ$-strongly convex. Therefore, our result distinguishes the hardness of minimizing a smooth PL function from that of minimizing a smooth strongly convex function, as the complexity of the former cannot be improved by any polynomial order in general.
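To see the PL condition at work, the sketch below runs Gradient Descent with step $1/L$ on $f(x)=x^2+3\sin^2 x$, a standard example of a nonconvex function satisfying the PL inequality (it appears, e.g., in Karimi, Nutini and Schmidt, 2016), with $f^*=0$. The PL constant $μ=1/32$ is the value usually quoted for this example and is only checked numerically on a grid here. This illustrates the upper-bound side only; it is not the hard instance used for the paper's lower bound.

```python
import math


def f(x: float) -> float:
    """f(x) = x^2 + 3 sin^2(x); nonconvex, since f''(pi/2) = -4 < 0, with f* = 0 at x = 0."""
    return x * x + 3 * math.sin(x) ** 2


def grad(x: float) -> float:
    return 2 * x + 3 * math.sin(2 * x)  # uses d/dx [3 sin^2 x] = 3 sin(2x)


L = 8.0        # f''(x) = 2 + 6 cos(2x) <= 8, so f is L-smooth with L = 8
mu = 1 / 32    # PL constant usually quoted for this example

# Numerical check of the PL inequality 0.5 * f'(x)^2 >= mu * (f(x) - f*) on [-10, 10]
grid = [i / 100 for i in range(-1000, 1001)]
assert all(0.5 * grad(x) ** 2 >= mu * f(x) for x in grid)

x = 3.0
for _ in range(100):
    x -= grad(x) / L   # gradient descent with step size 1/L
print(x, f(x))
```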
Submitted 2 August, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Model-free False Data Injection Attack in Networked Control Systems: A Feedback Optimization Approach
Authors:
Xiaoyu Luo,
Chongrong Fang,
Jianping He,
Chengcheng Zhao,
Dario Paccagnan
Abstract:
Security issues have gathered growing interest within the control systems community, as physical components and communication networks are increasingly vulnerable to cyber attacks. In this context, recent literature has studied increasingly sophisticated \emph{false data injection} attacks, with the aim of designing mitigation measures that improve the systems' security. Notably, data-driven attack strategies, whereby the system dynamics are unknown to the adversary, have received increasing attention. However, many of the existing works on the topic rely on the implicit assumption of linear system dynamics, significantly limiting their scope. In contrast, in this work we design and analyze a \emph{truly} model-free false data injection attack that applies to general linear and nonlinear systems. More specifically, we aim at designing an injected signal that steers the output of the system toward a (maliciously chosen) trajectory. We do so by designing a zeroth-order feedback optimization policy that jointly uses probing signals for real-time measurements. We then characterize the quality of the proposed model-free attack through its optimality gap, which is affected by the dimensions of the attack signal, the number of iterations performed, and the convergence rate of the system. Finally, we extend the proposed attack scheme to systems with internal noise. Extensive simulations show the effectiveness of the proposed attack scheme.
Submitted 21 August, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
The boundedness of commutators of sublinear operators on Herz Triebel-Lizorkin spaces with variable exponent
Authors:
Chenglong Fang,
Yingying Wei,
Jing Zhang
Abstract:
In this paper, the authors first discuss the characterization of Herz Triebel-Lizorkin spaces with variable exponent via two families of operators. By this characterization, the authors prove that the Lipschitz commutators of sublinear operators are bounded from Herz spaces with variable exponent to Herz Triebel-Lizorkin spaces with variable exponent. As an application, the corresponding boundedness estimates for the commutators of the maximal operator, the Riesz potential operator, and the Calderón-Zygmund operator are established.
Submitted 3 October, 2022;
originally announced October 2022.
-
Submodularity-based False Data Injection Attack Scheme in Multi-agent Dynamical Systems
Authors:
Xiaoyu Luo,
Chengcheng Zhao,
Chongrong Fang,
Jianping He
Abstract:
Consensus in multi-agent dynamical systems is prone to sabotage by adversaries, a problem that has attracted much attention due to the key role of consensus in broad applications. In this paper, we study a new false data injection (FDI) attack design problem, where an adversary with limited capability aims to select a subset of agents and manipulate their local multi-dimensional states to maximize the consensus convergence error. We first formulate the FDI attack design problem as a combinatorial optimization problem and prove it is NP-hard. Then, based on submodular optimization theory, we show that the convergence error is a submodular function of the set of compromised agents, which satisfies the property of diminishing marginal returns. In other words, the benefit of adding an extra agent to the compromised set decreases as that set becomes larger. With this property, we exploit a greedy scheme that, at each step, adds to the compromised set the agent that produces the largest increase in convergence error. Thus, FDI attack set selection algorithms are developed to obtain a near-optimal subset of compromised agents. Furthermore, we derive analytical suboptimality bounds and the worst-case running time of the proposed algorithms. Extensive simulations are conducted to show the effectiveness of the proposed algorithm.
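The greedy scheme described above is the classical greedy algorithm for maximizing a monotone submodular set function under a cardinality budget, which is known to achieve at least a $(1-1/e)$ fraction of the optimum (Nemhauser, Wolsey and Fisher, 1978). The minimal sketch below uses a hypothetical coverage function as a stand-in for the consensus convergence error, since computing the real objective would require a model of the agent dynamics that the abstract does not specify.

```python
def greedy_max(ground_set, value, k):
    """Greedy maximization of a set function under a cardinality budget k (k <= |ground_set|):
    repeatedly add the element with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max(
            (e for e in ground_set if e not in chosen),
            key=lambda e: value(chosen + [e]) - value(chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical stand-in objective: a coverage function (monotone and submodular).
# Agent a "covers" the targets in covers[a].
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1}}


def coverage(agents):
    return len(set().union(*(covers[a] for a in agents)))


print(greedy_max(covers, coverage, 2), coverage(greedy_max(covers, coverage, 2)))
```

Diminishing returns is visible directly: adding agent 1 to the empty set gains 2 targets, but adding it after agent 2 gains only 1.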
Submitted 8 March, 2022; v1 submitted 16 January, 2022;
originally announced January 2022.
-
An inexact primal-dual method with correction step for a saddle point problem in image deblurring
Authors:
Changjie Fang,
Liliang Hu,
Shenglan Chen
Abstract:
In this paper, we present an inexact primal-dual method with correction step for a saddle point problem by introducing the notion of inexact extended proximal operators with a symmetric positive definite matrix $D$. Relaxing the requirement on primal-dual step sizes, we prove the convergence of the proposed method. We also establish the $O(1/N)$ convergence rate of our method in the ergodic sense. Moreover, we apply our method to solve TV-L$_1$ image deblurring problems. Numerical simulation results illustrate the efficiency of our method.
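For orientation, the sketch below shows the plain (exact) primal-dual hybrid gradient method of Chambolle and Pock, not the inexact method with a correction step and matrix $D$ proposed in the paper. It is applied to the simpler model $\min_x \tfrac12\|x-b\|^2+λ\|Kx\|_1$, written as the saddle point problem $\min_x\max_{\|y\|_\infty\leλ}\tfrac12\|x-b\|^2+\langle Kx,y\rangle$; for total variation, $K$ would be a finite-difference operator, and for TV-L$_1$ deblurring the quadratic data term would be replaced by an $\ell_1$ term involving the blur operator. The step sizes $τ=σ=0.99/\|K\|$ are an assumed choice satisfying $τσ\|K\|^2<1$, and `pdhg_l1` is a hypothetical name.

```python
import numpy as np


def pdhg_l1(K, b, lam, iters=2000):
    """Primal-dual hybrid gradient (Chambolle-Pock) for
        min_x 0.5 * ||x - b||^2 + lam * ||K x||_1
    via the saddle point problem
        min_x max_{||y||_inf <= lam} 0.5 * ||x - b||^2 + <K x, y>.
    """
    tau = sigma = 0.99 / np.linalg.norm(K, 2)  # tau * sigma * ||K||^2 < 1
    x = np.zeros(K.shape[1])
    x_bar = x.copy()
    y = np.zeros(K.shape[0])
    for _ in range(iters):
        # Dual step: prox of the conjugate of lam * ||.||_1 is projection onto [-lam, lam]
        y = np.clip(y + sigma * (K @ x_bar), -lam, lam)
        # Primal step: prox of tau * 0.5 * ||. - b||^2, evaluated in closed form
        x_new = (x - tau * (K.T @ y) + tau * b) / (1 + tau)
        x_bar = 2 * x_new - x  # extrapolation
        x = x_new
    return x


b = np.array([3.0, -0.5, 1.2, -2.0])
x = pdhg_l1(np.eye(4), b, lam=1.0)
print(x)
```

With $K=I$ the problem decouples and its solution is soft-thresholding of $b$ at level $λ$, which gives an easy correctness check.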
Submitted 1 December, 2021;
originally announced December 2021.
-
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training
Authors:
Cong Fang,
Hangfeng He,
Qi Long,
Weijie J. Su
Abstract:
In this paper, we introduce the \textit{Layer-Peeled Model}, a nonconvex yet analytically tractable optimization program, in a quest to better understand deep neural networks that are trained for a sufficiently long time. As the name suggests, this new model is derived by isolating the topmost layer from the remainder of the neural network, followed by imposing certain constraints separately on the two parts of the network. We demonstrate that the Layer-Peeled Model, albeit simple, inherits many characteristics of well-trained neural networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep learning training. First, when working on class-balanced datasets, we prove that any solution to this model forms a simplex equiangular tight frame, which in part explains the recently discovered phenomenon of neural collapse [Papyan et al., 2020]. More importantly, when moving to the imbalanced case, our analysis of the Layer-Peeled Model reveals a hitherto unknown phenomenon that we term \textit{Minority Collapse}, which fundamentally limits the performance of deep learning models on the minority classes. In addition, we use the Layer-Peeled Model to gain insights into how to mitigate Minority Collapse. Interestingly, this phenomenon was first predicted by the Layer-Peeled Model before being confirmed by our computational experiments.
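A simplex equiangular tight frame for $K$ classes can be written down explicitly as the columns of $\sqrt{K/(K-1)}\,(I_K-\tfrac1K\mathbf{1}\mathbf{1}^\top)$: the columns have unit norm and all pairwise inner products equal $-1/(K-1)$. The short sketch below builds this standard construction (any rotation of it is also a simplex ETF); it illustrates the geometry only and is not code from the paper.

```python
import numpy as np


def simplex_etf(K: int) -> np.ndarray:
    """Columns form a K-class simplex ETF: unit norm, pairwise inner products -1/(K-1)."""
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)


M = simplex_etf(4)
print(np.round(M.T @ M, 6))  # Gram matrix: 1 on the diagonal, -1/3 off the diagonal
```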
Submitted 8 September, 2021; v1 submitted 29 January, 2021;
originally announced January 2021.
-
Mathematical Models of Overparameterized Neural Networks
Authors:
Cong Fang,
Hanze Dong,
Tong Zhang
Abstract:
Deep learning has achieved considerable empirical success in recent years. However, while many ad hoc tricks have been discovered by practitioners, until recently there has been a lack of theoretical understanding of tricks invented in the deep learning literature. It is known to practitioners that overparameterized neural networks are easy to train, and in the past few years there have been important theoretical developments in the analysis of overparameterized neural networks. In particular, it was shown that such systems behave like convex systems under various restricted settings, such as for two-layer NNs, and when learning is restricted locally in the so-called neural tangent kernel space around specialized initializations. This paper discusses some of this recent progress, which has led to a significantly better understanding of neural networks. We will focus on the analysis of two-layer neural networks, and explain the key mathematical models, with their algorithmic implications. We will then discuss challenges in understanding deep neural networks and some current research directions.
Submitted 27 December, 2020;
originally announced December 2020.
-
Turán numbers and anti-Ramsey numbers for short cycles in complete $3$-partite graphs
Authors:
Chunqiu Fang,
Ervin Győri,
Chuanqi Xiao,
Jimeng Xiao
Abstract:
We call a $4$-cycle in $K_{n_{1}, n_{2}, n_{3}}$ multipartite, denoted by $C_{4}^{\text{multi}}$, if it contains at least one vertex in each part of $K_{n_{1}, n_{2}, n_{3}}$. The Turán number $\text{ex}(K_{n_{1},n_{2},n_{3}}, C_{4}^{\text{multi}})$ (respectively, $\text{ex}(K_{n_{1},n_{2},n_{3}},\{C_{3}, C_{4}^{\text{multi}}\})$) is the maximum number of edges in a graph $G\subseteq K_{n_{1},n_{2},n_{3}}$ such that $G$ contains no $C_{4}^{\text{multi}}$ (respectively, $G$ contains neither $C_{3}$ nor $C_{4}^{\text{multi}}$). We call a $C_{4}^{\text{multi}}$ rainbow if all four of its edges have different colors. The anti-Ramsey number $\text{ar}(K_{n_{1},n_{2},n_{3}}, C_{4}^{\text{multi}})$ is the maximum number of colors in an edge-coloring of $K_{n_{1},n_{2},n_{3}}$ with no rainbow $C_{4}^{\text{multi}}$. In this paper, we determine that $\text{ex}(K_{n_{1},n_{2},n_{3}}, C_{4}^{\text{multi}})=n_{1}n_{2}+2n_{3}$ and $\text{ar}(K_{n_{1},n_{2},n_{3}}, C_{4}^{\text{multi}})=\text{ex}(K_{n_{1},n_{2},n_{3}}, \{C_{3}, C_{4}^{\text{multi}}\})+1=n_{1}n_{2}+n_{3}+1,$ where $n_{1}\ge n_{2}\ge n_{3}\ge 1.$
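Both Turán numbers above are defined combinatorially, so tiny instances can be checked by brute force. The sketch below is our own illustration. The `(part, index)` vertex encoding and the function names are hypothetical, and the search is exponential in the number of edges. It lists the 4-cycles of $K_{n_1,n_2,n_3}$ that meet all three parts, then finds the largest edge set that contains none of them. On $K_{2,1,1}$ and $K_{2,2,1}$ it returns 4 and 6, matching $n_1n_2+2n_3$.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Vertex = Tuple[int, int]  # (part index, index within the part)
Edge = FrozenSet[Vertex]


def multipartite_c4s(part_sizes: Tuple[int, ...]) -> Tuple[List[Edge], List[FrozenSet[Edge]]]:
    """Edges of the complete multipartite graph and its 4-cycles that meet every part."""
    vertices = [(p, i) for p, size in enumerate(part_sizes) for i in range(size)]
    edges = [frozenset((u, v)) for u, v in combinations(vertices, 2) if u[0] != v[0]]
    cycles = []
    for quad in combinations(vertices, 4):
        if len({v[0] for v in quad}) < len(part_sizes):
            continue  # the cycle must contain a vertex from each part
        a, b, c, d = quad
        # the three distinct 4-cycles on the vertex set {a, b, c, d}
        for cyc in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            cyc_edges = [frozenset((cyc[t], cyc[(t + 1) % 4])) for t in range(4)]
            # keep the cycle only if every edge joins two different parts
            if all(len({w[0] for w in e}) == 2 for e in cyc_edges):
                cycles.append(frozenset(cyc_edges))
    return edges, cycles


def brute_force_ex(part_sizes: Tuple[int, ...]) -> int:
    """Maximum number of edges in a subgraph with no multipartite C4 (tiny cases only)."""
    edges, cycles = multipartite_c4s(part_sizes)
    best = 0
    for mask in range(1 << len(edges)):
        chosen = {e for t, e in enumerate(edges) if (mask >> t) & 1}
        if len(chosen) > best and not any(cyc <= chosen for cyc in cycles):
            best = len(chosen)
    return best
```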
Submitted 27 November, 2020;
originally announced November 2020.
-
An inertial Tseng's extragradient method for solving multi-valued variational inequalities with one projection
Authors:
Changjie Fang,
Ruirui Zhang,
Shenglan Chen
Abstract:
In this paper, we introduce an inertial Tseng's extragradient method for solving multi-valued variational inequalities, in which only one projection is needed at each iteration. We also establish strong convergence of the proposed algorithm, provided that the multi-valued mapping is continuous and pseudomonotone with nonempty compact convex values. Moreover, numerical simulation results illustrate the efficiency of our method compared with existing methods.
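The following minimal sketch makes the one-projection structure of an inertial Tseng (forward-backward-forward) iteration concrete. It assumes a simplified setting that is not the paper's:

- a single-valued, monotone, $L$-Lipschitz map $F$;
- a box feasible set $C=[\ell,u]^d$;
- a fixed step $\lambda<1/L$ and a fixed inertia weight $\alpha$.

All parameter values are hypothetical. Each iteration does three things:

1. Extrapolate: $w_k=x_k+\alpha(x_k-x_{k-1})$.
2. Project once: $y_k=P_C(w_k-\lambda F(w_k))$.
3. Correct, without a second projection: $x_{k+1}=y_k-\lambda\,(F(y_k)-F(w_k))$.

```python
from typing import Callable, List

Vector = List[float]


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(max(xi, lo), hi) for xi in x]


def inertial_tseng(F: Callable[[Vector], Vector], x0: Vector, lo: float, hi: float,
                   step: float = 0.5, inertia: float = 0.3, iters: int = 200) -> Vector:
    """Inertial forward-backward-forward iteration for a single-valued monotone VI on a box."""
    x_prev, x = list(x0), list(x0)
    y = project_box(x, lo, hi)
    for _ in range(iters):
        # inertial extrapolation w_k = x_k + alpha * (x_k - x_{k-1})
        w = [xi + inertia * (xi - pi) for xi, pi in zip(x, x_prev)]
        Fw = F(w)
        # forward-backward step: the single projection of this iteration
        y = project_box([wi - step * fi for wi, fi in zip(w, Fw)], lo, hi)
        Fy = F(y)
        # Tseng's forward correction, with no second projection
        x_prev, x = x, [yi - step * (fyi - fwi) for yi, fyi, fwi in zip(y, Fy, Fw)]
    return y
```

Take $F(x)=x-c$ with `c = [2.0, -1.0, 0.5]`. The solution of the variational inequality over the box $[0,1]^3$ is the projection of $c$ onto the box. So `inertial_tseng(lambda x: [xi - ci for xi, ci in zip(x, c)], [0.0] * 3, 0.0, 1.0)` should approach `[1.0, 0.0, 0.5]`.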
Submitted 25 November, 2020;
originally announced November 2020.
-
Improved Analysis of Clipping Algorithms for Non-convex Optimization
Authors:
Bohang Zhang,
Jikai **,
Cong Fang,
Liwei Wang
Abstract:
Gradient clipping is commonly used in training deep neural networks, partly because of its practicality in relieving the exploding gradient problem. Recently, \citet{zhang2019gradient} showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD by introducing a new assumption called $(L_0, L_1)$-smoothness, which characterizes the violent fluctuation of gradients typically encountered in deep neural networks. However, their iteration complexities in terms of the problem-dependent parameters are rather pessimistic, and theoretical justification of clipping combined with other crucial techniques, e.g., momentum acceleration, is still lacking. In this paper, we bridge the gap by presenting a general framework to study clipping algorithms, which also takes momentum methods into consideration. We provide convergence analysis of the framework in both deterministic and stochastic settings, and demonstrate the tightness of our results by comparing them with existing lower bounds. Our results imply that the efficiency of clipping methods does not degenerate even in highly non-smooth regions of the landscape. Experiments confirm the superiority of clipping-based methods in deep learning tasks.
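A toy example shows why clipping helps under $(L_0,L_1)$-smoothness. This is our own example, not the paper's framework, and it omits momentum. The function $f(x)=x^4/4$ satisfies $|f''(x)|=3x^2\le 3+3|f'(x)|$, so it is $(L_0,L_1)$-smooth with $L_0=L_1=3$, yet it has no global Lipschitz gradient. Plain GD with a fixed step diverges from a distant starting point. Clipped GD uses the step size $\min(\eta,\gamma/|g|)$, so it moves at most $\gamma$ per step and then behaves like ordinary GD near the minimizer. The values $\eta=0.1$ and $\gamma=1$ are hypothetical.

```python
from typing import Callable


def grad_quartic(x: float) -> float:
    """Gradient of f(x) = x**4 / 4: (L0, L1)-smooth with L0 = L1 = 3, but not L-smooth."""
    return x ** 3


def clipped_gd(x: float, lr: float = 0.1, clip: float = 1.0, steps: int = 1000,
               grad: Callable[[float], float] = grad_quartic) -> float:
    for _ in range(steps):
        g = grad(x)
        # effective step size min(lr, clip / |g|): the update length never exceeds `clip`
        h = lr if g == 0 else min(lr, clip / abs(g))
        x -= h * g
    return x


def plain_gd(x: float, lr: float = 0.1, steps: int = 3,
             grad: Callable[[float], float] = grad_quartic) -> float:
    for _ in range(steps):
        x -= lr * grad(x)
    return x
```

Starting from 10, `plain_gd(10.0)` jumps to $-90$ after one step and then blows up. In contrast, `clipped_gd(10.0)` walks down in unit steps and then converges slowly toward 0.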
Submitted 28 October, 2020; v1 submitted 5 October, 2020;
originally announced October 2020.
-
The anti-Ramsey number of $C_{3}$ and $C_{4}$ in the complete $r$-partite graphs
Authors:
Chunqiu Fang,
Ervin Győri,
Binlong Li,
Jimeng Xiao
Abstract:
A subgraph of an edge-colored graph is rainbow if all of its edges have different colors. For a graph $G$ and a family $\mathcal{H}$ of graphs, the anti-Ramsey number $ar(G, \mathcal{H})$ is the maximum number $k$ such that there exists an edge-coloring of $G$ with exactly $k$ colors without a rainbow copy of any graph in $\mathcal{H}$. In this paper, we study the anti-Ramsey numbers of $C_{3}$ and $C_{4}$ in complete $r$-partite graphs. For $r\ge 3$ and $n_{1}\ge n_{2}\ge \cdots\ge n_{r}\ge 1$, we determine $ar(K_{n_{1}, n_{2}, \ldots, n_{r}},\{C_{3}, C_{4}\})$, $ar(K_{n_{1}, n_{2}, \ldots, n_{r}}, C_{3})$, and $ar(K_{n_{1}, n_{2}, \ldots, n_{r}}, C_{4})$.
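The rainbow condition is easy to state in code. The following sketch is only a checker, not a method for computing $ar$; the helper names and the `(part, index)` vertex encoding are our own. It builds the edge set of a complete multipartite graph and tests whether a given edge-coloring contains a rainbow $C_3$. A triangle in $K_{n_1,\ldots,n_r}$ must use three different parts.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Vertex = Tuple[int, int]  # (part index, index within the part)
Edge = FrozenSet[Vertex]


def complete_multipartite_edges(part_sizes: Sequence[int]) -> Tuple[List[Vertex], List[Edge]]:
    vertices = [(p, i) for p, size in enumerate(part_sizes) for i in range(size)]
    edges = [frozenset((u, v)) for u, v in combinations(vertices, 2) if u[0] != v[0]]
    return vertices, edges


def is_rainbow(edges: List[Edge], coloring: Dict[Edge, Hashable]) -> bool:
    """A subgraph is rainbow if its edges carry pairwise different colors."""
    colors = [coloring[e] for e in edges]
    return len(set(colors)) == len(colors)


def has_rainbow_triangle(vertices: List[Vertex], coloring: Dict[Edge, Hashable]) -> bool:
    for a, b, c in combinations(vertices, 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # two vertices in the same part are not adjacent
        if is_rainbow([frozenset((a, b)), frozenset((b, c)), frozenset((a, c))], coloring):
            return True
    return False
```

In $K_{2,1,1}$, consider the coloring that repeats a color inside each of the two triangles. It uses 3 colors and has no rainbow $C_3$. Giving all 5 edges distinct colors, by contrast, obviously creates one.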
Submitted 12 July, 2020;
originally announced July 2020.
-
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks
Authors:
Cong Fang,
Jason D. Lee,
Pengkun Yang,
Tong Zhang
Abstract:
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs), which can be used to analyze neural network training. In this framework, a DNN is represented by probability measures and functions over its features (that is, the function values of the hidden units over the training data) in the continuous limit, instead of over the neural network parameters as most existing studies have done. This new representation overcomes the degenerate situation in which all the hidden units in each middle layer essentially reduce to a single meaningful hidden unit, and further leads to a simpler representation of DNNs, for which the training objective can be reformulated as a convex optimization problem via suitable re-parameterization. Moreover, we construct a nonlinear dynamics called neural feature flow, which captures the evolution of an over-parameterized DNN trained by Gradient Descent. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures. Furthermore, we show for Res-Net that, when the neural feature flow process converges, it reaches a globally minimal solution under suitable conditions. Our analysis leads to the first global convergence proof for over-parameterized neural network training with more than $3$ layers in the mean-field regime.
Submitted 2 July, 2020;
originally announced July 2020.
-
Minimal colorings for properly colored subgraphs in complete graphs
Authors:
Chunqiu Fang,
Ervin Győri,
Jimeng Xiao
Abstract:
Let $pr(K_{n}, G)$ be the maximum number of colors in an edge-coloring of $K_{n}$ with no properly colored copy of $G$. In this paper, we show that $pr(K_{n}, G)-ex(n, \mathcal{G'})=o(n^{2})$, where $\mathcal{G'}=\{G-M: M \text{ is a matching of }G\}$. Furthermore, we determine the value of $pr(K_{n}, P_{l})$ for $l\ge 27$ and $n\ge 2l^{3}$, and the exact value of $pr(K_{n}, G)$ for $G$ equal to $C_{5}$, $C_{6}$, and $K_{4}^{-}$. We also give upper and lower bounds on $pr(K_{n}, K_{2,3})$.
Submitted 11 November, 2019;
originally announced November 2019.
-
Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations
Authors:
Cong Fang,
Hanze Dong,
Tong Zhang
Abstract:
Recently, over-parameterized neural networks have been extensively analyzed in the literature. However, previous studies cannot satisfactorily explain why fully trained neural networks are successful in practice. In this paper, we present a new theoretical framework for analyzing over-parameterized neural networks, which we call neural feature repopulation. Our analysis can satisfactorily explain the empirical success of two-level neural networks that are trained by standard learning algorithms. Our key theoretical result is that, in the limit of an infinite number of hidden neurons, over-parameterized two-level neural networks trained via standard (noisy) gradient descent learn a well-defined feature distribution (population), and the limiting feature distribution is nearly optimal for the underlying learning task under certain conditions. Empirical studies confirm that the predictions of our theory are consistent with the results observed in practice.
Submitted 24 October, 2019;
originally announced October 2019.
-
Load Forecasting Model and Day-ahead Operation Strategy for City-located EV Quick Charge Stations
Authors:
Zeyu Liu,
Yaxin Xie,
Donghan Feng,
Yun Zhou,
Shanshan Shi,
Chen Fang
Abstract:
Charging demand from electric vehicles (EVs) is increasing sharply due to the rapid development of EVs. Hence, reliable and convenient quick charge stations are required to meet the needs of EV drivers. Because EV charging loads are uncertain, load forecasting is vital for quick charge station operators to formulate day-ahead plans. In this paper, based on trip chain theory and EV user behaviour, an EV charging load forecasting model is established for quick charge station operators. Using Monte-Carlo simulation, this model forecasts the next-day charging demand of a city-located quick charge station. Furthermore, based on the forecasting model, a day-ahead profit-oriented operation strategy for such stations is derived. The simulation results support the effectiveness of the forecasting model and the operation strategy. The conclusions of this paper are as follows: 1) the charging load forecasting model enables operators to grasp the features of the next day's charging load; 2) the revenue of the quick charge station can be dramatically increased by applying the proposed day-ahead operation strategy.
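The Monte-Carlo idea can be sketched as follows: sample many plausible days, accumulate the energy each simulated EV draws in each clock hour, and average. Everything model-specific here is a hypothetical placeholder, not the paper's trip-chain model:

- arrivals are uniform over the day;
- requested energy is uniform on 10 to 30 kWh;
- every session charges at a constant 50 kW;
- sessions running past midnight wrap to the early hours of the same profile.

```python
import random
from typing import List


def simulate_daily_load(num_evs: int, power_kw: float = 50.0, trials: int = 200,
                        seed: int = 0) -> List[float]:
    """Monte Carlo estimate of the expected hourly charging load (kW) over one day.

    All distributions are hypothetical placeholders, not the paper's trip-chain model.
    """
    rng = random.Random(seed)
    load = [0.0] * 24
    for _ in range(trials):
        for _ in range(num_evs):
            t = rng.uniform(0.0, 24.0)                      # hypothetical arrival time (h)
            remaining = rng.uniform(10.0, 30.0) / power_kw  # hypothetical demand (kWh) -> hours at full power
            while remaining > 1e-12:
                chunk = min(remaining, int(t) + 1 - t)      # charging time left in the current clock hour
                load[int(t) % 24] += power_kw * chunk / trials  # wraps past midnight (simplification)
                t += chunk
                remaining -= chunk
    return load
```

Every sampled kilowatt-hour is assigned to exactly one hour bucket. So under these assumptions, `sum(simulate_daily_load(30))` should be close to the expected daily energy of about 30 × 20 = 600 kWh.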
Submitted 3 September, 2019;
originally announced September 2019.
-
On the anti-Ramsey number of forests
Authors:
Chunqiu Fang,
Ervin Győri,
Mei Lu,
Jimeng Xiao
Abstract:
We call a subgraph of an edge-colored graph a rainbow subgraph if all of its edges have different colors. The anti-Ramsey number of a graph $G$ in a complete graph $K_{n}$, denoted by $ar(K_{n}, G)$, is the maximum number of colors in an edge-coloring of $K_{n}$ with no rainbow copy of $G$. In this paper, we determine the exact value of the anti-Ramsey number for star forests and the approximate value of the anti-Ramsey number for linear forests. Furthermore, we compute the exact values of $ar(K_{n}, 2P_{4})$ for $n\ge 8$ and $ar(K_{n}, S_{p,q})$ for large $n$, where $S_{p,q}$ is the double star with $p+q$ leaves.
Submitted 12 August, 2019;
originally announced August 2019.
-
A Stochastic Trust Region Method for Non-convex Minimization
Authors:
Zebang Shen,
Pan Zhou,
Cong Fang,
Alejandro Ribeiro
Abstract:
We target the problem of finding a local minimum in non-convex finite-sum minimization. Towards this goal, we first prove that the trust region method with inexact gradient and Hessian estimation can achieve a convergence rate of order $\mathcal{O}(1/{k^{2/3}})$ as long as these derivative estimates are sufficiently accurate. Combining this result with a novel Hessian estimator, we propose the sample-efficient stochastic trust region (STR) algorithm, which finds an $(ε, \sqrtε)$-approximate local minimum within $\mathcal{O}({\sqrt{n}}/{ε^{1.5}})$ stochastic Hessian oracle queries. This improves the state-of-the-art result by a factor of $\mathcal{O}(n^{1/6})$. Experiments verify the theoretical conclusions and the efficiency of STR.
Submitted 4 March, 2019;
originally announced March 2019.
-
Sharp Analysis for Nonconvex SGD Escaping from Saddle Points
Authors:
Cong Fang,
Zhouchen Lin,
Tong Zhang
Abstract:
In this paper, we give a sharp analysis of Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an $(ε, O(ε^{0.5}))$-approximate second-order stationary point in $\tilde{O}(ε^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions. This result subverts the classical belief that SGD requires at least $O(ε^{-4})$ stochastic gradient computations to obtain an $(ε,O(ε^{0.5}))$-approximate second-order stationary point. This SGD rate matches, up to a polylogarithmic factor of problem-dependent parameters, the rate of most accelerated nonconvex stochastic optimization algorithms that adopt additional techniques, such as Nesterov's momentum acceleration, negative curvature search, and quadratic and cubic regularization tricks. Our novel analysis gives new insights into nonconvex SGD and can potentially be generalized to a broad class of stochastic optimization algorithms.
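A two-dimensional toy problem shows the qualitative behavior, though it does not reproduce the paper's rate analysis. The function $f(x,y)=x^2/2-y^2/2+y^4/4$ has a strict saddle at the origin and minima at $(0,\pm1)$.

- Started exactly at the saddle, gradient descent never moves, because the gradient is zero there.
- SGD drifts off along the negative-curvature direction $y$ and settles near a minimum. Here the stochastic gradient is the true gradient plus isotropic Gaussian noise, an assumed noise model that is dispersive in every direction.

The step size, noise level, and iteration count are hypothetical.

```python
import random
from typing import Tuple


def grad(x: float, y: float) -> Tuple[float, float]:
    # f(x, y) = x^2/2 - y^2/2 + y^4/4: strict saddle at (0, 0), minima at (0, +-1)
    return x, -y + y ** 3


def sgd(x: float, y: float, lr: float = 0.05, noise: float = 0.1,
        steps: int = 2000, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        # stochastic gradient = true gradient + isotropic Gaussian noise (assumed noise model)
        x -= lr * (gx + noise * rng.gauss(0.0, 1.0))
        y -= lr * (gy + noise * rng.gauss(0.0, 1.0))
    return x, y
```

With `noise=0.0` the loop is plain gradient descent and returns the saddle point unchanged. With the default noise, the final point should lie near $(0,1)$ or $(0,-1)$.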
Submitted 4 June, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Lifted Proximal Operator Machines
Authors:
Jia Li,
Cong Fang,
Zhouchen Lin
Abstract:
We propose a new optimization method for training feed-forward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the proximal operators to the objective function as penalties; hence the name lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations in parallel. Most notably, we only use the mapping of the activation function itself, rather than its derivatives, thus avoiding the gradient vanishing or blow-up issues of gradient-based training methods. Our method is therefore applicable to various non-decreasing Lipschitz continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, and thus uses roughly the same amount of memory as stochastic gradient descent (SGD). We further prove the convergence of updating the layer-wise weights and activations. Experiments on the MNIST and CIFAR-10 datasets testify to the advantages of LPOM.
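The rewriting of an activation as a proximal operator is easiest to see for ReLU. This is a standard fact, not something specific to LPOM: $\mathrm{ReLU}(x)=\max(x,0)=\arg\min_{y\ge 0}\tfrac12(y-x)^2$, i.e. ReLU is the proximal operator of the indicator function of $[0,\infty)$. The sketch below checks this numerically by minimizing over a grid. The grid resolution and range are arbitrary assumptions, and the LPOM objective itself is not reproduced.

```python
def relu(x: float) -> float:
    return max(x, 0.0)


def prox_nonneg(x: float, grid_step: float = 1e-3, radius: float = 5.0) -> float:
    """Brute-force prox of the indicator of [0, inf): argmin over y >= 0 of (y - x)^2 / 2."""
    # candidate points y = 0, grid_step, 2 * grid_step, ..., up to about `radius`
    candidates = (k * grid_step for k in range(int(radius / grid_step) + 1))
    return min(candidates, key=lambda y: 0.5 * (y - x) ** 2)
```

For inputs inside the grid range, `prox_nonneg(x)` and `relu(x)` agree up to the grid step.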
Submitted 4 November, 2018;
originally announced November 2018.
-
Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters
Authors:
Huan Li,
Cong Fang,
Wotao Yin,
Zhouchen Lin
Abstract:
In this paper, we study the communication and (sub)gradient computation costs in distributed optimization and give a sharp complexity analysis for the proposed distributed accelerated gradient methods. We present two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters. Our first algorithm is for smooth distributed optimization, and it obtains the near-optimal $O\left(\sqrt{\frac{L}{ε(1-σ_2(W))}}\log\frac{1}ε\right)$ communication complexity and the optimal $O\left(\sqrt{\frac{L}ε}\right)$ gradient computation complexity for $L$-smooth convex problems, where $σ_2(W)$ denotes the second largest singular value of the weight matrix $W$ associated with the network and $ε$ is the target accuracy. When the problem is $μ$-strongly convex and $L$-smooth, our algorithm has the near-optimal $O\left(\sqrt{\frac{L}{μ(1-σ_2(W))}}\log^2\frac{1}ε\right)$ complexity for communications and the optimal $O\left(\sqrt{\frac{L}μ}\log\frac{1}ε\right)$ complexity for gradient computations. Our communication complexities are worse only by a factor of $\left(\log\frac{1}ε\right)$ than the lower bounds for smooth distributed optimization. Our second algorithm is designed for non-smooth distributed optimization, and it achieves both the optimal $O\left(\frac{1}{ε\sqrt{1-σ_2(W)}}\right)$ communication complexity and the optimal $O\left(\frac{1}{ε^2}\right)$ subgradient computation complexity, which match the communication and subgradient computation complexity lower bounds for non-smooth distributed optimization.
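Every communication complexity above scales with $1-\sigma_2(W)$, so it helps to see how $\sigma_2(W)$ is obtained for a concrete network. A minimal sketch, assuming a hypothetical ring of $n$ nodes in which each node averages itself and its two neighbours with weight $1/3$. For a doubly stochastic $W$, $\sigma_2(W)$ equals the spectral norm of $W-J$ with $J=\tfrac1n\mathbf{1}\mathbf{1}^\top$, and power iteration on $(W-J)^\top(W-J)$ recovers it. The ring and its weights are our own illustrative choices.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def ring_mixing_matrix(n: int) -> Matrix:
    """Hypothetical ring topology: each node averages itself and its two neighbours (weights 1/3)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def second_singular_value(W: Matrix, iters: int = 500, seed: int = 0) -> float:
    """sigma_2(W) for doubly stochastic W: spectral norm of W - J, J = (1/n) 1 1^T,
    computed by power iteration on (W - J)^T (W - J)."""
    n = len(W)

    def apply(x: List[float]) -> List[float]:  # (W - J) x, where J x = mean(x) * 1
        m = sum(x) / n
        return [sum(W[i][j] * x[j] for j in range(n)) - m for i in range(n)]

    def apply_t(x: List[float]) -> List[float]:  # (W - J)^T x
        m = sum(x) / n
        return [sum(W[j][i] * x[j] for j in range(n)) - m for i in range(n)]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        u = apply_t(apply(v))
        norm = math.sqrt(sum(c * c for c in u))
        v = [c / norm for c in u]
    return math.sqrt(sum(c * c for c in apply(v)))
```

For this ring the eigenvalues are $\tfrac13(1+2\cos(2\pi k/n))$, so $n=6$ gives $\sigma_2=2/3$ and $1-\sigma_2=1/3$. The gap $1-\sigma_2$ shrinks like $1/n^2$ as the ring grows, which inflates the communication complexities above.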
Submitted 18 August, 2020; v1 submitted 1 October, 2018;
originally announced October 2018.
-
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Authors:
Cong Fang,
Chris Junchi Li,
Zhouchen Lin,
Tong Zhang
Abstract:
In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost. We apply SPIDER to two tasks, namely the stochastic first-order and zeroth-order methods. For the stochastic first-order method, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO\textsuperscript{+}, that solve non-convex stochastic optimization problems using stochastic gradients only. We provide sharp error-bound results on their convergence rates. In particular, we prove that the SPIDER-SFO and SPIDER-SFO\textsuperscript{+} algorithms achieve a record-breaking gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} ε^{-2}, ε^{-3} ) \right)$ for finding an $ε$-approximate first-order stationary point and $\tilde{\mathcal{O}}\left( \min( n^{1/2} ε^{-2}+ε^{-2.5}, ε^{-3} ) \right)$ for finding an $(ε, \mathcal{O}(ε^{0.5}))$-approximate second-order stationary point, respectively. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting. For the stochastic zeroth-order method, we prove a cost of $\mathcal{O}( d \min( n^{1/2} ε^{-2}, ε^{-3}) )$, which outperforms all existing results.
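Here is a minimal sketch of the path-integrated estimator on a finite sum $\frac1n\sum_i f_i$. It is our own simplified loop, not SPIDER-SFO's exact schedule or constants.

- Every `epoch_len` iterations, $v_k$ is reset to the full gradient.
- Otherwise it is updated as $v_k=v_{k-1}+\frac{1}{|S|}\sum_{i\in S}\big(\nabla f_i(x_k)-\nabla f_i(x_{k-1})\big)$, with a small sampled minibatch $S$.
- The step $x_{k+1}=x_k-\min(\eta,\ \Delta/\|v_k\|)\,v_k$ is a capped rule in the spirit of normalized GD. Its parameters `lr` ($\eta$) and `max_step` ($\Delta$) are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def spider_sketch(grad_i: Callable[[int, Vector], Vector], n: int, x0: Vector,
                  lr: float = 0.3, max_step: float = 1.0, batch: int = 5,
                  epoch_len: int = 10, iters: int = 300, seed: int = 0) -> Vector:
    """Illustrative SPIDER-style loop for min (1/n) sum_i f_i(x); grad_i(i, x) returns grad f_i(x)."""
    rng = random.Random(seed)
    dim = len(x0)
    x: Vector = list(x0)
    x_prev: Vector = list(x0)
    v: Vector = [0.0] * dim
    for k in range(iters):
        if k % epoch_len == 0:
            # periodic full-gradient refresh: v = (1/n) * sum_i grad f_i(x)
            v = [0.0] * dim
            for i in range(n):
                for d, gd in enumerate(grad_i(i, x)):
                    v[d] += gd / n
        else:
            # path-integrated update: v += average over S of [grad f_i(x_k) - grad f_i(x_{k-1})]
            for _ in range(batch):
                i = rng.randrange(n)  # sampled with replacement
                for d, (new, old) in enumerate(zip(grad_i(i, x), grad_i(i, x_prev))):
                    v[d] += (new - old) / batch
        norm = math.sqrt(sum(c * c for c in v))
        # capped step: plain step lr * v, but never longer than max_step
        step = lr if norm == 0.0 else min(lr, max_step / norm)
        x_prev, x = x, [xd - step * vd for xd, vd in zip(x, v)]
    return x
```

Consider a toy quadratic finite sum $f_i(x)=\tfrac{a_i}{2}\|x-b_i\|^2$. Most iterations touch only `batch` component gradients, yet the iterates should still approach the exact minimizer $\sum_i a_ib_i/\sum_i a_i$.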
Submitted 17 October, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation
Authors:
Cong Fang,
Yameng Huang,
Zhouchen Lin
Abstract:
Asynchronous algorithms have attracted much attention recently due to the pressing demand for solving large-scale optimization problems. However, accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the "momentum compensation" technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous Gradient Descent, which achieves a faster $O(1/\sqrtε)$ (vs. $O(1/ε)$) convergence rate for non-strongly convex functions, and $O(\sqrtκ\log(1/ε))$ (vs. $O(κ\log(1/ε))$) for strongly convex functions, to reach an $ε$-approximate minimizer with condition number $κ$. We further apply the technique to accelerate modern stochastic asynchronous algorithms such as Asynchronous Stochastic Coordinate Descent and Asynchronous Stochastic Gradient Descent. Both of the resulting practical algorithms are faster than existing ones by order. To the best of our knowledge, we are the first to consider accelerated algorithms that allow updating by delayed gradients and the first to propose truly accelerated asynchronous algorithms. Finally, experimental results on a shared memory system show that acceleration can lead to significant performance gains on ill-conditioned problems.
Submitted 27 February, 2018;
originally announced February 2018.
-
Convergence Rates Analysis of The Quadratic Penalty Method and Its Applications to Decentralized Distributed Optimization
Authors:
Huan Li,
Cong Fang,
Zhouchen Lin
Abstract:
In this paper, we study a variant of the quadratic penalty method for linearly constrained convex problems that is widely used but lacks theoretical justification. Namely, the penalty parameter steadily increases and the penalized objective function is minimized inexactly rather than exactly, e.g., with only one step of proximal gradient descent. For such a variant of the quadratic penalty method, we give counterexamples showing that it may not give a solution to the original constrained problem. By choosing special penalty parameters, we ensure convergence and further establish convergence rates of $O\left(\frac{1}{\sqrt{K}}\right)$ for generally convex problems and $O\left(\frac{1}{K}\right)$ for strongly convex ones, where $K$ is the number of iterations. Furthermore, by adopting Nesterov's extrapolation, we show that the convergence rates can be improved to $O\left(\frac{1}{K}\right)$ for generally convex problems and $O\left(\frac{1}{K^2}\right)$ for strongly convex ones.
When applied to decentralized distributed optimization, the penalty methods studied in this paper become the widely used distributed gradient method and the fast distributed gradient method. However, thanks to an entirely different analysis framework, we can improve their $O\left(\frac{\log K}{\sqrt{K}}\right)$ and $O\left(\frac{\log K}{K}\right)$ convergence rates to $O\left(\frac{1}{\sqrt{K}}\right)$ and $O\left(\frac{1}{K}\right)$, with fewer assumptions on the network topology, for general convex problems. Using our analysis framework, we also extend the fast distributed gradient method to a communication-efficient version, i.e., one that finds an $\varepsilon$-solution in $O\left(\frac{1}{\varepsilon}\right)$ communications and $O\left(\frac{1}{\varepsilon^{2+δ}}\right)$ computations for non-smooth problems, where $δ$ is a small constant.
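Here is a minimal sketch of the variant described above, under our own assumptions:

- $f$ is smooth, so the proximal gradient step is a plain gradient step;
- the constraints are $Ax=b$;
- each penalty level gets one gradient step on $f(x)+\frac{\beta_k}{2}\|Ax-b\|^2$, with step size $1/(L_f+\beta_k\|A\|_F^2)$;
- the penalty grows on the schedule $\beta_k=\beta_0\sqrt{k+1}$.

The abstract warns that the penalty parameters must be chosen carefully for convergence. This schedule is illustrative only and is not claimed to be the paper's choice.

```python
import math
from typing import Callable, List

Vector = List[float]


def quadratic_penalty(grad_f: Callable[[Vector], Vector], lip_f: float,
                      A: List[Vector], b: Vector, x0: Vector,
                      iters: int = 10_000, beta0: float = 1.0) -> Vector:
    """One gradient step per penalty level on f(x) + beta_k/2 * ||Ax - b||^2,
    with the assumed increasing schedule beta_k = beta0 * sqrt(k + 1)."""
    x = list(x0)
    # ||A||_2^2 <= ||A||_F^2, so this gives a safe (possibly conservative) step size
    a_norm_sq = sum(v * v for row in A for v in row)
    for k in range(iters):
        beta = beta0 * math.sqrt(k + 1)
        residual = [sum(r * xi for r, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
        # gradient of the penalized objective: grad f(x) + beta * A^T (Ax - b)
        g = grad_f(x)
        for row, res in zip(A, residual):
            for j, r in enumerate(row):
                g[j] += beta * r * res
        step = 1.0 / (lip_f + beta * a_norm_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

Take $\min \frac12\|x-c\|^2$ subject to $x_1+x_2=1$ with $c=(2,0)$, whose solution is $(1.5,-0.5)$. After 10,000 steps the iterate is roughly $3.5\times10^{-3}$ away from it, and the constraint violation decays like $1/\beta_k$.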
Submitted 29 November, 2017;
originally announced November 2017.
-
Faster and Non-ergodic O(1/K) Stochastic Alternating Direction Method of Multipliers
Authors:
Cong Fang,
Feng Cheng,
Zhouchen Lin
Abstract:
We study stochastic convex optimization subject to linear equality constraints. The traditional Stochastic Alternating Direction Method of Multipliers and its Nesterov-accelerated variant can only achieve ergodic $O(1/\sqrt{K})$ convergence rates, where $K$ is the number of iterations. By introducing Variance Reduction (VR) techniques, the convergence rates improve to ergodic $O(1/K)$. In this paper, we propose a new stochastic ADMM that elaborately integrates Nesterov's extrapolation and VR techniques. We prove that our algorithm can achieve a non-ergodic $O(1/K)$ convergence rate, which is optimal for separable linearly constrained non-smooth convex problems, while the convergence rates of VR-based ADMM methods are actually tight $O(1/\sqrt{K})$ in the non-ergodic sense. To the best of our knowledge, this is the first work that achieves a truly accelerated, stochastic convergence rate for constrained convex problems. The experimental results demonstrate that our algorithm is significantly faster than existing state-of-the-art stochastic ADMM methods.
Submitted 22 April, 2017;
originally announced April 2017.
-
borealis - A generalized global update algorithm for Boolean optimization problems
Authors:
Zheng Zhu,
Chao Fang,
Helmut G. Katzgraber
Abstract:
Optimization problems with Boolean variables that fall into the nondeterministic polynomial (NP) class are of fundamental importance in computer science, mathematics, physics and industrial applications. Most notably, solving constraint-satisfaction problems, which are related to spin-glass-like Hamiltonians in physics, remains a difficult numerical task. As such, there has been great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by parallel tempering Monte Carlo in conjunction with the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized global update optimization heuristic that can be applied to different NP-complete problems with Boolean variables. The global cluster updates allow for a widespread sampling of phase space, thus considerably speeding up optimization. By carefully tuning the pseudo-temperature (needed to randomize the configurations) of the problem, we show that the method can efficiently tackle optimization problems with over-constraints or on topologies with a large site-percolation threshold. We illustrate the efficiency of the heuristic on paradigmatic optimization problems, such as the maximum satisfiability problem and the vertex cover problem.
Submitted 30 May, 2016;
originally announced May 2016.
-
Narrow Orthogonally Additive Operators on Lattice-Normed Spaces
Authors:
Xiao Chun Fang,
Marat Pliev
Abstract:
The aim of this article is to extend results of M. Popov and the second-named author about orthogonally additive narrow operators on vector lattices. The main objects of our investigation are orthogonally additive narrow operators between lattice-normed spaces. We prove that every $C$-compact laterally-to-norm continuous orthogonally additive operator from a Banach-Kantorovich space $V$ to a Banach lattice $Y$ is narrow. We also show that every dominated Uryson operator from a Banach-Kantorovich space over an atomless Dedekind complete vector lattice $E$ to a sequence Banach lattice $\ell_p(Γ)$ or $c_0(Γ)$ is narrow. Finally, we prove that if an orthogonally additive dominated operator $T$ from a lattice-normed space $(V,E)$ to a Banach-Kantorovich space $(W,F)$ is order narrow, then so is its exact dominant $\ls T\rs$.
Submitted 30 September, 2015;
originally announced September 2015.
-
Unified Subharmonic Oscillation Conditions for Peak or Average Current Mode Control
Authors:
Chung-Chieh Fang
Abstract:
This paper extends the author's recent research, in which only buck converters were analyzed; a similar analysis applies equally to other types of converters. In this paper, a unified model is proposed for buck, boost, and buck-boost converters under peak or average current mode control to predict the occurrence of subharmonic oscillation. Based on the unified model, the associated stability conditions are derived in closed form. The same stability condition applies to buck, boost, and buck-boost converters. The closed-form conditions clearly show the effects of various converter parameters, including the compensator poles and zeros, on stability, and these parameters can be consolidated into a few. High-order compensators such as type-II and PI compensators are considered. Some new plots are also proposed for design purposes to avoid the instability. The instability is found to be associated with a large crossover frequency. A conservative stability condition, in agreement with past research, is derived. The effect of the voltage loop ripple on the instability is also analyzed.
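For orientation, the textbook condition for the simplest case covered here, a buck converter in continuous conduction under peak current mode control with an ideal compensating ramp of slope $m_a$, can be checked numerically. This is a hedged sketch of the classical per-cycle perturbation argument, in which an inductor-current error is multiplied by $-(m_2-m_a)/(m_1+m_a)$ each switching cycle; it is not the unified closed-form condition derived in the paper, it ignores compensator dynamics and average current mode, and the function names are hypothetical.

```python
def buck_slopes(v_in, v_out, inductance):
    """Inductor-current slopes (A/s) of an ideal CCM buck converter."""
    m1 = (v_in - v_out) / inductance  # rising slope during the on-time
    m2 = v_out / inductance           # magnitude of the falling slope during the off-time
    return m1, m2


def perturbation_gain(m1, m2, ma):
    """Per-cycle multiplier of an inductor-current perturbation under peak current mode.

    |gain| < 1: the perturbation decays; gain < -1: it grows with alternating
    sign, i.e. period-doubling (subharmonic) oscillation.
    """
    return -(m2 - ma) / (m1 + ma)


def critical_ramp(m1, m2):
    """Ramp slope at which gain = -1; no ramp is needed when m2 < m1 (D < 0.5)."""
    return max(0.0, (m2 - m1) / 2)


m1, m2 = buck_slopes(v_in=12.0, v_out=8.0, inductance=10e-6)  # D = 2/3 > 0.5
ma_crit = critical_ramp(m1, m2)
for ma in (0.0, ma_crit, 2 * ma_crit):
    print(f"ramp slope {ma:.3g} A/s -> per-cycle gain {perturbation_gain(m1, m2, ma):.3f}")
```

Without a ramp the gain magnitude exceeds 1 whenever the duty cycle is above 0.5, which is the classical onset of subharmonic oscillation that the paper's unified conditions generalize.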
Submitted 26 January, 2014; v1 submitted 28 October, 2013;
originally announced October 2013.
-
Discrete-Time Poles and Dynamics of Discontinuous Mode Boost and Buck Converters Under Various Control Schemes
Authors:
Chung-Chieh Fang
Abstract:
Nonlinear systems, such as switching DC-DC boost or buck converters, have rich dynamics. A simple one-dimensional discrete-time model is used to analyze the boost or buck converter in discontinuous conduction mode. Seven different control schemes (open-loop power stage, voltage mode control, current mode control, constant power load, constant current load, constant-on-time control, and boundary conduction mode) are analyzed systematically. The linearized dynamics is obtained simply by taking partial derivatives with respect to dynamic variables. In the discrete-time model, there is only a single pole and no zero. The single closed-loop pole is a linear combination of three terms: the open-loop pole, a term due to the control scheme, and a term due to the non-resistive load. Even with a single pole, the phase response of the discrete-time model can go beyond -90 degrees as in the two-pole average models. In the boost converter with a resistive load under current mode control, adding the compensating ramp has no effect on the pole location. Increasing the ramp slope decreases the DC gain of control-to-output transfer function and increases the audio-susceptibility. Similar analysis is applied to the buck converter with a non-resistive load or variable switching frequency. The derived dynamics agrees closely with the exact switching model and the past research results.
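The idea of reading the single pole off a one-dimensional discrete-time map can be sketched generically: locate the fixed point $x^*$ of $x_{n+1}=f(x_n)$, then take $p=\partial f/\partial x$ at $x^*$. The map `toy_map` below is a made-up placeholder, not the DCM boost or buck model from the paper, and the bisection bracket and finite-difference step are illustrative assumptions. The last function evaluates the phase of an assumed transfer function $G(z)=K/(z-p)$ with $K>0$ on the unit circle, showing how a single pole with no zero can still produce a phase below $-90^\circ$.

```python
import cmath
import math


def toy_map(x):
    """Hypothetical 1-D sampled-data map x[n+1] = f(x[n]); its fixed point is x = 1."""
    return 0.5 * x + 1.0 / (1.0 + x)


def fixed_point(f, lo, hi, tol=1e-12):
    """Bisection on g(x) = f(x) - x; assumes g changes sign on [lo, hi]."""
    g_lo = f(lo) - lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - mid
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_pole(f, x_star, h=1e-6):
    """Linearize the map: pole p = df/dx at the fixed point (central difference)."""
    return (f(x_star + h) - f(x_star - h)) / (2 * h)


def phase_deg(pole, omega_t):
    """Phase of G(z) = K/(z - p), K > 0, evaluated at z = exp(j*omega*T)."""
    return -math.degrees(cmath.phase(cmath.exp(1j * omega_t) - pole))


x_star = fixed_point(toy_map, 0.0, 3.0)
p = discrete_pole(toy_map, x_star)
print(f"x* = {x_star:.6f}, pole = {p:.6f}, stable = {abs(p) < 1}")
print(f"phase at a quarter of the switching frequency: {phase_deg(p, math.pi / 2):.1f} deg")
```

The pole lies inside the unit circle, so the toy orbit is stable, yet the phase at $\omega T=\pi/2$ is already below $-90^\circ$ and reaches $-180^\circ$ at half the switching frequency.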
Submitted 19 November, 2012;
originally announced November 2012.
-
Analysis and Control of Period-Doubling Bifurcation in Buck Converters Using Harmonic Balance
Authors:
Chung-Chieh Fang,
Eyad H. Abed
Abstract:
Period-doubling bifurcation in buck converters is studied by using the harmonic balance method. A simple dynamic model of a buck converter in continuous conduction mode under voltage mode or current mode control is derived. This model consists of the feedback connection of a linear system and a nonlinear one. An exact harmonic balance analysis is used to obtain a necessary and sufficient condition for a period-doubling bifurcation to occur. If such a bifurcation occurs, the analysis also provides its exact location. Using the condition for bifurcation, a feedforward control is designed to eliminate the period-doubling bifurcation. This results in a wider range of allowed source voltage and also in improved line regulation.
Submitted 27 October, 2012;
originally announced October 2012.
-
Local Bifurcations in DC-DC Converters
Authors:
Chung-Chieh Fang,
Eyad H. Abed
Abstract:
Three local bifurcations in DC-DC converters are reviewed: period-doubling bifurcation, saddle-node bifurcation, and Neimark bifurcation. A general sampled-data model is employed to study the ways in which the nominal (periodic) solution loses stability and their connection with local bifurcations. This predicts instability and bifurcation more accurately than the averaging approach does. Examples of bifurcations associated with instabilities in DC-DC converters are given.
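In a sampled-data model the nominal periodic solution is a fixed point of the sampled (Poincaré) map, and the way the eigenvalues of its Jacobian leave the unit circle indicates the type of local bifurcation: a real eigenvalue through $+1$ for saddle-node, a real eigenvalue through $-1$ for period-doubling, and a complex-conjugate pair for Neimark. The NumPy sketch below encodes that standard rule; the function name, tolerance, and example Jacobians are illustrative assumptions, and it inspects only the dominant critical eigenvalue rather than tracking the eigenvalues as a parameter varies.

```python
import numpy as np


def classify_instability(jacobian, tol=1e-9):
    """Classify how a fixed point of a sampled-data map loses stability.

    Looks only at the eigenvalue of largest modulus on or outside the unit circle.
    """
    eigs = np.linalg.eigvals(np.asarray(jacobian, dtype=float))
    critical = eigs[np.abs(eigs) >= 1.0 - tol]
    if critical.size == 0:
        return "stable"
    lam = critical[np.argmax(np.abs(critical))]
    if abs(np.imag(lam)) > tol:
        return "Neimark"          # complex pair crossing: quasi-periodic oscillation
    if np.real(lam) > 0:
        return "saddle-node"      # real eigenvalue through +1: coexisting solutions, jumps
    return "period-doubling"      # real eigenvalue through -1: subharmonic oscillation


theta = 0.5
rotation = 1.05 * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta), np.cos(theta)]])
for name, jac in [("damped", [[0.5, 0.0], [0.0, 0.2]]),
                  ("flip", [[-1.2, 0.0], [0.0, 0.3]]),
                  ("fold", [[1.1, 0.0], [0.0, 0.3]]),
                  ("spiral", rotation)]:
    print(name, "->", classify_instability(jac))
```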
Submitted 10 October, 2012;
originally announced October 2012.
-
Modeling and Instability of Average Current Control
Authors:
Chung-Chieh Fang
Abstract:
Dynamics and stability of average current control of DC-DC converters are analyzed by sampled-data modeling. Orbital stability is studied and is found to be unrelated to the ripple size of the orbit. Compared with averaged modeling, sampled-data modeling is more accurate and systematic. An unstable range of the compensator pole is found by simulation and is predicted by both sampled-data and harmonic balance modeling.
Submitted 6 October, 2012;
originally announced October 2012.
-
Saddle-Node Bifurcation Associated with Parasitic Inductor Resistance in Boost Converters
Authors:
Chung-Chieh Fang
Abstract:
Saddle-node bifurcation occurs in a boost converter when parasitic inductor resistance is modeled. Closed-form critical conditions of the bifurcation are derived. If the parasitic inductor resistance is modeled, the saddle-node bifurcation occurs under voltage mode control or under current mode control with the voltage loop closed, but not under current mode control with the voltage loop open. If the parasitic inductor resistance is not modeled, the saddle-node bifurcation does not occur, and one may be misled by wrong dynamics and wrong steady-state solutions. The saddle-node bifurcation exists even in a boost converter with a popular type-III compensator. When the saddle-node bifurcation occurs, multiple steady-state solutions may coexist, and the output voltage may jump from one solution to another. Care should be taken in the compensator design to ensure that only the desired solution is stabilized. In industrial practice, the solution with the higher duty cycle (and thus the saddle-node bifurcation) may be avoided by limiting the maximum duty cycle.
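The coexistence of steady-state solutions can already be seen in the standard static (averaged DC) model of a boost converter in continuous conduction with inductor resistance $r_L$ and load $R$, where $V_o/V_{in} = (1-D)R/\big((1-D)^2R + r_L\big)$. This is a textbook conversion-ratio model, not the paper's dynamic saddle-node analysis: for a target gain below the peak value $\tfrac12\sqrt{R/r_L}$, two duty cycles give the same output, and with $r_L = 0$ only one remains. The function names and the `d_max` limit below are hypothetical.

```python
import math


def boost_gain(duty, r_load, r_ind):
    """Static Vo/Vin of a CCM boost converter with inductor resistance r_ind."""
    dp = 1.0 - duty
    return dp * r_load / (dp * dp * r_load + r_ind)


def duty_solutions(gain, r_load, r_ind):
    """All duty cycles in [0, 1) giving the requested static gain.

    Solves gain*R*D'^2 - R*D' + gain*r_L = 0 for D' = 1 - D.
    """
    a, b, c = gain * r_load, -r_load, gain * r_ind
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # gain above the peak 0.5*sqrt(R/r_L): no steady state exists
    root = math.sqrt(disc)
    dps = {(-b - root) / (2 * a), (-b + root) / (2 * a)}
    return sorted(1.0 - dp for dp in dps if 0.0 < dp <= 1.0)


def pick_solution(gain, r_load, r_ind, d_max):
    """Keep only solutions allowed by a (hypothetical) maximum-duty-cycle limit."""
    allowed = [d for d in duty_solutions(gain, r_load, r_ind) if d <= d_max]
    return allowed[0] if allowed else None


R, r_L = 10.0, 0.1
print(duty_solutions(3.0, R, r_L))   # two coexisting solutions
print(duty_solutions(3.0, R, 0.0))   # lossless model: a single solution
print(duty_solutions(6.0, R, r_L))   # above the peak gain of 5: none
print(pick_solution(3.0, R, r_L, d_max=0.9))
```

The duty-cycle limit plays the role described in the abstract: it excludes the high-duty solution so that only the lower-duty operating point remains reachable.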
Submitted 6 October, 2012;
originally announced October 2012.
-
Characterizations of dominated splitting system and its relation to hyperbolicity
Authors:
Chun Fang,
Mats Gyllenberg,
Shitao Liu
Abstract:
Hyperbolicity and dominated splitting are two of the most important concepts in the global analysis of differentiable dynamics. In this paper we give several equivalent characterizations of dominated splitting and, in particular, a criterion in terms of hyperbolicity for a dynamical system to admit a dominated splitting.
18 pages
Submitted 25 September, 2012;
originally announced September 2012.
-
Floquet bundles for tridiagonal competitive-cooperative systems with Applications
Authors:
Chun Fang,
Mats Gyllenberg,
Yi Wang
Abstract:
For a general time-dependent linear competitive-cooperative tridiagonal system of differential equations, we obtain canonical Floquet invariant bundles which are exponentially separated in the framework of skew-product flows. Such Floquet bundles naturally reduce to the standard Floquet space when the system is assumed to be time-periodic. The obtained Floquet theory is applied to study the dynamics on the hyperbolic omega-limit sets for the nonlinear competitive-cooperative tridiagonal systems in time-recurrent structures including almost periodicity and almost automorphy.
Submitted 14 May, 2012;
originally announced May 2012.
-
Comments on "Bifurcations in DC-DC Switching Converters: Review of Methods and Applications"
Authors:
Chung-Chieh Fang
Abstract:
In a review paper [1] (El Aroudi, et al., 2005), two stability conditions for DC-DC converters are presented. However, these two conditions were published years earlier at least in a journal paper [2] (Fang and Abed, 2001). In this note, the similar texts of [1] and [2] are compared.
Submitted 18 April, 2012;
originally announced April 2012.
-
Closed-Form Critical Conditions of Saddle-Node Bifurcations for Buck Converters
Authors:
Chung-Chieh Fang
Abstract:
A general and exact critical condition of saddle-node bifurcation is derived in closed form for the buck converter. The critical condition is helpful for the converter designers to predict or prevent some jump instabilities or coexistence of multiple solutions associated with the saddle-node bifurcation. Some previously known critical conditions become special cases in this generalized framework. Given an arbitrary control scheme, a systematic procedure is proposed to derive the critical condition for that control scheme.
Submitted 18 April, 2012;
originally announced April 2012.
-
Using Nyquist or Nyquist-Like Plot to Predict Three Typical Instabilities in DC-DC Converters
Authors:
Chung-Chieh Fang
Abstract:
By transforming an exact stability condition, a new Nyquist-like plot is proposed to predict occurrences of three typical instabilities in DC-DC converters. The three instabilities are saddle-node bifurcation (coexistence of multiple solutions), period-doubling bifurcation (subharmonic oscillation), and Neimark bifurcation (quasi-periodic oscillation). In a single plot, it accurately predicts whether an instability occurs and what type the instability is. The plot is equivalent to the Nyquist plot, and it is a useful design tool to avoid these instabilities. Nine examples are used to illustrate the accuracy of this new plot to predict instabilities in the buck or boost converter with fixed or variable switching frequency.
Submitted 9 April, 2012;
originally announced April 2012.
-
Comments on "Prediction of Subharmonic Oscillation in Switching Converters Under Different Control Strategies"
Authors:
Chung-Chieh Fang
Abstract:
A recent paper [1] (El Aroudi, 2012) misapplied a critical condition (Fang and Abed, 2001) to a well-known example. Even if the mistake is corrected, the results in [1] apply only to buck converters and period-doubling bifurcation. In fact, these results appeared a decade earlier in Fang's works, which give broader critical conditions applicable to other converters and bifurcations. The flaws in [1] are identified.
Submitted 2 April, 2012;
originally announced April 2012.
-
Closed-Form Critical Conditions of Subharmonic Oscillations for Buck Converters
Authors:
Chung-Chieh Fang
Abstract:
A general critical condition of subharmonic oscillation in terms of the loop gain is derived. Many closed-form critical conditions for various control schemes, expressed in terms of converter parameters, are also derived. Some previously known critical conditions become special cases in the generalized framework. For an arbitrary control scheme, a systematic procedure is proposed to derive the corresponding critical condition. Different control schemes share similar forms of critical conditions; for example, V2 control and voltage mode control have the same form of critical condition. A peculiar phenomenon in average current mode control, where subharmonic oscillation occurs when the compensator pole lies within a certain window of values, can be explained by the derived critical condition. A ripple amplitude index proposed in past research to predict subharmonic oscillation has limited application and is shown to be invalid for a converter with a large pole.
Submitted 26 March, 2012;
originally announced March 2012.
-
Sampled-Data and Harmonic Balance Analyses of Average Current-Mode Controlled Buck Converter
Authors:
Chung-Chieh Fang
Abstract:
Dynamics and stability of average current-mode control of buck converters are analyzed by sampled-data and harmonic balance analyses. An exact sampled-data model is derived. A new continuous-time model "lifted" from the sampled-data model is also derived, and its frequency response matches previously reported experimental data. Orbital stability is studied and is found to be unrelated to the ripple size of the current-loop compensator output. An unstable window of the current-loop compensator pole is found by simulation, and it can be accurately predicted by sampled-data and harmonic balance analyses. A new S plot that accurately predicts subharmonic oscillation is proposed. The S plot assists pole assignment and shows the ramp slope required to avoid instability.
Submitted 21 February, 2012;
originally announced February 2012.