-
EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Authors:
Chung-Yiu Yau,
Hoi-To Wai,
Parameswaran Raman,
Soumajyoti Sarkar,
Mingyi Hong
Abstract:
A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution which is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and ImageNet-100.
Submitted 16 April, 2024;
originally announced April 2024.
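To illustrate why a Metropolis-Hastings step can avoid the partition function mentioned in the abstract above, here is a minimal sketch. It is not the adaptive subroutine of EMC$^2$: the uniform proposal, the `scores` values, and the function names are assumptions made only for this example. With a symmetric proposal, the acceptance ratio for moving from index `i` to index `j` reduces to `exp(scores[j] - scores[i])`, so the normalizing sum cancels.

```python
import math
import random
from collections import Counter


def mh_softmax_sampler(scores, num_steps, rng, start=0):
    """Yield states of a Metropolis-Hastings chain targeting softmax(scores).

    The proposal is uniform over indices (symmetric), so the acceptance
    ratio is exp(scores[j] - scores[i]) and the partition function is
    never computed.
    """
    current = start
    n = len(scores)
    for _ in range(num_steps):
        proposal = rng.randrange(n)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = scores[proposal] - scores[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        yield current


def softmax(scores):
    """Exact softmax, used here only to check the sampler on a tiny example."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


scores = [0.5, 1.5, -1.0, 0.0]  # hypothetical similarity scores
rng = random.Random(0)          # fixed seed so the run is reproducible
num_steps = 200_000

counts = Counter(mh_softmax_sampler(scores, num_steps, rng))
empirical = [counts[i] / num_steps for i in range(len(scores))]
exact = softmax(scores)
```

On this small example, the visit frequencies in `empirical` should be close to `exact`. In a real training loop the exact softmax would be too expensive to evaluate, which is the motivation for sampling this way.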
-
Fractional derivatives of local times for some Gaussian processes
Authors:
Minhao Hong,
Qian Yu
Abstract:
In this article, we consider fractional derivatives of local time for $d$-dimensional centered Gaussian processes satisfying a certain strong local nondeterminism property. We first give a condition for the existence of fractional derivatives of the local time defined by Marchaud derivatives in $L^p$ ($p\ge1$) and show that these derivatives are Hölder continuous with respect to both time and space variables and are also continuous with respect to the order of derivatives. Moreover, under some additional assumptions, we show with the help of contour integration that this condition is also necessary for the existence of derivatives of the local time.
Submitted 15 April, 2024;
originally announced April 2024.
-
Problem-Parameter-Free Decentralized Nonconvex Stochastic Optimization
Authors:
Jiaxiang Li,
Xuxing Chen,
Shiqian Ma,
Mingyi Hong
Abstract:
Existing decentralized algorithms usually require knowledge of problem parameters for updating local iterates. For example, the hyperparameters (such as learning rate) usually require the knowledge of Lipschitz constant of the global gradient or topological information of the communication networks, which are usually not accessible in practice. In this paper, we propose D-NASA, the first algorithm for decentralized nonconvex stochastic optimization that requires no prior knowledge of any problem parameters. We show that D-NASA has the optimal rate of convergence for nonconvex objectives under very mild conditions and enjoys the linear-speedup effect, i.e. the computation becomes faster as the number of nodes in the system increases. Extensive numerical experiments are conducted to support our findings.
Submitted 13 February, 2024;
originally announced February 2024.
-
A Survey of Recent Advances in Optimization Methods for Wireless Communications
Authors:
Ya-Feng Liu,
Tsung-Hui Chang,
Mingyi Hong,
Zheyu Wu,
Anthony Man-Cho So,
Eduard A. Jorswieck,
Wei Yu
Abstract:
Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, to analyze, and to solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recently developed optimization techniques in areas ranging from nonconvex optimization, global optimization, and integer programming, to distributed optimization and learning-based optimization. The key to the successful solution of mathematical optimization problems lies in carefully choosing or developing suitable algorithms (or neural network architectures) that can exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions.
Submitted 7 June, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Authors:
Mao Hong,
Zhiyue Zhang,
Yue Wu,
Yanxun Xu
Abstract:
Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees of MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.
Submitted 20 January, 2024;
originally announced January 2024.
-
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Authors:
Kaan Ozkara,
Can Karakus,
Parameswaran Raman,
Mingyi Hong,
Shoham Sabach,
Branislav Kveton,
Volkan Cevher
Abstract:
Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.
Submitted 17 June, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate
Authors:
Ruichen Jiang,
Parameswaran Raman,
Shoham Sabach,
Aryan Mokhtari,
Mingyi Hong,
Volkan Cevher
Abstract:
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
Submitted 5 January, 2024;
originally announced January 2024.
-
An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning
Authors:
Yihua Zhang,
Prashant Khanduri,
Ioannis Tsaknakis,
Yuguang Yao,
Mingyi Hong,
Sijia Liu
Abstract:
Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.
Submitted 20 December, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
A Policy Gradient Method for Confounded POMDPs
Authors:
Mao Hong,
Zhengling Qi,
Yanxun Xu
Abstract:
In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, length of horizon, concentratability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
Submitted 30 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Limit theorems for additive functionals of some self-similar Gaussian processes
Authors:
Minhao Hong,
Heguang Liu,
Fangjun Xu
Abstract:
Under certain mild conditions, limit theorems for additive functionals of some $d$-dimensional self-similar Gaussian processes are obtained. These limit theorems work for general Gaussian processes including fractional Brownian motions, sub-fractional Brownian motions and bi-fractional Brownian motions. To prove these results, we use the method of moments and an enhanced chaining argument. The Gaussian processes under consideration are required to satisfy a certain strong local nondeterminism property. A tractable sufficient condition for the strong local nondeterminism property is given, and it relies only on the covariance functions of the Gaussian processes. Moreover, we give a sufficient condition for the distribution function of a random vector to be determined by its moments.
Submitted 22 May, 2023;
originally announced May 2023.
-
The Game of Life on the Robinson Triangle Penrose Tiling: Still Life
Authors:
Seung Hyeon Mandy Hong,
May Mei
Abstract:
We investigate Conway's Game of Life played on the Robinson triangle Penrose tiling. In this paper, we classify all four-cell still lifes.
Submitted 10 April, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
Limit laws for functionals of self-intersection symmetric alpha-stable processes
Authors:
Minhao Hong,
Qian Yu
Abstract:
In this paper, we prove two limit laws for functionals of self-intersection symmetric $\alpha$-stable processes with $\alpha\in(1,2)$. The results are obtained by the method of moments; the sample configuration and the chaining argument introduced in (Nualart and Xu 2013) are also employed.
Submitted 28 October, 2022;
originally announced October 2022.
-
Global Classical Solutions Near Vacuum to the Initial-Boundary Value Problem of Isentropic Supersonic Flows through Divergent Ducts
Authors:
Ying-Chieh Lin,
Jay Chu,
John M. Hong,
Hsin-Yi Lee
Abstract:
In this paper, we study the global existence and asymptotic behavior of classical solutions near vacuum for the initial-boundary value problem modeling isentropic supersonic flows through divergent ducts. The governing equations are the compressible Euler equations with a small parameter, which can be written as a hyperbolic system in terms of the Riemann invariants with a non-dissipative source. We provide a new result for the global existence of classical solutions to initial-boundary value problems of non-dissipative hyperbolic balance laws without the assumption of small data. The work is based on the local existence, the maximum principle and the uniform a priori estimates obtained by the generalized Lax transformations. The asymptotic behavior of classical solutions is also shown by studying the behavior of Riemann invariants along each characteristic curve and vertical line. The results can be applied to the spherically symmetric solutions to N-dimensional compressible Euler equations. Numerical simulations are provided to support our theoretical results.
Submitted 24 May, 2022;
originally announced May 2022.
-
Existence and convergence of the Beris-Edwards system with general Landau-de Gennes energy
Authors:
Zhewen Feng,
Min-Chun Hong,
Yu Mei
Abstract:
In this paper, we investigate the Beris-Edwards system for both biaxial and uniaxial $Q$-tensors with a general Landau-de Gennes energy density depending on four non-zero elastic constants. We prove existence of the strong solution of the Beris-Edwards system for uniaxial $Q$-tensors up to a maximal time. Furthermore, we prove that the strong solutions of the Beris-Edwards system for biaxial $Q$-tensors converge smoothly to the solution of the Beris-Edwards system for uniaxial $Q$-tensors up to its maximal existence time.
Submitted 10 November, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Existence of minimizers and convergence of critical points for a new Landau-de Gennes energy functional in nematic liquid crystals
Authors:
Zhewen Feng,
Min-Chun Hong
Abstract:
The Landau-de Gennes energy in nematic liquid crystals depends on four elastic constants $L_1$, $L_2$, $L_3$, $L_4$. In the case of $L_4\neq 0$, Ball and Majumdar (Mol. Cryst. Liq. Cryst., 2010) found an example showing that the original Landau-de Gennes energy functional in physics does not satisfy a coercivity condition, which makes it difficult to establish the existence of energy minimizers mathematically. First, we introduce a new Landau-de Gennes energy density with $L_4\neq 0$, which is equivalent to the original Landau-de Gennes density for uniaxial tensors and satisfies the coercivity condition for all $Q$-tensors. Secondly, we prove that solutions of the Landau-de Gennes system can approach a solution of the $Q$-tensor Oseen-Frank system without using energy minimizers. Thirdly, we develop a new approach to generalize the Nguyen and Zarnescu (Calc. Var. PDEs, 2013) convergence result to the case of non-zero elastic constants $L_2$, $L_3$, $L_4$.
Submitted 28 September, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Minimax Problems with Coupled Linear Constraints: Computational Complexity, Duality and Solution Methods
Authors:
Ioannis Tsaknakis,
Mingyi Hong,
Shuzhong Zhang
Abstract:
In this work we study a special minimax problem where there are linear constraints that couple both the minimization and maximization decision variables. The problem is a generalization of the traditional saddle point problem (which does not have the coupling constraint), and it finds applications in wireless communication, game theory, transportation, just to name a few. We show that the considered problem is challenging, in the sense that it violates the classical max-min inequality, and that it is NP-hard even under very strong assumptions (e.g., when the objective is strongly convex-strongly concave). We then develop a duality theory for it, and analyze conditions under which the duality gap becomes zero. Finally, we study a class of stationary solutions defined based on the dual problem, and evaluate their practical performance in an application on adversarial attacks on network flow problems.
Submitted 25 November, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Biconservative hypersurfaces with constant scalar curvature in space forms
Authors:
Yu Fu,
Min-Chun Hong,
Dan Yang,
Xin Zhan
Abstract:
Biconservative hypersurfaces are hypersurfaces which have a conservative stress-energy tensor with respect to the bienergy; they include all minimal and constant mean curvature hypersurfaces. The purpose of this paper is to study biconservative hypersurfaces $M^n$ with constant scalar curvature in a space form $N^{n+1}(c)$. We prove that every biconservative hypersurface with constant scalar curvature in $N^4(c)$ has constant mean curvature. Moreover, we prove that any biconservative hypersurface with constant scalar curvature in $N^5(c)$ is either an open part of a certain rotational hypersurface or a constant mean curvature hypersurface. These results solve an open problem proposed recently by D. Fetcu and C. Oniciuc for $n\leq4$.
Submitted 7 October, 2021;
originally announced October 2021.
-
Primal-Dual First-Order Methods for Affinely Constrained Multi-Block Saddle Point Problems
Authors:
Junyu Zhang,
Mengdi Wang,
Mingyi Hong,
Shuzhong Zhang
Abstract:
We consider the convex-concave saddle point problem $\min_{\mathbf{x}}\max_{\mathbf{y}}Φ(\mathbf{x},\mathbf{y})$, where the decision variables $\mathbf{x}$ and/or $\mathbf{y}$ are subject to a multi-block structure and affine coupling constraints, and $Φ(\mathbf{x},\mathbf{y})$ possesses a certain separable structure. Although the minimization counterpart of such a problem has been widely studied under the topic of ADMM, this minimax problem is rarely investigated. In this paper, a convenient notion of $ε$-saddle point is proposed, under which the convergence rates of several proposed algorithms are analyzed. When only one of $\mathbf{x}$ and $\mathbf{y}$ has multiple blocks and an affine constraint, several natural extensions of ADMM are proposed to solve the problem. Depending on the number of blocks and the level of smoothness, $\mathcal{O}(1/T)$ or $\mathcal{O}(1/\sqrt{T})$ convergence rates are derived for our algorithms. When both $\mathbf{x}$ and $\mathbf{y}$ have multiple blocks and affine constraints, a new algorithm called ExtraGradient Method of Multipliers (EGMM) is proposed. Under a desirable smoothness condition, an $\mathcal{O}(1/T)$ rate of convergence can be guaranteed regardless of the number of blocks in $\mathbf{x}$ and $\mathbf{y}$. An in-depth comparison between EGMM (a fully primal-dual method) and ADMM (an approximate dual method) is made over the multi-block optimization problems to illustrate the advantage of EGMM.
Submitted 16 March, 2023; v1 submitted 29 September, 2021;
originally announced September 2021.
-
STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning
Authors:
Prashant Khanduri,
Pranay Sharma,
Haibo Yang,
Mingyi Hong,
Jia Liu,
Ketan Rajawat,
Pramod K. Varshney
Abstract:
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WNs' and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(ε^{-3/2})$ samples and $\tilde{\mathcal{O}}(ε^{-1})$ communication rounds to compute an $ε$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such near-optimal sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provide guidelines for choosing the four important design elements for FL algorithms (the update frequency, the update directions, and the minibatch sizes) to achieve the best performance.
Submitted 19 June, 2021;
originally announced June 2021.
-
A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
Authors:
Prashant Khanduri,
Siliang Zeng,
Mingyi Hong,
Hoi-To Wai,
Zhaoran Wang,
Zhuoran Yang
Abstract:
This paper proposes a new algorithm -- the \underline{S}ingle-timescale Do\underline{u}ble-momentum \underline{St}ochastic \underline{A}pprox\underline{i}matio\underline{n} (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on \emph{two-timescale} or \emph{double loop} techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solutions to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $\mathcal{O}(ε^{-3/2})$ iterations (each using ${\cal O}(1)$ samples) to find an $ε$-stationary solution. The $ε$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $ε$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.
Submitted 15 June, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Decentralized Riemannian Gradient Descent on the Stiefel Manifold
Authors:
Shixiang Chen,
Alfredo Garcia,
Mingyi Hong,
Shahin Shahrampour
Abstract:
We consider a distributed non-convex optimization problem where a network of agents aims at minimizing a global function over the Stiefel manifold. The global function is represented as a finite sum of smooth local functions, where each local function is associated with one agent and agents communicate with each other over an undirected connected graph. The problem is non-convex as local functions are possibly non-convex (but smooth) and the Stiefel manifold is a non-convex set. We present a decentralized Riemannian stochastic gradient method (DRSGD) with the convergence rate of $\mathcal{O}(1/\sqrt{K})$ to a stationary point. To have exact convergence with constant stepsize, we also propose a decentralized Riemannian gradient tracking algorithm (DRGTA) with the convergence rate of $\mathcal{O}(1/K)$ to a stationary point. We use multi-step consensus to preserve the iteration in the local (consensus) region. DRGTA is the first decentralized algorithm with exact convergence for distributed optimization on the Stiefel manifold.
Submitted 14 February, 2021;
originally announced February 2021.
-
On the Local Linear Rate of Consensus on the Stiefel Manifold
Authors:
Shixiang Chen,
Alfredo Garcia,
Mingyi Hong,
Shahin Shahrampour
Abstract:
We study the convergence properties of the Riemannian gradient method for solving the consensus problem (for an undirected connected graph) over the Stiefel manifold. The Stiefel manifold is a non-convex set and the standard notion of averaging in the Euclidean space does not work for this problem. We propose Distributed Riemannian Consensus on Stiefel Manifold (DRCS) and prove that it enjoys a local linear convergence rate to global consensus. More importantly, this local rate asymptotically scales with the second largest singular value of the communication matrix, which is on par with the well-known rate in the Euclidean space. To the best of our knowledge, this is the first work showing the equality of the two rates. The main technical challenges include (i) developing a Riemannian restricted secant inequality for convergence analysis, and (ii) identifying the conditions (e.g., suitable step-size and initialization) under which the algorithm always stays in the local region.
Submitted 22 January, 2021;
originally announced January 2021.
-
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup
Authors:
Han Shen,
Kaiqing Zhang,
Mingyi Hong,
Tianyi Chen
Abstract:
Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well-understood, including its non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its non-asymptotic convergence guarantees. Under both i.i.d. and Markovian sampling, we establish the local convergence guarantee for A3C in the general policy approximation case and the global convergence guarantee in softmax policy parameterization. Under i.i.d. sampling, A3C obtains sample complexity of $\mathcal{O}(ε^{-2.5}/N)$ per worker to achieve $ε$ accuracy, where $N$ is the number of workers. Compared to the best-known sample complexity of $\mathcal{O}(ε^{-2.5})$ for two-timescale AC, A3C achieves \emph{linear speedup}, which justifies the advantage of parallelism and asynchrony in AC algorithms theoretically for the first time. Numerical tests on synthetic environment, OpenAI Gym environments and Atari games have been provided to verify our theoretical analysis.
Submitted 16 March, 2022; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Derivatives of local times for some Gaussian fields II
Authors:
Minhao Hong,
Fangjun Xu
Abstract:
Given a $(2,d)$-Gaussian field \[ Z=\big\{ Z(t,s)= X^{H_1}_t -\tilde{X}^{H_2}_s, s,t \ge 0\big\}, \] where $X^{H_1}$ and $\tilde{X}^{H_2}$ are independent $d$-dimensional centered Gaussian processes satisfying certain properties, we will give the necessary condition for existence of derivatives of the local time of $Z$.
Submitted 22 October, 2020;
originally announced October 2020.
-
First-Order Algorithms Without Lipschitz Gradient: A Sequential Local Optimization Approach
Authors:
Junyu Zhang,
Mingyi Hong
Abstract:
First-order algorithms have been popular for solving convex and non-convex optimization problems. A key assumption for the majority of these algorithms is that the gradient of the objective function is globally Lipschitz continuous, but many contemporary problems such as tensor decomposition fail to satisfy such an assumption. This paper develops a sequential local optimization (SLO) framework of first-order algorithms that can effectively optimize problems without Lipschitz gradient. Operating on the assumption that the gradients are {\it locally} Lipschitz continuous over any compact set, the proposed framework carefully restricts the distance between two successive iterates. We show that the proposed framework can easily adapt to existing first-order methods such as gradient descent (GD), normalized gradient descent (NGD), accelerated gradient descent (AGD), as well as GD with Armijo line search. Remarkably, the latter algorithm is totally parameter-free and does not even require the knowledge of local Lipschitz constants.
We show that for the proposed algorithms to achieve gradient error bound of $\|\nabla f(x)\|^2\le ε$, it requires at most $\mathcal{O}(\frac{1}ε\times \mathcal{L}(Y))$ total access to the gradient oracle, where $\mathcal{L}(Y)$ characterizes how the local Lipschitz constants grow with the size of a given set $Y$. Moreover, we show that the variant of AGD improves the dependency on both $ε$ and the growth function $\mathcal{L}(Y)$. The proposed algorithms complement the existing Bregman Proximal Gradient (BPG) algorithm, because they do not require the global information about problem structure to construct and solve Bregman proximal mappings.
Submitted 5 February, 2024; v1 submitted 7 October, 2020;
originally announced October 2020.
-
A new representation for the Landau-de Gennes energy of nematic liquid crystals
Authors:
Zhewen Feng,
Min-Chun Hong
Abstract:
In the Landau-de Gennes theory on nematic liquid crystals, the well-known Landau-de Gennes energy depends on four elastic constants; $L_1$, $L_2$, $L_3$, $L_4$. For the general case of $L_4\neq 0$, Ball-Majumdar \cite {BM} found an example that the Landau-de Gennes energy functional from physics literature \cite{MN} does not satisfy a coercivity condition, which causes a problem in mathematics to establish existence of energy minimizers. In order to solve this problem, we observe that the original third order term on $L_4$, proposed by Schiele and Trimper \cite{ST} in physics, is a linear combination of a fourth order term and a second order term. Therefore, we can propose a new Landau-de Gennes energy, which is equal to the original for uniaxial nematic $Q$-tensors. The new Landau-de Gennes energy with general elastic constants satisfies the coercivity condition for all $Q$-tensors, which establishes a new link between mathematical and physical theory. Similarly to the work of Majumdar-Zarnescu \cite{MZ}, we prove existence and convergence of minimizers of the new Landau-de Gennes energy. Moreover, we find a new way to study the limiting problem of the Landau-de Gennes system since the cross product method \cite{Chen} on the Ginzburg-Landau equation does not work for the Landau-de Gennes system.
Submitted 6 January, 2021; v1 submitted 21 July, 2020;
originally announced July 2020.
-
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Authors:
Mingyi Hong,
Hoi-To Wai,
Zhaoran Wang,
Zhuoran Yang
Abstract:
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp.~weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp.~$\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.
Submitted 8 June, 2022; v1 submitted 10 July, 2020;
originally announced July 2020.
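The two-timescale pattern described in the abstract above (a larger step size for the inner stochastic gradient update, a smaller step size for the projected outer update) can be sketched on a toy scalar problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic inner and outer objectives, the interval constraint $[-2, 2]$, the constant step sizes, and the function names.

```python
import random


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ttsa_toy(num_iters, inner_step, outer_step, noise_std, rng, x0=-2.0, y0=0.0):
    """Two-timescale updates on a hypothetical scalar bilevel problem.

    Inner:  g(x, y) = 0.5 * (y - x)**2, so y*(x) = x.
    Outer:  f(x, y) = 0.5 * (y - 1)**2 + 0.5 * x**2, with x in [-2, 2].
    """
    x, y = x0, y0
    for _ in range(num_iters):
        # Inner (faster) step: noisy gradient of g with respect to y.
        grad_y = (y - x) + rng.gauss(0.0, noise_std)
        y = y - inner_step * grad_y
        # Outer (slower) step: noisy surrogate of the hypergradient,
        # grad_x f - (d2g/dxdy) * (d2g/dy2)^-1 * grad_y f = x + (y - 1) here.
        grad_x = (x + y - 1.0) + rng.gauss(0.0, noise_std)
        x = project(x - outer_step * grad_x, -2.0, 2.0)
    return x, y
```

For this toy problem the outer objective evaluated at $y^*(x) = x$ is $0.5(x-1)^2 + 0.5x^2$, which is minimized at $x = 0.5$; a call such as `ttsa_toy(2000, 0.1, 0.01, 0.0, random.Random(0))` runs the noiseless version of the iteration.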
-
Understanding Gradient Clipping in Private SGD: A Geometric Perspective
Authors:
Xiangyi Chen,
Zhiwei Steven Wu,
Mingyi Hong
Abstract:
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. To provide formal and rigorous privacy guarantees, many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold. We first demonstrate how gradient clipping can prevent SGD from converging to a stationary point. We then provide a theoretical analysis that fully quantifies the clipping bias on convergence with a disparity measure between the gradient distribution and a geometrically symmetric distribution. Our empirical evaluation further suggests that the gradient distributions along the trajectory of private SGD indeed exhibit symmetric structure that favors convergence. Together, our results provide an explanation why private SGD with gradient clipping remains effective in practice despite its potential clipping bias. Finally, we develop a new perturbation-based technique that can provably correct the clipping bias even for instances with highly asymmetric gradient distributions.
Submitted 17 March, 2021; v1 submitted 27 June, 2020;
originally announced June 2020.
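The clipping step described in the abstract above, shrinking an individual example's gradient whenever its L2 norm exceeds a threshold, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the threshold `c`, and the Gaussian noise scale `sigma * c / n` are assumptions chosen for the example, and no privacy accounting is attempted.

```python
import math
import random


def clip_gradient(grad, c):
    """Scale grad so its L2 norm is at most c (unchanged if already within c)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= c:
        return list(grad)
    scale = c / norm
    return [g * scale for g in grad]


def private_sgd_step(params, per_example_grads, lr, c, sigma, rng):
    """One illustrative step: clip each example's gradient, average, add noise."""
    n = len(per_example_grads)
    dim = len(params)
    clipped = [clip_gradient(g, c) for g in per_example_grads]
    avg = [sum(g[j] for g in clipped) / n for j in range(dim)]
    # Gaussian noise with standard deviation sigma * c / n (an assumed calibration).
    noisy = [a + rng.gauss(0.0, sigma * c / n) for a in avg]
    return [p - lr * g for p, g in zip(params, noisy)]
```

Because large gradients are rescaled while small ones pass through unchanged, the average of the clipped gradients generally differs from the true average gradient, which is the clipping bias the abstract refers to.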
-
On the Divergence of Decentralized Non-Convex Optimization
Authors:
Mingyi Hong,
Siliang Zeng,
Junyu Zhang,
Haoran Sun
Abstract:
We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradient $\nabla f_i$'s are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied?
To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to certain $ε$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. In particular, if the local function $f_i$'s are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/ε^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first attempt that studies decentralized non-convex optimization problems with neither the LLC nor the GLC.
Submitted 20 June, 2020;
originally announced June 2020.
-
Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances
Authors:
Meisam Razaviyayn,
Tianjian Huang,
Songtao Lu,
Maher Nouiehed,
Maziar Sanjabi,
Mingyi Hong
Abstract:
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst case function in the given class. Min-max optimization problems have recently become very popular in a wide range of signal and data processing applications such as fair beamforming, training generative adversarial networks (GANs), and robust machine learning, to just name a few. The overarching goal of this article is to provide a survey of recent advances for an important subclass of min-max problem, where the minimization and maximization problems can be non-convex and/or non-concave. In particular, we will first present a number of applications to showcase the importance of such min-max problems; then we discuss key theoretical challenges, and provide a selective review of some exciting recent theoretical and algorithmic advances in tackling non-convex min-max problems. Finally, we will point out open questions and future research directions.
Submitted 18 August, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
On Chen's biharmonic conjecture for hypersurfaces in $\mathbb R^5$
Authors:
Yu Fu,
Min-Chun Hong,
Xin Zhan
Abstract:
A longstanding conjecture on biharmonic submanifolds, proposed by Chen in 1991, is that {\it any biharmonic submanifold in a Euclidean space is minimal}. In the case of a hypersurface $M^n$ in $\mathbb R^{n+1}$, Chen's conjecture was settled in the case of $n=2$ by Chen and Jiang around 1987 independently. Hasanis and Vlachos in 1995 settled Chen's conjecture for a hypersurface with $n=3$. However, the general Chen's conjecture on a hypersurface $M^n$ remains open for $n> 3$. In this paper, we settle Chen's conjecture for hypersurfaces in $\mathbb R^{5}$ for $n=4$.
Submitted 22 July, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
Generalization Bounds for Stochastic Saddle Point Problems
Authors:
Junyu Zhang,
Mingyi Hong,
Mengdi Wang,
Shuzhong Zhang
Abstract:
This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.
Submitted 3 June, 2020;
originally announced June 2020.
-
Online Proximal-ADMM For Time-varying Constrained Convex Optimization
Authors:
Yijian Zhang,
Emiliano Dall'Anese,
Mingyi Hong
Abstract:
This paper considers a convex optimization problem with cost and constraints that evolve over time. The function to be minimized is strongly convex and possibly non-differentiable, and variables are coupled through linear constraints. In this setting, the paper proposes an online algorithm based on the alternating direction method of multipliers (ADMM), to track the optimal solution trajectory of the time-varying problem; in particular, the proposed algorithm consists of a primal proximal gradient descent step and an appropriately perturbed dual ascent step. The paper derives tracking results, asymptotic bounds, and linear convergence results. The proposed algorithm is then specialized to a multi-area power grid optimization problem, and our numerical results verify the desired properties.
Submitted 12 January, 2021; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond
Authors:
Tsung-Hui Chang,
Mingyi Hong,
Hoi-To Wai,
Xinwei Zhang,
Songtao Lu
Abstract:
Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence --- problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review about the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.
Submitted 14 January, 2020;
originally announced January 2020.
-
On Lower Iteration Complexity Bounds for the Saddle Point Problems
Authors:
Junyu Zhang,
Mingyi Hong,
Shuzhong Zhang
Abstract:
In this paper, we study the lower iteration complexity bounds for finding the saddle point of a strongly convex and strongly concave saddle point problem: $\min_x\max_yF(x,y)$. We restrict the classes of algorithms in our investigation to be either pure first-order methods or methods using proximal mappings. The existing lower bound result for this type of problems is obtained via the framework of strongly monotone variational inequality problems, which corresponds to the case where the gradient Lipschitz constants ($L_x, L_y$ and $L_{xy}$) and strong convexity/concavity constants ($μ_x$ and $μ_y$) are uniform with respect to variables $x$ and $y$. However, specific to the min-max saddle point problem these parameters are naturally different. Therefore, one is led to finding the best possible lower iteration complexity bounds, specific to the min-max saddle point models. In this paper we present the following results. For the class of pure first-order algorithms, our lower iteration complexity bound is $Ω\left(\sqrt{\frac{L_x}{μ_x}+\frac{L_{xy}^2}{μ_xμ_y}+\frac{L_y}{μ_y}}\cdot\ln\left(\frac{1}ε\right)\right)$, where the term $\frac{L_{xy}^2}{μ_xμ_y}$ explains how the coupling influences the iteration complexity. Under several special parameter regimes, this lower bound has been achieved by corresponding optimal algorithms. However, whether or not the bound under the general parameter regime is optimal remains open. Additionally, for the special case of bilinear coupling problems, given the availability of certain proximal operators, a lower bound of $Ω\left(\sqrt{\frac{L_{xy}^2}{μ_xμ_y}+1}\cdot\ln(\frac{1}ε)\right)$ is established in this paper, and optimal algorithms have already been developed in the literature.
Submitted 20 June, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
Authors:
Xiangyi Chen,
Sijia Liu,
Kaidi Xu,
Xingguo Li,
Xue Lin,
Mingyi Hong,
David Cox
Abstract:
The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm, that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is problem size. In particular, we provide a deep understanding on why Mahalanobis distance matters in convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis makes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications, designing per-image and universal adversarial attacks from black-box neural networks, respectively. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy compared with $6$ state-of-the-art ZO optimization methods.
Submitted 15 October, 2019; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach
Authors:
Haoran Sun,
Songtao Lu,
Mingyi Hong
Abstract:
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks.
In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Differently from the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is given to both reducing the total communication rounds among the nodes, while accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve certain $ε$ stationary solution of the deterministic finite sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}ε^{-1})$ sample complexity and an $\mathcal{O}(ε^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mnε^{-1})$ and $\mathcal{O}(ε^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m ε^{-3/2})$ sample complexity and an $\mathcal{O}(ε^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(mε^{-2})$ and $\mathcal{O}(ε^{-2})$, respectively.
Submitted 13 October, 2019;
originally announced October 2019.
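The gradient-tracking idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not D-GET itself: it uses exact local gradients rather than the sample-based gradient estimates the abstract describes, works on scalar variables, and the function name `gradient_tracking`, the mixing matrix `W`, and the step size are all assumptions made for illustration.

```python
def gradient_tracking(local_grads, W, x0, step=0.1, iters=500):
    """Textbook gradient tracking over m nodes with scalar variables.

    local_grads[i] is node i's gradient function, W is a doubly stochastic
    mixing matrix (assumed), and x0 holds each node's initial iterate.
    """
    m = len(local_grads)
    x = list(x0)
    g = [local_grads[i](x[i]) for i in range(m)]
    y = list(g)  # the tracker starts at the local gradients
    for _ in range(iters):
        # Mix with neighbours, then move along the tracked direction.
        x_new = [sum(W[i][j] * x[j] for j in range(m)) - step * y[i]
                 for i in range(m)]
        g_new = [local_grads[i](x_new[i]) for i in range(m)]
        # Mix the trackers and add the change in the local gradient, so the
        # average of y stays equal to the average of the local gradients.
        y = [sum(W[i][j] * y[j] for j in range(m)) + g_new[i] - g[i]
             for i in range(m)]
        x, g = x_new, g_new
    return x
```

For example, with two nodes holding $f_i(x) = \tfrac{1}{2}(x - b_i)^2$ for $b = (0, 2)$ and uniform averaging weights, both iterates approach the minimizer of the average objective, $x = 1$.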
-
Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Authors:
Sijia Liu,
Songtao Lu,
Xiangyi Chen,
Yao Feng,
Kaidi Xu,
Abdullah Al-Dujaili,
Minyi Hong,
Una-May O'Reilly
Abstract:
In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
Submitted 16 June, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
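As a rough illustration of what a zeroth-order gradient estimator looks like, the sketch below implements a standard randomized two-point estimator that only queries function values. The abstract does not specify the exact estimator used in ZO-Min-Max, so the form here, the name `zo_gradient`, and the parameters `mu` (smoothing radius) and `q` (number of random directions) are assumptions.

```python
import math
import random

def zo_gradient(f, x, mu=1e-4, q=10, rng=None):
    """Estimate the gradient of f at x from function values only.

    Averages q two-point finite differences taken along random directions
    drawn uniformly from the unit sphere.
    """
    rng = rng or random.Random()
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(q):
        # A normalized Gaussian vector is uniform on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        u = [v / norm for v in g]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        # The factor d compensates for E[u u^T] = I / d.
        scale = d * (f(shifted) - fx) / mu
        for i in range(d):
            grad[i] += scale * u[i] / q
    return grad
```

Each call costs `q + 1` function queries. Larger `q` lowers the variance of the estimate, which is the usual trade-off against the query budget.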
-
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Authors:
Zhuoran Yang,
Yongxin Chen,
Mingyi Hong,
Zhaoran Wang
Abstract:
Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
Submitted 14 July, 2019;
originally announced July 2019.
-
SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems
Authors:
Songtao Lu,
Meisam Razaviyayn,
Bo Yang,
Kejun Huang,
Mingyi Hong
Abstract:
This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$ require $\mathcal{O}(1/ε^{2.5})$ iterations to compute an $(ε, \sqrt{ε})$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints.
Submitted 9 July, 2019;
originally announced July 2019.
-
Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Authors:
Xiangyi Chen,
Tiancong Chen,
Haoran Sun,
Zhiwei Steven Wu,
Mingyi Hong
Abstract:
Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms include signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between mean and median of the gradients. The proposed methods largely preserve nice properties of these methods, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate the mean through a median estimator.
Submitted 6 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
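The mean/median disparity discussed in the abstract above can be seen in a toy sketch. The aggregation rules below follow the usual descriptions of signSGD with majority vote and of coordinate-wise medianSGD. The helper names and the three-worker example are assumptions for illustration, and the noise-based correction proposed in the paper is not implemented here.

```python
from statistics import median

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def majority_vote(worker_grads):
    """signSGD aggregation: each worker sends signs, the server takes the majority."""
    d = len(worker_grads[0])
    return [sign(sum(sign(g[i]) for g in worker_grads)) for i in range(d)]

def coordinate_median(worker_grads):
    """medianSGD aggregation: coordinate-wise median of the local gradients."""
    d = len(worker_grads[0])
    return [median(g[i] for g in worker_grads) for i in range(d)]

def coordinate_mean(worker_grads):
    """Plain averaging, i.e. the direction of the true global gradient."""
    d = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / len(worker_grads) for i in range(d)]

# Heterogeneous local gradients (hypothetical): two workers see -1, one sees +5.
grads = [[-1.0], [-1.0], [5.0]]
mean_dir = coordinate_mean(grads)      # positive: descent should decrease x
vote_dir = majority_vote(grads)        # negative: the vote points the other way
median_dir = coordinate_median(grads)  # negative as well
```

In this example the mean gradient is positive while both median-based aggregates are negative, so a step along either aggregate moves against the true descent direction.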
-
Derivatives of local times for some Gaussian fields
Authors:
Minhao Hong,
Fangjun Xu
Abstract:
In this article, we consider derivatives of local time for a $(2,d)$-Gaussian field \[ Z=\big\{ Z(t,s)= X^{H_1}_t -\widetilde{X}^{H_2}_s, s,t \ge 0\big\}, \] where $X^{H_1}$ and $\widetilde{X}^{H_2}$ are two independent processes from a class of $d$-dimensional centered Gaussian processes satisfying certain local nondeterminism property. We first give a condition for existence of derivatives of the local time. Then, under this condition, we show that derivatives of the local time are Hölder continuous in both time and space variables. Moreover, under some additional assumptions, we show that this condition is also necessary for existence of derivatives of the local time at the origin.
Submitted 23 May, 2019;
originally announced May 2019.
-
Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications
Authors:
Songtao Lu,
Ioannis Tsaknakis,
Mingyi Hong,
Yongxin Chen
Abstract:
The min-max problem, also known as the saddle point problem, is a class of optimization problems which minimizes and maximizes two subsets of variables simultaneously. This class of problems can be used to formulate a wide range of signal processing and communication (SPCOM) problems. Despite its popularity, most existing theory for this class has been mainly developed for problems with certain special convex-concave structure. Therefore, it cannot be used to guide the algorithm design for many interesting problems in SPCOM, where various kinds of non-convexity arise.
In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave. We propose a class of simple algorithms named Hybrid Block Successive Approximation (HiBSA), which alternately perform gradient descent-type steps for the minimization blocks and gradient ascent-type steps for the maximization problem. A key element in the proposed algorithm is the use of certain regularization and penalty sequences, which stabilize the algorithm and ensure convergence. We show that HiBSA converges to some properly defined first-order stationary solutions with quantifiable global rates. To validate the efficiency of the proposed algorithms, we conduct numerical tests on a number of problems, including the robust learning problem, the non-convex min-utility maximization problems, and a certain wireless jamming problem arising in interfering channels.
Submitted 16 March, 2021; v1 submitted 21 February, 2019;
originally announced February 2019.
-
On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator
Authors:
Qi Cai,
Mingyi Hong,
Yongxin Chen,
Zhaoran Wang
Abstract:
We study the global convergence of generative adversarial imitation learning for linear quadratic regulators, which is posed as minimax optimization. To address the challenges arising from non-convex-concave geometry, we analyze the alternating gradient algorithm and establish its Q-linear rate of convergence to a unique saddle point, which simultaneously recovers the globally optimal policy and reward function. We hope our results may serve as a small step towards understanding and taming the instability in imitation learning as well as in more general non-convex-concave alternating minimax optimization that arises from reinforcement learning and generative adversarial learning.
Submitted 11 January, 2019;
originally announced January 2019.
-
Coordinating Multiple Sources for Service Restoration to Enhance Resilience of Distribution Systems
Authors:
Ying Wang,
Yin Xu,
Jinghan He,
Chen-Ching Liu,
Kevin P. Schneider,
Mingguo Hong,
Dan T. Ton
Abstract:
When a major outage occurs on a distribution system due to extreme events, microgrids, distributed generators, and other local resources can be used to restore critical loads and enhance resiliency. This paper proposes a decision-making method to determine the optimal restoration strategy coordinating multiple sources to serve critical loads after blackouts. The critical load restoration problem is solved by a two-stage method with the first stage deciding the post-restoration topology and the second stage determining the set of loads to be restored and the outputs of sources. In the second stage, the problem is formulated as a mixed-integer semidefinite program. The objective is maximizing the number of loads restored, weighted by their priority. The unbalanced three-phase power flow constraint and operational constraints are considered. An iterative algorithm is proposed to deal with integer variables and can attain the global optimum of the critical load restoration problem by solving a few semidefinite programs under two conditions. The effectiveness of the proposed method is validated by numerical simulation with the modified IEEE 13-node test feeder and the modified IEEE 123-node test feeder under plenty of scenarios. The results indicate that the optimal restoration strategy can be determined efficiently in most scenarios.
Submitted 15 January, 2019; v1 submitted 16 October, 2018;
originally announced October 2018.
-
A Linearly Convergent Doubly Stochastic Gauss-Seidel Algorithm for Solving Linear Equations and A Certain Class of Over-Parameterized Optimization Problems
Authors:
Meisam Razaviyayn,
Mingyi Hong,
Navid Reyhanian,
Zhi-Quan Luo
Abstract:
Consider the classical problem of solving a general linear system of equations $Ax=b$. It is well known that the (successively over relaxed) Gauss-Seidel scheme and many of its variants may not converge when $A$ is neither diagonally dominant nor symmetric positive definite. Can we have a linearly convergent G-S type algorithm that works for {\it any} $A$? In this paper we answer this question affirmatively by proposing a doubly stochastic G-S algorithm that is provably linearly convergent (in the mean square error sense) for any feasible linear system of equations. The key in the algorithm design is to introduce a {\it nonuniform double stochastic} scheme for picking the equation and the variable in each update step as well as a stepsize rule. These techniques also generalize to certain iterative alternating projection algorithms for solving the linear feasibility problem $A x\le b$ with an arbitrary $A$, as well as high-dimensional minimization problems for training over-parameterized models in machine learning. Our results demonstrate that a carefully designed randomization scheme can make an otherwise divergent G-S algorithm converge.
Submitted 13 May, 2019; v1 submitted 11 October, 2018;
originally announced October 2018.
-
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Authors:
Xiangyi Chen,
Sijia Liu,
Ruoyu Sun,
Mingyi Hong
Abstract:
This paper studies a class of adaptive gradient based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes popular algorithms such as Adam, AMSGrad and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee the convergence for the Adam-type methods. We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization. We show the conditions are essential in the sense that violating them may make the algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive gradient algorithms, which has the same $O(\log{T}/\sqrt{T})$ convergence rate. Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
Submitted 9 March, 2019; v1 submitted 8 August, 2018;
originally announced August 2018.
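To make "update the search directions and learning rates simultaneously using past gradients" concrete, here is a minimal scalar sketch of one step in the Adam family. It follows the commonly stated Adam and AMSGrad recursions without bias correction. The function name `adam_type_step`, the dictionary-based state, and the default hyperparameters are assumptions, and the sketch does not encode the paper's sufficient conditions.

```python
import math

def adam_type_step(x, g, state, lr=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, amsgrad=False):
    """One scalar update: momentum sets the direction, v sets the step scale."""
    # Search direction: exponential moving average of past gradients.
    m = beta1 * state["m"] + (1 - beta1) * g
    # Learning-rate scaling: moving average of past squared gradients.
    v = beta2 * state["v"] + (1 - beta2) * g * g
    # AMSGrad keeps the running maximum so the effective step never grows.
    v_hat = max(state["v_hat"], v) if amsgrad else v
    new_state = {"m": m, "v": v, "v_hat": v_hat}
    return x - lr * m / (math.sqrt(v_hat) + eps), new_state

# Usage: start from zeroed state and feed one gradient per step.
state = {"m": 0.0, "v": 0.0, "v_hat": 0.0}
x, state = adam_type_step(1.0, 2.0, state, amsgrad=True)
```

Setting `amsgrad=False` gives the plain Adam-style scaling, so the two variants differ only in how the second-moment term is carried between steps.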
-
Price-Based Market Clearing with V2G Integration Using Generalized Benders Decomposition
Authors:
Reza Jamalzadeh,
Sajjad Abedi,
Masoud Rashidinejad,
Mingguo Hong
Abstract:
Currently, most ISOs adopt an offer cost minimization (OCM) auction mechanism which minimizes the total offer cost, and then, a settlement rule based on either locational marginal prices (LMPs) or market clearing price (MCP) is used to determine the payments to the committed units, which is not compatible with the auction mechanism because the minimized cost is different from the payment cost calculated by the settlement rule. This inconsistency can drastically increase the payment cost. On the other hand, the payment cost minimization (PCM) auction mechanism eliminates this inconsistency; however, the PCM problem is a nonlinear self-referring NP-hard problem which poses a heavy computational burden. In this paper, a mixed-integer nonlinear programming (MINLP) formulation of the PCM problem is presented to address the additional complexity of the fast-growing penetration of Vehicle-to-Grid (V2G) in the price-based market clearing problem, and a solution method based on the generalized Benders decomposition (GBD) is then proposed to solve the V2G-integrated PCM problem, and its favorable performance in terms of convergence and computational efficiency is demonstrated using case studies. The proposed GBD-based method can handle scaled-up models with the increased number of decision variables and constraints which facilitates the use of the PCM mechanism in the market clearing of large-scale power systems. The impact of using V2G technologies on the OCM and PCM mechanisms in terms of MCPs and payments is also investigated, and by using numerical results, the performances of these two mechanisms are compared.
Submitted 27 June, 2018;
originally announced June 2018.
-
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Authors:
Hoi-To Wai,
Zhuoran Yang,
Zhaoran Wang,
Mingyi Hong
Abstract:
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, such an algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.
Submitted 8 January, 2019; v1 submitted 3 June, 2018;
originally announced June 2018.
-
Convergence of the Ginzburg-Landau approximation for the Ericksen-Leslie system
Authors:
Zhewen Feng,
Min-Chun Hong,
Yu Mei
Abstract:
We establish the local well-posedness of the general Ericksen-Leslie system in liquid crystals with the initial velocity and director field in $H^1 \times H_b^2$. In particular, we prove that the solutions of the Ginzburg-Landau approximation system converge smoothly to the solution of the Ericksen-Leslie system for any $t \in (0,T^\ast)$ with a maximal existence time $T^\ast$ of the Ericksen-Leslie system.
Submitted 13 June, 2019; v1 submitted 22 April, 2018;
originally announced April 2018.