Search | arXiv e-print repository

EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Authors: Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong

Abstract: A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encoding of the data. These negative samples often follow a softmax distribution that is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational costs in computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss as introduced in SogCLR, and propose EMC$^2$ which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size while exhibiting low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small batch training and achieves comparable or better performance than baseline algorithms. We report the results for pre-training image encoders on STL-10 and Imagenet-100.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 20 pages
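
Illustrative note on the entry above: the abstract's point that sampling from a softmax is costly because of the partition function can be made concrete with a generic Metropolis-Hastings sketch. This is not the EMC$^2$ algorithm. It is a minimal textbook sampler under assumptions of our own: a fixed list of scores, a uniform proposal, and the hypothetical names `mh_negative_sampler` and `softmax`. The acceptance ratio depends only on the difference of two scores, so the normalizing sum is never computed during sampling.

```python
import math
import random


def mh_negative_sampler(scores, num_steps, rng, start=None):
    """Metropolis-Hastings chain whose stationary law is softmax(scores).

    Assumption: a uniform (hence symmetric) proposal over indices, so the
    acceptance probability reduces to min(1, exp(s_new - s_old)).
    """
    n = len(scores)
    state = rng.randrange(n) if start is None else start
    samples = []
    for _ in range(num_steps):
        proposal = rng.randrange(n)
        # Only a score difference is needed; the partition function cancels.
        log_ratio = scores[proposal] - scores[state]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            state = proposal
        samples.append(state)
    return samples


def softmax(scores):
    """Exact softmax, used here only to check the sampler on a tiny example."""
    shift = max(scores)
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

On a small problem the empirical frequencies of the chain can be compared against `softmax(scores)`; on a large sample set only the sampler would be run, since the exact softmax is the expensive part.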

arXiv:2404.09800 [pdf, ps, other]

Fractional derivatives of local times for some Gaussian processes

Authors: Minhao Hong, Qian Yu

Abstract: In this article, we consider fractional derivatives of local time for $d$-dimensional centered Gaussian processes satisfying a certain strong local nondeterminism property. We first give a condition for existence of fractional derivatives of the local time defined by Marchaud derivatives in $L^p$ ($p\ge1$) and show that these derivatives are Hölder continuous with respect to both time and space variables and are also continuous with respect to the order of derivatives. Moreover, under some additional assumptions, we show that this condition is also necessary for existence of derivatives of the local time with the help of contour integration.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2402.08821 [pdf, other]

Problem-Parameter-Free Decentralized Nonconvex Stochastic Optimization

Authors: Jiaxiang Li, Xuxing Chen, Shiqian Ma, Mingyi Hong

Abstract: Existing decentralized algorithms usually require knowledge of problem parameters for updating local iterates. For example, the hyperparameters (such as learning rate) usually require the knowledge of Lipschitz constant of the global gradient or topological information of the communication networks, which are usually not accessible in practice. In this paper, we propose D-NASA, the first algorithm for decentralized nonconvex stochastic optimization that requires no prior knowledge of any problem parameters. We show that D-NASA has the optimal rate of convergence for nonconvex objectives under very mild conditions and enjoys the linear-speedup effect, i.e. the computation becomes faster as the number of nodes in the system increases. Extensive numerical experiments are conducted to support our findings.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2401.12025 [pdf, other]

A Survey of Recent Advances in Optimization Methods for Wireless Communications

Authors: Ya-Feng Liu, Tsung-Hui Chang, Mingyi Hong, Zheyu Wu, Anthony Man-Cho So, Eduard A. Jorswieck, Wei Yu

Abstract: Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, to analyze, and to solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recently developed optimization techniques in areas ranging from nonconvex optimization, global optimization, and integer programming, to distributed optimization and learning-based optimization. The key to successful solution of mathematical optimization problems is in carefully choosing or developing suitable algorithms (or neural network architectures) that can exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions.

Submitted 7 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 39 pages, 5 figures, accepted for publication in IEEE Journal on Selected Areas in Communications

arXiv:2401.11380 [pdf, other]

MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning

Authors: Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu

Abstract: Model-based offline reinforcement learning methods (RL) have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. Despite these advancements, existing model-based offline RL approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, thus not fully leveraging the advantages of an unrestricted policy space inherent to model-based methods. To address this limitation, we develop MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself from existing literature by employing an unrestricted policy class. In each iteration, MoMA conservatively estimates the value function by a minimization procedure within a confidence set of transition models in the policy evaluation step, then updates the policy with general function approximations instead of commonly-used parametric policy classes in the policy improvement step. Under some mild assumptions, we establish theoretical guarantees of MoMA by proving an upper bound on the suboptimality of the returned policy. We also provide a practically implementable, approximate version of the algorithm. The effectiveness of MoMA is demonstrated via numerical studies.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.08893 [pdf, other]

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

Abstract: Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.

Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.03058 [pdf, other]

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

Authors: Ruichen Jiang, Parameswaran Raman, Shoham Sabach, Aryan Mokhtari, Mingyi Hong, Volkan Cevher

Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 27 pages, 2 figures

arXiv:2308.00788 [pdf, other]

An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning

Authors: Yihua Zhang, Prashant Khanduri, Ioannis Tsaknakis, Yuguang Yao, Mingyi Hong, Sijia Liu

Abstract: Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.

Submitted 20 December, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

arXiv:2305.17083 [pdf, other]

A Policy Gradient Method for Confounded POMDPs

Authors: Mao Hong, Zhengling Qi, Yanxun Xu

Abstract: In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, length of horizon, concentratability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.

Submitted 30 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 95 pages, 3 figures

arXiv:2305.13146 [pdf, ps, other]

Limit theorems for additive functionals of some self-similar Gaussian processes

Authors: Minhao Hong, Heguang Liu, Fangjun Xu

Abstract: Under certain mild conditions, limit theorems for additive functionals of some $d$-dimensional self-similar Gaussian processes are obtained. These limit theorems work for general Gaussian processes including fractional Brownian motions, sub-fractional Brownian motions and bi-fractional Brownian motions. To prove these results, we use the method of moments and an enhanced chaining argument. The Gaussian processes under consideration are required to satisfy a certain strong local nondeterminism property. A tractable sufficient condition for the strong local nondeterminism property is given, and it relies only on the covariance functions of the Gaussian processes. Moreover, we give a sufficient condition for the distribution function of a random vector to be determined by its moments.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2302.10157 [pdf, other]

The Game of Life on the Robinson Triangle Penrose Tiling: Still Life

Authors: Seung Hyeon Mandy Hong, May Mei

Abstract: We investigate Conway's Game of Life played on the Robinson triangle Penrose tiling. In this paper, we classify all four-cell still lifes.

Submitted 10 April, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2210.16152 [pdf, ps, other]

Limit laws for functionals of self-intersection symmetric alpha-stable processes

Authors: Minhao Hong, Qian Yu

Abstract: In this paper, we prove two limit laws for functionals of self-intersection symmetric $\alpha$-stable processes with $\alpha\in(1,2)$. The results are obtained by the method of moments, employing the sample configuration and the chaining argument introduced in (Nualart and Xu 2013).

Submitted 28 October, 2022; originally announced October 2022.

Comments: 18 pages

arXiv:2205.12433 [pdf, ps, other]

Global Classical Solutions Near Vacuum to the Initial-Boundary Value Problem of Isentropic Supersonic Flows through Divergent Ducts

Authors: Ying-Chieh Lin, Jay Chu, John M. Hong, Hsin-Yi Lee

Abstract: In this paper, we study the global existence and asymptotic behavior of classical solutions near vacuum for the initial-boundary value problem modeling isentropic supersonic flows through divergent ducts. The governing equations are the compressible Euler equations with a small parameter, which can be written as a hyperbolic system in terms of the Riemann invariants with a non-dissipative source. We provide a new result for the global existence of classical solutions to initial-boundary value problems of non-dissipative hyperbolic balance laws without the assumption of small data. The work is based on the local existence, the maximum principle and the uniform a priori estimates obtained by the generalized Lax transformations. The asymptotic behavior of classical solutions is also shown by studying the behavior of Riemann invariants along each characteristic curve and vertical line. The results can be applied to the spherically symmetric solutions to N-dimensional compressible Euler equations. Numerical simulations are provided to support our theoretical results.

Submitted 24 May, 2022; originally announced May 2022.

MSC Class: 35L45; 35L65; 35L67; 35L81

arXiv:2112.04074 [pdf, ps, other]

Existence and convergence of the Beris-Edwards system with general Landau-de Gennes energy

Authors: Zhewen Feng, Min-Chun Hong, Yu Mei

Abstract: In this paper, we investigate the Beris-Edwards system for both biaxial and uniaxial $Q$-tensors with a general Landau-de Gennes energy density depending on four non-zero elastic constants. We prove existence of the strong solution of the Beris-Edwards system for uniaxial $Q$-tensors up to a maximal time. Furthermore, we prove that the strong solutions of the Beris-Edwards system for biaxial $Q$-tensors converge smoothly to the solution of the Beris-Edwards system for uniaxial $Q$-tensors up to its maximal existence time.

Submitted 10 November, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

MSC Class: 35K51; 35Q30; 76A15

arXiv:2112.03453 [pdf, ps, other]

Existence of minimizers and convergence of critical points for a new Landau-de Gennes energy functional in nematic liquid crystals

Authors: Zhewen Feng, Min-Chun Hong

Abstract: The Landau-de Gennes energy in nematic liquid crystals depends on four elastic constants $L_1$, $L_2$, $L_3$, $L_4$. In the case of $L_4\neq 0$, Ball and Majumdar (Mol. Cryst. Liq. Cryst., 2010) found an example that the original Landau-de Gennes energy functional in physics does not satisfy a coercivity condition, which causes a problem in mathematics to establish existence of energy minimizers. At first, we introduce a new Landau-de Gennes energy density with $L_4\neq 0$, which is equivalent to the original Landau-de Gennes density for uniaxial tensors and satisfies the coercivity condition for all $Q$-tensors. Secondly, we prove that solutions of the Landau-de Gennes system can approach a solution of the $Q$-tensor Oseen-Frank system without using energy minimizers. Thirdly, we develop a new approach to generalize the Nguyen and Zarnescu (Calc. Var. PDEs, 2013) convergence result to the case of non-zero elastic constants $L_2$, $L_3$, $L_4$.

Submitted 28 September, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: Revised version. To appear in Calculus of Variations and Partial Differential Equations. arXiv admin note: text overlap with arXiv:2007.11144

MSC Class: 35J20; 35Q35; 76A15

arXiv:2110.11210 [pdf, other]

Minimax Problems with Coupled Linear Constraints: Computational Complexity, Duality and Solution Methods

Authors: Ioannis Tsaknakis, Mingyi Hong, Shuzhong Zhang

Abstract: In this work we study a special minimax problem where there are linear constraints that couple both the minimization and maximization decision variables. The problem is a generalization of the traditional saddle point problem (which does not have the coupling constraint), and it finds applications in wireless communication, game theory, transportation, just to name a few. We show that the considered problem is challenging, in the sense that it violates the classical max-min inequality, and that it is NP-hard even under very strong assumptions (e.g., when the objective is strongly convex-strongly concave). We then develop a duality theory for it, and analyze conditions under which the duality gap becomes zero. Finally, we study a class of stationary solutions defined based on the dual problem, and evaluate their practical performance in an application on adversarial attacks on network flow problems.

Submitted 25 November, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2110.03438 [pdf, ps, other]

Biconservative hypersurfaces with constant scalar curvature in space forms

Authors: Yu Fu, Min-Chun Hong, Dan Yang, Xin Zhan

Abstract: Biconservative hypersurfaces are hypersurfaces which have conservative stress-energy tensor with respect to the bienergy, containing all minimal and constant mean curvature hypersurfaces. The purpose of this paper is to study biconservative hypersurfaces $M^n$ with constant scalar curvature in a space form $N^{n+1}(c)$. We prove that every biconservative hypersurface with constant scalar curvature in $N^4(c)$ has constant mean curvature. Moreover, we prove that any biconservative hypersurface with constant scalar curvature in $N^5(c)$ is either an open part of a certain rotational hypersurface or a constant mean curvature hypersurface. These solve an open problem proposed recently by D. Fetcu and C. Oniciuc for $n\leq4$.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 20 pages

arXiv:2109.14212 [pdf, other]

Primal-Dual First-Order Methods for Affinely Constrained Multi-Block Saddle Point Problems

Authors: Junyu Zhang, Mengdi Wang, Mingyi Hong, Shuzhong Zhang

Abstract: We consider the convex-concave saddle point problem $\min_{\mathbf{x}}\max_{\mathbf{y}}Φ(\mathbf{x},\mathbf{y})$, where the decision variables $\mathbf{x}$ and/or $\mathbf{y}$ are subject to a multi-block structure and affine coupling constraints, and $Φ(\mathbf{x},\mathbf{y})$ possesses certain separable structure. Although the minimization counterpart of such problem has been widely studied under the topics of ADMM, this minimax problem is rarely investigated. In this paper, a convenient notion of $ε$-saddle point is proposed, under which the convergence rates of several proposed algorithms are analyzed. When only one of $\mathbf{x}$ and $\mathbf{y}$ has multiple blocks and affine constraint, several natural extensions of ADMM are proposed to solve the problem. Depending on the number of blocks and the level of smoothness, $\mathcal{O}(1/T)$ or $\mathcal{O}(1/\sqrt{T})$ convergence rates are derived for our algorithms. When both $\mathbf{x}$ and $\mathbf{y}$ have multiple blocks and affine constraints, a new algorithm called ExtraGradient Method of Multipliers (EGMM) is proposed. Under desirable smoothness condition, an $\mathcal{O}(1/T)$ rate of convergence can be guaranteed regardless of the number of blocks in $\mathbf{x}$ and $\mathbf{y}$. An in-depth comparison between EGMM (fully primal-dual method) and ADMM (approximate dual method) is made over the multi-block optimization problems to illustrate the advantage of the EGMM.

Submitted 16 March, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 25 pages

arXiv:2106.10435 [pdf, other]

STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Authors: Prashant Khanduri, Pranay Sharma, Haibo Yang, Mingyi Hong, Jia Liu, Ketan Rajawat, Pramod K. Varshney

Abstract: Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(ε^{-3/2})$ samples and $\tilde{\mathcal{O}}(ε^{-1})$ communication rounds to compute an $ε$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such near-optimal sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of the STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provide guidelines for choosing the four important design elements for FL algorithms, the update frequency, directions, and minibatch sizes to achieve the best performance.

Submitted 19 June, 2021; originally announced June 2021.

arXiv:2102.07367 [pdf, other]

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Authors: Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

Abstract: This paper proposes a new algorithm -- the \underline{S}ingle-timescale Do\underline{u}ble-momentum \underline{St}ochastic \underline{A}pprox\underline{i}matio\underline{n} (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on \emph{two-timescale} or \emph{double loop} techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $\mathcal{O}(ε^{-3/2})$ iterations (each using ${\cal O}(1)$ samples) to find an $ε$-stationary solution. The $ε$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $ε$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.

Submitted 15 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: 36 Pages, 10 Figures

arXiv:2102.07091 [pdf, other]

Decentralized Riemannian Gradient Descent on the Stiefel Manifold

Authors: Shixiang Chen, Alfredo Garcia, Mingyi Hong, Shahin Shahrampour

Abstract: We consider a distributed non-convex optimization problem where a network of agents aims at minimizing a global function over the Stiefel manifold. The global function is represented as a finite sum of smooth local functions, where each local function is associated with one agent and agents communicate with each other over an undirected connected graph. The problem is non-convex as local functions are possibly non-convex (but smooth) and the Stiefel manifold is a non-convex set. We present a decentralized Riemannian stochastic gradient method (DRSGD) with the convergence rate of $\mathcal{O}(1/\sqrt{K})$ to a stationary point. To have exact convergence with constant stepsize, we also propose a decentralized Riemannian gradient tracking algorithm (DRGTA) with the convergence rate of $\mathcal{O}(1/K)$ to a stationary point. We use multi-step consensus to preserve the iteration in the local (consensus) region. DRGTA is the first decentralized algorithm with exact convergence for distributed optimization on the Stiefel manifold.

Submitted 14 February, 2021; originally announced February 2021.

arXiv:2101.09346 [pdf, ps, other]

On the Local Linear Rate of Consensus on the Stiefel Manifold

Authors: Shixiang Chen, Alfredo Garcia, Mingyi Hong, Shahin Shahrampour

Abstract: We study the convergence properties of the Riemannian gradient method for solving the consensus problem (for an undirected connected graph) over the Stiefel manifold. The Stiefel manifold is a non-convex set and the standard notion of averaging in the Euclidean space does not work for this problem. We propose Distributed Riemannian Consensus on Stiefel Manifold (DRCS) and prove that it enjoys a local linear convergence rate to global consensus. More importantly, this local rate asymptotically scales with the second largest singular value of the communication matrix, which is on par with the well-known rate in the Euclidean space. To the best of our knowledge, this is the first work showing the equality of the two rates. The main technical challenges include (i) developing a Riemannian restricted secant inequality for convergence analysis, and (ii) identifying the conditions (e.g., suitable step-size and initialization) under which the algorithm always stays in the local region.

Submitted 22 January, 2021; originally announced January 2021.

arXiv:2012.15511 [pdf, other]

doi 10.1109/TSP.2023.3268475

Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup

Authors: Han Shen, Kaiqing Zhang, Mingyi Hong, Tianyi Chen

Abstract: Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well-understood, including its non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its non-asymptotic convergence guarantees. Under both i.i.d. and Markovian sampling, we establish the local convergence guarantee for A3C in the general policy approximation case and the global convergence guarantee in softmax policy parameterization. Under i.i.d. sampling, A3C obtains sample complexity of $\mathcal{O}(ε^{-2.5}/N)$ per worker to achieve $ε$ accuracy, where $N$ is the number of workers. Compared to the best-known sample complexity of $\mathcal{O}(ε^{-2.5})$ for two-timescale AC, A3C achieves \emph{linear speedup}, which justifies the advantage of parallelism and asynchrony in AC algorithms theoretically for the first time. Numerical tests on synthetic environment, OpenAI Gym environments and Atari games have been provided to verify our theoretical analysis.

Submitted 16 March, 2022; v1 submitted 31 December, 2020; originally announced December 2020.

arXiv:2010.11626 [pdf, ps, other]

Derivatives of local times for some Gaussian fields II

Authors: Minhao Hong, Fangjun Xu

Abstract: Given a $(2,d)$-Gaussian field \[ Z=\big\{ Z(t,s)= X^{H_1}_t -\tilde{X}^{H_2}_s, s,t \ge 0\big\}, \] where $X^{H_1}$ and $\tilde{X}^{H_2}$ are independent $d$-dimensional centered Gaussian processes satisfying certain properties, we will give the necessary condition for existence of derivatives of the local time of $Z$.

Submitted 22 October, 2020; originally announced October 2020.

arXiv:2010.03194 [pdf, ps, other]

First-Order Algorithms Without Lipschitz Gradient: A Sequential Local Optimization Approach

Authors: Junyu Zhang, Mingyi Hong

Abstract: First-order algorithms have been popular for solving convex and non-convex optimization problems. A key assumption for the majority of these algorithms is that the gradient of the objective function is globally Lipschitz continuous, but many contemporary problems such as tensor decomposition fail to satisfy such an assumption. This paper develops a sequential local optimization (SLO) framework of first-order algorithms that can effectively optimize problems without Lipschitz gradient. Operating on the assumption that the gradients are {\it locally} Lipschitz continuous over any compact set, the proposed framework carefully restricts the distance between two successive iterates. We show that the proposed framework can easily adapt to existing first-order methods such as gradient descent (GD), normalized gradient descent (NGD), accelerated gradient descent (AGD), as well as GD with Armijo line search. Remarkably, the latter algorithm is totally parameter-free and does not even require the knowledge of local Lipschitz constants. We show that for the proposed algorithms to achieve gradient error bound of $\|\nabla f(x)\|^2\le ε$, it requires at most $\mathcal{O}(\frac{1}ε\times \mathcal{L}(Y))$ total access to the gradient oracle, where $\mathcal{L}(Y)$ characterizes how the local Lipschitz constants grow with the size of a given set $Y$. Moreover, we show that the variant of AGD improves the dependency on both $ε$ and the growth function $\mathcal{L}(Y)$. The proposed algorithms complement the existing Bregman Proximal Gradient (BPG) algorithm, because they do not require the global information about problem structure to construct and solve Bregman proximal mappings.

Submitted 5 February, 2024; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: Accepted by Informs Journal on Optimization

arXiv:2007.11144 [pdf, ps, other]

A new representation for the Landau-de Gennes energy of nematic liquid crystals

Authors: Zhewen Feng, Min-Chun Hong

Abstract: In the Landau-de Gennes theory on nematic liquid crystals, the well-known Landau-de Gennes energy depends on four elastic constants: $L_1$, $L_2$, $L_3$, $L_4$. For the general case of $L_4\neq 0$, Ball-Majumdar \cite{BM} found an example that the Landau-de Gennes energy functional from physics literature \cite{MN} does not satisfy a coercivity condition, which causes a problem in mathematics to establish existence of energy minimizers. In order to solve this problem, we observe that the original third order term on $L_4$, proposed by Schiele and Trimper \cite{ST} in physics, is a linear combination of a fourth order term and a second order term. Therefore, we can propose a new Landau-de Gennes energy, which is equal to the original for uniaxial nematic $Q$-tensors. The new Landau-de Gennes energy with general elastic constants satisfies the coercivity condition for all $Q$-tensors, which establishes a new link between mathematical and physical theory. Similarly to the work of Majumdar-Zarnescu \cite{MZ}, we prove existence and convergence of minimizers of the new Landau-de Gennes energy. Moreover, we find a new way to study the limiting problem of the Landau-de Gennes system since the cross product method \cite{Chen} on the Ginzburg-Landau equation does not work for the Landau-de Gennes system.

Submitted 6 January, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: corrects several typos

MSC Class: 35J20; 35Q35; 76A15

arXiv:2007.05170 [pdf, other]

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Authors: Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

Abstract: This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp.~weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp.~$\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.

Submitted 8 June, 2022; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: Minor revision
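
The two-timescale update pattern described in the abstract above (a larger step size for the inner variable, a smaller projected step for the outer variable) can be sketched on a toy problem. This is only an illustration under stated assumptions, not the authors' TTSA implementation: the scalar bilevel problem, the constant step sizes, the interval constraint, and the function name `ttsa_toy` are all hypothetical choices made for this sketch.

```python
import random
from typing import Tuple


def ttsa_toy(
    num_iters: int = 2000,
    inner_step: float = 0.5,   # larger step size for the inner variable y (assumed value)
    outer_step: float = 0.05,  # smaller step size for the outer variable x (assumed value)
    noise_std: float = 0.0,    # standard deviation of additive gradient noise
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-timescale updates on a hypothetical scalar bilevel problem.

    Inner problem: y*(x) = argmin_y 0.5 * (y - x)**2, so y*(x) = x.
    Outer problem: minimize 0.5 * (y*(x) - 1)**2 + 0.5 * x**2 over x in [-2, 2].
    The outer minimizer is x = 0.5.
    """
    rng = random.Random(seed)
    x, y = 2.0, -1.0
    for _ in range(num_iters):
        # Inner (noisy) gradient step on the faster timescale.
        grad_y = (y - x) + rng.gauss(0.0, noise_std)
        y -= inner_step * grad_y
        # Outer gradient surrogate: derivative of the outer objective with
        # y*(x) replaced by the current y; here dy*/dx = 1.
        grad_x = x + (y - 1.0) + rng.gauss(0.0, noise_std)
        # Projected (noisy) gradient step on the slower timescale.
        x = min(2.0, max(-2.0, x - outer_step * grad_x))
    return x, y
```

With the default `noise_std=0.0` the iterates settle at the outer minimizer `x = 0.5` with `y` tracking `y*(x) = x`; setting `noise_std` above zero gives a noisy variant, for which decaying step sizes would normally be needed to converge.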

arXiv:2006.15429 [pdf, other]

Understanding Gradient Clipping in Private SGD: A Geometric Perspective

Authors: Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong

Abstract: Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. To provide formal and rigorous privacy guarantee, many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold. We first demonstrate how gradient clipping can prevent SGD from converging to a stationary point. We then provide a theoretical analysis that fully quantifies the clipping bias on convergence with a disparity measure between the gradient distribution and a geometrically symmetric distribution. Our empirical evaluation further suggests that the gradient distributions along the trajectory of private SGD indeed exhibit symmetric structure that favors convergence. Together, our results provide an explanation why private SGD with gradient clipping remains effective in practice despite its potential clipping bias. Finally, we develop a new perturbation-based technique that can provably correct the clipping bias even for instances with highly asymmetric gradient distributions.

Submitted 17 March, 2021; v1 submitted 27 June, 2020; originally announced June 2020.
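
The per-example clipping step mentioned in the abstract above (shrink a gradient whenever its L2 norm exceeds a threshold) can be sketched as follows. This is a minimal illustration, assuming a gradient stored as a plain list of floats; the function name `clip_gradient` and the parameter `clip_norm` are hypothetical, and the noise addition and averaging that private SGD performs afterwards are deliberately left out.

```python
import math
from typing import List


def clip_gradient(grad: List[float], clip_norm: float) -> List[float]:
    """Return a copy of `grad` whose L2 norm is at most `clip_norm`.

    Hypothetical helper for illustration: gradients already within the
    threshold are returned unchanged; larger ones are rescaled onto the
    ball of radius `clip_norm`, keeping their direction.
    """
    if clip_norm <= 0:
        raise ValueError("clip_norm must be positive")
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]
```

For example, `clip_gradient([3.0, 4.0], 1.0)` rescales a gradient of norm 5 to approximately `[0.6, 0.8]`, while `clip_gradient([0.3, 0.4], 1.0)` leaves the input unchanged because its norm is already below the threshold.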

arXiv:2006.11662 [pdf, ps, other]

On the Divergence of Decentralized Non-Convex Optimization

Authors: Mingyi Hong, Siliang Zeng, Junyu Zhang, Haoran Sun

Abstract: We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradient $\nabla f_i$'s are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied? To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to certain $ε$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. In particular, if the local function $f_i$'s are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/ε^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first attempt that studies decentralized non-convex optimization problems with neither the LLC nor the GLC.

Submitted 20 June, 2020; originally announced June 2020.

Comments: 34 pages

arXiv:2006.08141 [pdf, other]

doi 10.1109/MSP.2020.3003851

Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances

Authors: Meisam Razaviyayn, Tianjian Huang, Songtao Lu, Maher Nouiehed, Maziar Sanjabi, Mingyi Hong

Abstract: The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst case function in the given class. Min-max optimization problems have recently become very popular in a wide range of signal and data processing applications such as fair beamforming, training generative adversarial networks (GANs), and robust machine learning, to just name a few. The overarching goal of this article is to provide a survey of recent advances for an important subclass of min-max problem, where the minimization and maximization problems can be non-convex and/or non-concave. In particular, we will first present a number of applications to showcase the importance of such min-max problems; then we discuss key theoretical challenges, and provide a selective review of some exciting recent theoretical and algorithmic advances in tackling non-convex min-max problems. Finally, we will point out open questions and future research directions.

Submitted 18 August, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Journal ref: IEEE Signal Processing Magazine (Volume: 37, Issue: 5, Sept. 2020)

arXiv:2006.07612 [pdf, ps, other]

On Chen's biharmonic conjecture for hypersurfaces in $\mathbb R^5$

Authors: Yu Fu, Min-Chun Hong, Xin Zhan

Abstract: A longstanding conjecture on biharmonic submanifolds, proposed by Chen in 1991, is that {\it any biharmonic submanifold in a Euclidean space is minimal}. In the case of a hypersurface $M^n$ in $\mathbb R^{n+1}$, Chen's conjecture was settled in the case of $n=2$ by Chen and Jiang around 1987 independently. Hasanis and Vlachos in 1995 settled Chen's conjecture for a hypersurface with $n=3$. However, the general Chen's conjecture on a hypersurface $M^n$ remains open for $n> 3$. In this paper, we settle Chen's conjecture for hypersurfaces in $\mathbb R^{5}$ for $n=4$.

Submitted 22 July, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: 23 pages

arXiv:2006.02067 [pdf, ps, other]

Generalization Bounds for Stochastic Saddle Point Problems

Authors: Junyu Zhang, Mingyi Hong, Mengdi Wang, Shuzhong Zhang

Abstract: This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2005.03267 [pdf, other]

Online Proximal-ADMM For Time-varying Constrained Convex Optimization

Authors: Yijian Zhang, Emiliano Dall'Anese, Mingyi Hong

Abstract: This paper considers a convex optimization problem with cost and constraints that evolve over time. The function to be minimized is strongly convex and possibly non-differentiable, and variables are coupled through linear constraints. In this setting, the paper proposes an online algorithm based on the alternating direction method of multipliers (ADMM), to track the optimal solution trajectory of the time-varying problem; in particular, the proposed algorithm consists of a primal proximal gradient descent step and an appropriately perturbed dual ascent step. The paper derives tracking results, asymptotic bounds, and linear convergence results. The proposed algorithm is then specialized to a multi-area power grid optimization problem, and our numerical results verify the desired properties.

Submitted 12 January, 2021; v1 submitted 7 May, 2020; originally announced May 2020.

arXiv:2001.04786 [pdf, other]

doi 10.1109/MSP.2020.2970170

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

Authors: Tsung-Hui Chang, Mingyi Hong, Hoi-To Wai, Xinwei Zhang, Songtao Lu

Abstract: Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence --- problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review about the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.

Submitted 14 January, 2020; originally announced January 2020.

Comments: Submitted to IEEE Signal Processing Magazine Special Issue on Distributed, Streaming Machine Learning; THC, MH, HTW contributed equally

arXiv:1912.07481 [pdf, ps, other]

doi 10.1007/s10107-021-01660-z

On Lower Iteration Complexity Bounds for the Saddle Point Problems

Authors: Junyu Zhang, Mingyi Hong, Shuzhong Zhang

Abstract: In this paper, we study the lower iteration complexity bounds for finding the saddle point of a strongly convex and strongly concave saddle point problem: $\min_x\max_yF(x,y)$. We restrict the classes of algorithms in our investigation to be either pure first-order methods or methods using proximal mappings. The existing lower bound result for this type of problems is obtained via the framework of strongly monotone variational inequality problems, which corresponds to the case where the gradient Lipschitz constants ($L_x, L_y$ and $L_{xy}$) and strong convexity/concavity constants ($μ_x$ and $μ_y$) are uniform with respect to variables $x$ and $y$. However, specific to the min-max saddle point problem these parameters are naturally different. Therefore, one is led to finding the best possible lower iteration complexity bounds, specific to the min-max saddle point models. In this paper we present the following results. For the class of pure first-order algorithms, our lower iteration complexity bound is $Ω\left(\sqrt{\frac{L_x}{μ_x}+\frac{L_{xy}^2}{μ_xμ_y}+\frac{L_y}{μ_y}}\cdot\ln\left(\frac{1}ε\right)\right)$, where the term $\frac{L_{xy}^2}{μ_xμ_y}$ explains how the coupling influences the iteration complexity. Under several special parameter regimes, this lower bound has been achieved by corresponding optimal algorithms. However, whether or not the bound under the general parameter regime is optimal remains open. Additionally, for the special case of bilinear coupling problems, given the availability of certain proximal operators, a lower bound of $Ω\left(\sqrt{\frac{L_{xy}^2}{μ_xμ_y}+1}\cdot\ln(\frac{1}ε)\right)$ is established in this paper, and optimal algorithms have already been developed in the literature.

Submitted 20 June, 2021; v1 submitted 16 December, 2019; originally announced December 2019.

Journal ref: Mathematical Programming (2021)

arXiv:1910.06513 [pdf, other]

ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

Authors: Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox

Abstract: The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm, that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is problem size. In particular, we provide a deep understanding on why Mahalanobis distance matters in convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis makes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications, designing per-image and universal adversarial attacks from black-box neural networks, respectively. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy compared with $6$ state-of-the-art ZO optimization methods.

Submitted 15 October, 2019; v1 submitted 14 October, 2019; originally announced October 2019.

arXiv:1910.05857 [pdf, ps, other]

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach

Authors: Haoran Sun, Songtao Lu, Mingyi Hong

Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Unlike the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, we focus on both reducing the total communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve a certain $\epsilon$-stationary solution of the deterministic finite sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.

Submitted 13 October, 2019; originally announced October 2019.

Journal ref: Published at the International Conference on Machine Learning (ICML 2020)

arXiv:1909.13806 [pdf, other]

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

Authors: Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Minyi Hong, Una-May O'Reilly

Abstract: In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.

Submitted 16 June, 2020; v1 submitted 30 September, 2019; originally announced September 2019.

Comments: ICML 2020
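
Illustrative note on the entry above: the abstract refers to a zeroth-order (ZO) gradient estimator that needs only function queries. The following is a minimal sketch of one common form, a two-point random-direction estimator with Gaussian directions. The function name `zo_gradient`, the smoothing parameter `mu`, and the number of directions `num_dirs` are assumptions made for illustration; this is not claimed to be the estimator used in the paper.

```python
import random

def zo_gradient(f, x, mu=1e-4, num_dirs=20000, rng=None):
    """Estimate the gradient of f at x using only function values.

    Hypothetical helper: averages (f(x + mu*u) - f(x)) / mu * u over
    random Gaussian directions u. Not taken from the paper.
    """
    rng = rng or random.Random(0)
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        # Draw a random direction with independent standard normal entries.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        # Finite-difference slope of f along u.
        slope = (f(x_pert) - fx) / mu
        for i in range(d):
            grad[i] += slope * u[i] / num_dirs
    return grad

def quadratic(x):
    # Test objective with known gradient 2*x.
    return sum(xi * xi for xi in x)

estimate = zo_gradient(quadratic, [1.0, -2.0])
```

For the quadratic above the true gradient at (1, -2) is (2, -4); the estimate is noisy and approaches it only as `num_dirs` grows, which is one way to see why query-based methods pay a dimension-dependent price.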

arXiv:1907.06246 [pdf, ps, other]

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Authors: Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang

Abstract: Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.

Submitted 14 July, 2019; originally announced July 2019.

Comments: 41 pages

arXiv:1907.04450 [pdf, ps, other]

SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

Authors: Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

Abstract: This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$ require $\mathcal{O}(1/\epsilon^{2.5})$ iterations to compute an $(\epsilon, \sqrt{\epsilon})$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints.

Submitted 9 July, 2019; originally announced July 2019.

arXiv:1906.01736 [pdf, other]

Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong

Abstract: Recently, there is a growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms include signSGD with majority vote, an effective approach for communication reduction via 1-bit compression on the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms critically rely on the assumption that all the distributed data are drawn iid from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates such an iid assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, together with a series of results that provably close the gap between mean and median of the gradients. The proposed methods largely preserve nice properties of these methods, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate mean through a median estimator.

Submitted 6 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.
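
Illustrative note on the entry above: the abstract attributes non-convergence to a disparity between the median and the mean of the local gradients. The sketch below shows that disparity on a toy example with three workers. The helper names (`majority_vote_sign`, `mean_gradient`) and the gradient values are assumptions chosen for illustration, and the paper's noise-based correction mechanism is not reproduced here.

```python
def sign(v):
    # Returns -1, 0, or 1 as an int.
    return (v > 0) - (v < 0)

def majority_vote_sign(local_grads):
    """Coordinate-wise majority vote over the signs of the workers' gradients."""
    dim = len(local_grads[0])
    return [sign(sum(sign(g[i]) for g in local_grads)) for i in range(dim)]

def mean_gradient(local_grads):
    """Coordinate-wise average of the workers' gradients."""
    n = len(local_grads)
    dim = len(local_grads[0])
    return [sum(g[i] for g in local_grads) / n for i in range(dim)]

# Hypothetical heterogeneous gradients: two workers agree, one is an outlier.
local_grads = [[-1.0, 0.5], [-1.0, 0.5], [5.0, 0.5]]

vote = majority_vote_sign(local_grads)  # direction a sign-vote update would follow
mean = mean_gradient(local_grads)       # direction of the true average gradient
```

In the first coordinate the vote is negative while the mean gradient is positive, so a step along the vote moves against the average gradient in that coordinate; the second coordinate, where all workers agree, shows no such conflict.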

arXiv:1905.09631 [pdf, ps, other]

Derivatives of local times for some Gaussian fields

Authors: Minhao Hong, Fangjun Xu

Abstract: In this article, we consider derivatives of local time for a $(2,d)$-Gaussian field \[ Z=\big\{ Z(t,s)= X^{H_1}_t -\widetilde{X}^{H_2}_s, s,t \ge 0\big\}, \] where $X^{H_1}$ and $\widetilde{X}^{H_2}$ are two independent processes from a class of $d$-dimensional centered Gaussian processes satisfying certain local nondeterminism property. We first give a condition for existence of derivatives of the local time. Then, under this condition, we show that derivatives of the local time are Hölder continuous in both time and space variables. Moreover, under some additional assumptions, we show that this condition is also necessary for existence of derivatives of the local time at the origin.

Submitted 23 May, 2019; originally announced May 2019.

arXiv:1902.08294 [pdf, ps, other]

doi 10.1109/TSP.2020.2986363

Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications

Authors: Songtao Lu, Ioannis Tsaknakis, Mingyi Hong, Yongxin Chen

Abstract: The min-max problem, also known as the saddle point problem, is a class of optimization problems which minimizes and maximizes two subsets of variables simultaneously. This class of problems can be used to formulate a wide range of signal processing and communication (SPCOM) problems. Despite its popularity, most existing theory for this class has been mainly developed for problems with certain special convex-concave structure. Therefore, it cannot be used to guide the algorithm design for many interesting problems in SPCOM, where various kinds of non-convexity arise. In this work, we consider a block-wise one-sided non-convex min-max problem, in which the minimization problem consists of multiple blocks and is non-convex, while the maximization problem is (strongly) concave. We propose a class of simple algorithms named Hybrid Block Successive Approximation (HiBSA), which alternatingly perform gradient descent-type steps for the minimization blocks and gradient ascent-type steps for the maximization problem. A key element in the proposed algorithm is the use of certain regularization and penalty sequences, which stabilize the algorithm and ensure convergence. We show that HiBSA converges to some properly defined first-order stationary solutions with quantifiable global rates. To validate the efficiency of the proposed algorithms, we conduct numerical tests on a number of problems, including the robust learning problem, the non-convex min-utility maximization problems, and certain wireless jamming problem arising in interfering channels.

Submitted 16 March, 2021; v1 submitted 21 February, 2019; originally announced February 2019.

arXiv:1901.03674 [pdf, ps, other]

On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator

Authors: Qi Cai, Mingyi Hong, Yongxin Chen, Zhaoran Wang

Abstract: We study the global convergence of generative adversarial imitation learning for linear quadratic regulators, which is posed as minimax optimization. To address the challenges arising from non-convex-concave geometry, we analyze the alternating gradient algorithm and establish its Q-linear rate of convergence to a unique saddle point, which simultaneously recovers the globally optimal policy and reward function. We hope our results may serve as a small step towards understanding and taming the instability in imitation learning as well as in more general non-convex-concave alternating minimax optimization that arises from reinforcement learning and generative adversarial learning.

Submitted 11 January, 2019; originally announced January 2019.

arXiv:1810.06907 [pdf]

doi 10.1109/TSG.2019.2891515

Coordinating Multiple Sources for Service Restoration to Enhance Resilience of Distribution Systems

Authors: Ying Wang, Yin Xu, Jinghan He, Chen-Ching Liu, Kevin P. Schneider, Mingguo Hong, Dan T. Ton

Abstract: When a major outage occurs on a distribution system due to extreme events, microgrids, distributed generators, and other local resources can be used to restore critical loads and enhance resiliency. This paper proposes a decision-making method to determine the optimal restoration strategy coordinating multiple sources to serve critical loads after blackouts. The critical load restoration problem is solved by a two-stage method with the first stage deciding the post-restoration topology and the second stage determining the set of loads to be restored and the outputs of sources. In the second stage, the problem is formulated as a mixed-integer semidefinite program. The objective is maximizing the number of loads restored, weighted by their priority. The unbalanced three-phase power flow constraint and operational constraints are considered. An iterative algorithm is proposed to deal with integer variables and can attain the global optimum of the critical load restoration problem by solving a few semidefinite programs under two conditions. The effectiveness of the proposed method is validated by numerical simulation with the modified IEEE 13-node test feeder and the modified IEEE 123-node test feeder under plenty of scenarios. The results indicate that the optimal restoration strategy can be determined efficiently in most scenarios.

Submitted 15 January, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: 13 pages, 7 figures, journal

arXiv:1810.05251 [pdf, other]

A Linearly Convergent Doubly Stochastic Gauss-Seidel Algorithm for Solving Linear Equations and A Certain Class of Over-Parameterized Optimization Problems

Authors: Meisam Razaviyayn, Mingyi Hong, Navid Reyhanian, Zhi-Quan Luo

Abstract: Consider the classical problem of solving a general linear system of equations $Ax=b$. It is well known that the (successively over relaxed) Gauss-Seidel scheme and many of its variants may not converge when $A$ is neither diagonally dominant nor symmetric positive definite. Can we have a linearly convergent G-S type algorithm that works for {\it any} $A$? In this paper we answer this question affirmatively by proposing a doubly stochastic G-S algorithm that is provably linearly convergent (in the mean square error sense) for any feasible linear system of equations. The key in the algorithm design is to introduce a {\it nonuniform double stochastic} scheme for picking the equation and the variable in each update step as well as a stepsize rule. These techniques also generalize to certain iterative alternating projection algorithms for solving the linear feasibility problem $A x\le b$ with an arbitrary $A$, as well as high-dimensional minimization problems for training over-parameterized models in machine learning. Our results demonstrate that a carefully designed randomization scheme can make an otherwise divergent G-S algorithm converge.

Submitted 13 May, 2019; v1 submitted 11 October, 2018; originally announced October 2018.
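
Illustrative note on the entry above: the abstract contrasts its doubly stochastic scheme with the classical Gauss-Seidel iteration, which can diverge when $A$ is neither diagonally dominant nor symmetric positive definite. The sketch below implements only the classical (deterministic, cyclic) iteration to show both behaviours; the function name `gauss_seidel` and the two small test systems are assumptions for illustration, and the paper's randomized algorithm and stepsize rule are not reproduced.

```python
def gauss_seidel(A, b, sweeps=50):
    """Classical cyclic Gauss-Seidel for A x = b, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Solve equation i for x[i], using the latest values of the other entries.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant system with exact solution (0.1, 0.6): the iteration converges.
x_good = gauss_seidel([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])

# System that is neither diagonally dominant nor symmetric positive definite,
# with exact solution (1, 1): the same iteration diverges.
x_bad = gauss_seidel([[1.0, 2.0], [3.0, 1.0]], [3.0, 4.0], sweeps=10)
```

On the second system the error grows by a constant factor each sweep, which is the failure mode the abstract refers to.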

arXiv:1808.02941 [pdf, other]

On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

Authors: Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong

Abstract: This paper studies a class of adaptive gradient based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes the popular algorithms such as the Adam, AMSGrad and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee the convergence for the Adam-type methods. We prove that under our derived conditions, these methods can achieve the convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization. We show the conditions are essential in the sense that violating them may make the algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive gradient algorithms, which has the same $O(\log{T}/\sqrt{T})$ convergence rate. Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.

Submitted 9 March, 2019; v1 submitted 8 August, 2018; originally announced August 2018.

arXiv:1806.10684 [pdf]

Price-Based Market Clearing with V2G Integration Using Generalized Benders Decomposition

Authors: Reza Jamalzadeh, Sajjad Abedi, Masoud Rashidinejad, Mingguo Hong

Abstract: Currently, most ISOs adopt offer cost minimization (OCM) auction mechanism which minimizes the total offer cost, and then, a settlement rule based on either locational marginal prices (LMPs) or market clearing price (MCP) is used to determine the payments to the committed units, which is not compatible with the auction mechanism because the minimized cost is different from the payment cost calculated by the settlement rule. This inconsistency can drastically increase the payment cost. On the other hand, payment cost minimization (PCM) auction mechanism eliminates this inconsistency; however, PCM problem is a nonlinear self-referring NP-hard problem which poses grand computational burden. In this paper, a mixed-integer nonlinear programming (MINLP) formulation of PCM problem is presented to address additional complexity of fast-growing penetration of Vehicle-to-Grid (V2G) in the price-based market clearing problem, and a solution method based on the generalized Benders decomposition (GBD) is then proposed to solve the V2G-integrated PCM problem, and its favorable performance in terms of convergence and computational efficiency is demonstrated using case studies. The proposed GBD-based method can handle scaled-up models with the increased number of decision variables and constraints which facilitates the use of PCM mechanism in the market clearing of large-scale power systems. The impact of using V2G technologies on the OCM and PCM mechanisms in terms of MCPs and payments is also investigated, and by using numerical results, the performances of these two mechanisms are compared.

Submitted 27 June, 2018; originally announced June 2018.

arXiv:1806.00877 [pdf, ps, other]

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Authors: Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong

Abstract: Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, such an algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.

Submitted 8 January, 2019; v1 submitted 3 June, 2018; originally announced June 2018.

Comments: final version as appeared in NeurIPS 2018

arXiv:1804.08203 [pdf, ps, other]

doi 10.1137/18M1182887

Convergence of the Ginzburg-Landau approximation for the Ericksen-Leslie system

Authors: Zhewen Feng, Min-Chun Hong, Yu Mei

Abstract: We establish the local well-posedness of the general Ericksen-Leslie system in liquid crystals with the initial velocity and director field in $H^1 \times H_b^2$. In particular, we prove that the solutions of the Ginzburg-Landau approximation system converge smoothly to the solution of the Ericksen-Leslie system for any $t \in (0,T^\ast)$ with a maximal existence time $T^\ast$ of the Ericksen-Leslie system.

Submitted 13 June, 2019; v1 submitted 22 April, 2018; originally announced April 2018.

Showing 1–50 of 100 results for author: Hong, M