Search | arXiv e-print repository

Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data

Authors: Jiaojiao Zhang, Jiang Hu, Anthony Man-Cho So, Mikael Johansson

Abstract: Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated l… ▽ More Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated learning over a compact smooth submanifold in the setting of heterogeneous client data. We propose an algorithm that leverages stochastic Riemannian gradients and a manifold projection operator to improve computational efficiency, uses local updates to improve communication efficiency, and avoids client drift. Theoretically, we show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis that jointly exploits the manifold structure and properties of the loss functions. Numerical experiments demonstrate that our algorithm has significantly smaller computational and communication overhead than existing methods. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2404.08073 [pdf, other]

Spurious Stationarity and Hardness Results for Mirror Descent

Authors: He Chen, Jia** Li, Anthony Man-Cho So

Abstract: Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence… ▽ More Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence of spurious stationary points. We further establish an algorithmic independent hardness result: Bregman proximal-type algorithms are unable to escape from a spurious stationary point in finite steps when the initial point is unfavorable, even for convex problems. Our hardness result points out the inherent distinction between Euclidean and Bregman geometries, and introduces both fundamental theoretical and numerical challenges to both machine learning and optimization communities. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2401.12025 [pdf, other]

A Survey of Recent Advances in Optimization Methods for Wireless Communications

Authors: Ya-Feng Liu, Tsung-Hui Chang, Mingyi Hong, Zheyu Wu, Anthony Man-Cho So, Eduard A. Jorswieck, Wei Yu

Abstract: Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the n… ▽ More Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, to analyze, and to solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recently developed optimization techniques in areas ranging from nonconvex optimization, global optimization, and integer programming, to distributed optimization and learning-based optimization. The key to successful solution of mathematical optimization problems is in carefully choosing or develo** suitable algorithms (or neural network architectures) that can exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions. △ Less

Submitted 7 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 39 pages, 5 figures, accepted for publication in IEEE Journal on Selected Areas in Communications

arXiv:2305.15136 [pdf, other]

ReSync: Riemannian Subgradient-based Robust Rotation Synchronization

Authors: Huikang Liu, Xiao Li, Anthony Man-Cho So

Abstract: This work presents ReSync, a Riemannian subgradient-based algorithm for solving the robust rotation synchronization problem, which arises in various engineering applications. ReSync solves a least-unsquared minimization formulation over the rotation group, which is nonsmooth and nonconvex, and aims at recovering the underlying rotations directly. We provide strong theoretical guarantees for ReSync… ▽ More This work presents ReSync, a Riemannian subgradient-based algorithm for solving the robust rotation synchronization problem, which arises in various engineering applications. ReSync solves a least-unsquared minimization formulation over the rotation group, which is nonsmooth and nonconvex, and aims at recovering the underlying rotations directly. We provide strong theoretical guarantees for ReSync under the random corruption setting. Specifically, we first show that the initialization procedure of ReSync yields a proper initial point that lies in a local region around the ground-truth rotations. We next establish the weak sharpness property of the aforementioned formulation and then utilize this property to derive the local linear convergence of ReSync to the ground-truth rotations. By combining these guarantees, we conclude that ReSync converges linearly to the ground-truth rotations under appropriate conditions. Experiment results demonstrate the effectiveness of ReSync. △ Less

Submitted 5 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted for publication in NeurIPS 2023

arXiv:2305.14161 [pdf, ps, other]

Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity

Authors: Xiao Li, Lei Zhao, Daoli Zhu, Anthony Man-Cho So

Abstract: The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization. The existing complexity and convergence results for this algorithm are mainly derived for Lipschitz continuous objective functions. In this work, we first extend the typical complexity results for the subgradient method to convex and weakly convex minimization without assuming Lipschitz continuity… ▽ More The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization. The existing complexity and convergence results for this algorithm are mainly derived for Lipschitz continuous objective functions. In this work, we first extend the typical complexity results for the subgradient method to convex and weakly convex minimization without assuming Lipschitz continuity. Specifically, we establish $\mathcal{O}(1/\sqrt{T})$ bound in terms of the suboptimality gap ``$f(x) - f^*$'' for convex case and $\mathcal{O}(1/{T}^{1/4})$ bound in terms of the gradient of the Moreau envelope function for weakly convex case. Furthermore, we provide convergence results for non-Lipschitz convex and weakly convex objective functions using proper diminishing rules on the step sizes. In particular, when $f$ is convex, we show $\mathcal{O}(\log(k)/\sqrt{k})$ rate of convergence in terms of the suboptimality gap. With an additional quadratic growth condition, the rate is improved to $\mathcal{O}(1/k)$ in terms of the squared distance to the optimal solution set. When $f$ is weakly convex, asymptotic convergence is derived. The central idea is that the dynamics of properly chosen step sizes rule fully controls the movement of the subgradient method, which leads to boundedness of the iterates, and then a trajectory-based analysis can be conducted to establish the desired results. To further illustrate the wide applicability of our framework, we extend the complexity results to the truncated subgradient, the stochastic subgradient, the incremental subgradient, and the proximal subgradient methods for non-Lipschitz functions. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.01379 [pdf, ps, other]

LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees

Authors: Shangyuan Liu, Linglingzhi Zhu, Anthony Man-Cho So

Abstract: Graph learning from signals is a core task in Graph Signal Processing (GSP). One of the most commonly used models to learn graphs from stationary signals is SpecT. However, its practical formulation rSpecT is known to be sensitive to hyperparameter selection and, even worse, to suffer from infeasibility. In this paper, we give the first condition that guarantees the infeasibility of rSpecT and des… ▽ More Graph learning from signals is a core task in Graph Signal Processing (GSP). One of the most commonly used models to learn graphs from stationary signals is SpecT. However, its practical formulation rSpecT is known to be sensitive to hyperparameter selection and, even worse, to suffer from infeasibility. In this paper, we give the first condition that guarantees the infeasibility of rSpecT and design a novel model (LogSpecT) and its practical formulation (rLogSpecT) to overcome this issue. Contrary to rSpecT, the novel practical model rLogSpecT is always feasible. Furthermore, we provide recovery guarantees of rLogSpecT, which are derived from modern optimization tools related to epi-convergence. These tools could be of independent interest and significant for various learning problems. To demonstrate the advantages of rLogSpecT in practice, a highly efficient algorithm based on the linearized alternating direction method of multipliers (L-ADMM) is proposed. The subproblems of L-ADMM admit closed-form solutions and the convergence is guaranteed. Extensive numerical results on both synthetic and real networks corroborate the stability and superiority of our proposed methods, underscoring their potential for various graph learning applications. △ Less

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2304.04052 [pdf, other]

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

Authors: Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Abstract: The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a bunch of new approaches have emerged that apply decoder-only language models direct… ▽ More The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to encode the source sequence and a decoder to generate the target text. Recently, a bunch of new approaches have emerged that apply decoder-only language models directly to the seq2seq task. Despite the significant advancements in applying language models to the seq2seq task, there is still a lack of thorough analysis on the effectiveness of the decoder-only language model architecture. This paper aims to address this gap by conducting a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework through the analysis of a regularized encoder-decoder structure. This structure is designed to replicate all behaviors in the classical decoder-only language model but has an encoder and a decoder making it easier to be compared with the classical encoder-decoder structure. Based on the analysis, we unveil the attention degeneration problem in the language model, namely, as the generation step number grows, less and less attention is focused on the source sequence. To give a quantitative understanding of this problem, we conduct a theoretical sensitivity analysis of the attention output with respect to the source input. Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem. Experimental results on machine translation, summarization, and data-to-text generation tasks support our analysis and demonstrate the effectiveness of our proposed model. △ Less

Submitted 8 April, 2023; originally announced April 2023.

arXiv:2303.17779 [pdf, other]

Decentralized Weakly Convex Optimization Over the Stiefel Manifold

Authors: **xin Wang, Jiang Hu, Shixiang Chen, Zengde Deng, Anthony Man-Cho So

Abstract: We focus on a class of non-smooth optimization problems over the Stiefel manifold in the decentralized setting, where a connected network of $n$ agents cooperatively minimize a finite-sum objective function with each component being weakly convex in the ambient Euclidean space. Such optimization problems, albeit frequently encountered in applications, are quite challenging due to their non-smoothn… ▽ More We focus on a class of non-smooth optimization problems over the Stiefel manifold in the decentralized setting, where a connected network of $n$ agents cooperatively minimize a finite-sum objective function with each component being weakly convex in the ambient Euclidean space. Such optimization problems, albeit frequently encountered in applications, are quite challenging due to their non-smoothness and non-convexity. To tackle them, we propose an iterative method called the decentralized Riemannian subgradient method (DRSM). The global convergence and an iteration complexity of $\mathcal{O}(\varepsilon^{-2} \log^2(\varepsilon^{-1}))$ for forcing a natural stationarity measure below $\varepsilon$ are established via the powerful tool of proximal smoothness from variational analysis, which could be of independent interest. Besides, we show the local linear convergence of the DRSM using geometrically diminishing stepsizes when the problem at hand further possesses a sharpness property. Numerical experiments are conducted to corroborate our theoretical findings. △ Less

Submitted 30 March, 2023; originally announced March 2023.

Comments: 27 pages, 6 figures, 1 table

arXiv:2303.06595 [pdf, other]

A Convergent Single-Loop Algorithm for Relaxation of Gromov-Wasserstein in Graph Data

Authors: Jia** Li, Jianheng Tang, Lemin Kong, Huikang Liu, Jia Li, Anthony Man-Cho So, Jose Blanchet

Abstract: In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromises in the feasibility of the coupling map. Our analysis is based on the observation that the GW… ▽ More In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromises in the feasibility of the coupling map. Our analysis is based on the observation that the GW problem satisfies the Luo-Tseng error bound condition, which relates to estimating the distance of a point to the critical point set of the GW problem based on the optimality residual. This observation allows us to provide an approximation bound for the distance between the fixed-point set of BAPG and the critical point set of GW. Moreover, under a mild technical assumption, we can show that BAPG converges to its fixed point set. The effectiveness of BAPG has been validated through comprehensive numerical experiments in graph alignment and partition tasks, where it outperforms existing methods in terms of both solution quality and wall-clock time. △ Less

Submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted by ICLR 2023

arXiv:2302.12261 [pdf, ps, other]

Testing Stationarity Concepts for ReLU Networks: Hardness, Regularity, and Robust Algorithms

Authors: Lai Tian, Anthony Man-Cho So

Abstract: We study the computational problem of the stationarity test for the empirical loss of neural networks with ReLU activation functions. Our contributions are: Hardness: We show that checking a certain first-order approximate stationarity concept for a piecewise linear function is co-NP-hard. This implies that testing a certain stationarity concept for a modern nonsmooth neural network is in genera… ▽ More We study the computational problem of the stationarity test for the empirical loss of neural networks with ReLU activation functions. Our contributions are: Hardness: We show that checking a certain first-order approximate stationarity concept for a piecewise linear function is co-NP-hard. This implies that testing a certain stationarity concept for a modern nonsmooth neural network is in general computationally intractable. As a corollary, we prove that testing so-called first-order minimality for functions in abs-normal form is co-NP-complete, which was conjectured by Griewank and Walther (2019, SIAM J. Optim., vol. 29, p284). Regularity: We establish a necessary and sufficient condition for the validity of an equality-type subdifferential chain rule in terms of Clarke, Fréchet, and limiting subdifferentials of the empirical loss of two-layer ReLU networks. This new condition is simple and efficiently checkable. Robust algorithms: We introduce an algorithmic scheme to test near-approximate stationarity in terms of both Clarke and Fréchet subdifferentials. Our scheme makes no false positive or false negative error when the tested point is sufficiently close to a stationary one and a certain qualification is satisfied. This is the first practical and robust stationarity test approach for two-layer ReLU networks. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: 42 pages

arXiv:2302.04610 [pdf, other]

Outlier-Robust Gromov-Wasserstein for Graph Data

Authors: Lemin Kong, Jia** Li, Jianheng Tang, Anthony Man-Cho So

Abstract: Gromov-Wasserstein (GW) distance is a powerful tool for comparing and aligning probability distributions supported on different metric spaces. Recently, GW has become the main modeling technique for aligning heterogeneous data for a wide range of graph learning tasks. However, the GW distance is known to be highly sensitive to outliers, which can result in large inaccuracies if the outliers are gi… ▽ More Gromov-Wasserstein (GW) distance is a powerful tool for comparing and aligning probability distributions supported on different metric spaces. Recently, GW has become the main modeling technique for aligning heterogeneous data for a wide range of graph learning tasks. However, the GW distance is known to be highly sensitive to outliers, which can result in large inaccuracies if the outliers are given the same weight as other samples in the objective function. To mitigate this issue, we introduce a new and robust version of the GW distance called RGW. RGW features optimistically perturbed marginal constraints within a Kullback-Leibler divergence-based ambiguity set. To make the benefits of RGW more accessible in practice, we develop a computationally efficient and theoretically provable procedure using Bregman proximal alternating linearized minimization algorithm. Through extensive experimentation, we validate our theoretical results and demonstrate the effectiveness of RGW on real-world graph learning tasks, such as subgraph matching and partial shape correspondence. △ Less

Submitted 30 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.09820 [pdf, other]

A Stability Analysis of Fine-Tuning a Pre-Trained Model

Authors: Zihao Fu, Anthony Man-Cho So, Nigel Collier

Abstract: Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, GPT, etc.) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from the instability problem, i.e., tuning the same model under the same setting results in significantly different performance. Many recent works have proposed different methods… ▽ More Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, GPT, etc.) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from the instability problem, i.e., tuning the same model under the same setting results in significantly different performance. Many recent works have proposed different methods to solve this problem, but there is no theoretical understanding of why and how these methods work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely, full fine-tuning and head tuning. We define the stability under each setting and prove the corresponding stability bounds. The theoretical bounds explain why and how several existing methods can stabilize the fine-tuning procedure. In addition to being able to explain most of the observed empirical discoveries, our proposed theoretical analysis framework can also help in the design of effective and provable methods. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure, namely, Maximal Margin Regularizer (MMR), Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate our proposed approaches on 11 widely used real-world benchmark datasets, as well as hundreds of synthetic classification datasets. The experiment results show that our proposed methods significantly stabilize the fine-tuning procedure and also corroborate our theoretical analysis. △ Less

Submitted 7 December, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

arXiv:2301.03971 [pdf, other]

Unsupervised Mandarin-Cantonese Machine Translation

Authors: Megan Dare, Valentina Fajardo Diaz, Averie Ho Zoen So, Yifan Wang, Shibingfeng Zhang

Abstract: Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the lan… ▽ More Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the language, due to the fact that Cantonese is primarily used for oral communication. The key contributions of our project include: 1. The creation of a new corpus containing approximately 1 million Cantonese sentences, and 2. A large-scale comparison across different model architectures, tokenization schemes, and embedding structures. Our best model trained with character-based tokenization and a Transformer architecture achieved a character-level BLEU of 25.1 when translating from Mandarin to Cantonese and of 24.4 when translating from Cantonese to Mandarin. In this paper we discuss our research process, experiments, and results. △ Less

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2212.12978 [pdf, other]

Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization

Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet, Jia** Li

Abstract: Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verif… ▽ More Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided KŁ properties, achieving convergence with $\mathcal{O}(ε^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $θ\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(ε^{-2\max\{2θ,1\}})$. They all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, KŁ condition, or weak Minty variational inequalities condition. For various challenging nonconvex-nonconcave examples in the literature, including ``Forsaken'', ``Bilinearly-coupled minimax'', ``Sixth-order polynomial'', and ``PolarGame'', the proposed DS-GDA can all get rid of limit cycles. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems. △ Less

Submitted 30 October, 2023; v1 submitted 25 December, 2022; originally announced December 2022.

arXiv:2211.15583 [pdf, other]

On the Effectiveness of Parameter-Efficient Fine-Tuning

Authors: Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier

Abstract: Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while kee** most of the parameters shared across different tasks. These methods ac… ▽ More Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while kee** most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such kind of methods is still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability which has been empirically observed in a lot of recent research works. Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.10825 [pdf, other]

Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis

Authors: Jia** Li, Linglingzhi Zhu, Anthony Man-Cho So

Abstract: Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle… ▽ More Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $θ\in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $ε$-game-stationary points and $ε$-optimization-stationary points of the problems of interest in $\mathcal{O}(ε^{-2\max\{2θ,1\}})$ iterations. Furthermore, when $θ\in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(ε^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problem possesses the KL property with exponent $θ=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest. △ Less

Submitted 26 July, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2207.07287 [pdf, other]

Riemannian Natural Gradient Methods

Authors: Jiang Hu, Ruicheng Ao, Anthony Man-Cho So, Minghan Yang, Zaiwen Wen

Abstract: This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a na… ▽ More This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic -- rate of convergence. We then prove that the Riemannian Jacobian stability condition will be satisfied by a two-layer fully connected neural network with batch normalization with high probability, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones. △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2205.08115 [pdf, other]

Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Data

Authors: Jia** Li, Jianheng Tang, Lemin Kong, Huikang Liu, Jia Li, Anthony Man-Cho So, Jose Blanchet

Abstract: In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition~\citep{luo1992error}, two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG) enjoy the convergence guarant… ▽ More In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition~\citep{luo1992error}, two proposed algorithms, called Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG) enjoy the convergence guarantees. Upon task-specific properties, our analysis further provides novel theoretical insights to guide how to select the best-fit method. As a result, we are able to provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results. △ Less

Submitted 14 December, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2202.12255 [pdf, ps, other]

Exact Community Recovery over Signed Graphs

Authors: Xiaolu Wang, Peng Wang, Anthony Man-Cho So

Abstract: Signed graphs encode similarity and dissimilarity relationships among different entities with positive and negative edges. In this paper, we study the problem of community recovery over signed graphs generated by the signed stochastic block model (SSBM) with two equal-sized communities. Our approach is based on the maximum likelihood estimation (MLE) of the SSBM. Unlike many existing approaches, o… ▽ More Signed graphs encode similarity and dissimilarity relationships among different entities with positive and negative edges. In this paper, we study the problem of community recovery over signed graphs generated by the signed stochastic block model (SSBM) with two equal-sized communities. Our approach is based on the maximum likelihood estimation (MLE) of the SSBM. Unlike many existing approaches, our formulation reveals that the positive and negative edges of a signed graph should be treated unequally. We then propose a simple two-stage iterative algorithm for solving the regularized MLE. It is shown that in the logarithmic degree regime, the proposed algorithm can exactly recover the underlying communities in nearly-linear time at the information-theoretic limit. Numerical results on both synthetic and real data are reported to validate and complement our theoretical developments and demonstrate the efficacy of the proposed method. △ Less

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2201.07699 [pdf, ps, other]

doi 10.1109/TSP.2023.3240652

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part I

Authors: Jiaojiao Zhang, Huikang Liu, Anthony Man-Cho So, Qing Ling

Abstract: In this work, we investigate stochastic quasi-Newton methods for minimizing a finite sum of cost functions over a decentralized network. In Part I, we develop a general algorithmic framework that incorporates stochastic quasi-Newton approximations with variance reduction so as to achieve fast convergence. At each time each node constructs a local, inexact quasi-Newton direction that asymptotically… ▽ More In this work, we investigate stochastic quasi-Newton methods for minimizing a finite sum of cost functions over a decentralized network. In Part I, we develop a general algorithmic framework that incorporates stochastic quasi-Newton approximations with variance reduction so as to achieve fast convergence. At each time each node constructs a local, inexact quasi-Newton direction that asymptotically approaches the global, exact one. To be specific, (i) A local gradient approximation is constructed by using dynamic average consensus to track the average of variance-reduced local stochastic gradients over the entire network; (ii) A local Hessian inverse approximation is assumed to be positive definite with bounded eigenvalues, and how to construct it to satisfy these assumptions will be given in Part II. Compared to the existing decentralized stochastic first-order methods, the proposed general framework introduces the second-order curvature information without incurring extra sampling or communication. With a fixed step size, we establish the conditions under which the proposed general framework linearly converges to an exact optimal solution. △ Less

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2112.14204 [pdf, other]

Non-Convex Joint Community Detection and Group Synchronization via Generalized Power Method

Authors: Si** Chen, Xiwei Cheng, Anthony Man-Cho So

Abstract: This paper proposes a Generalized Power Method (GPM) to tackle the problem of community detection and group synchronization simultaneously in a direct non-convex manner. Under the stochastic group block model (SGBM), theoretical analysis indicates that the algorithm is able to exactly recover the ground truth in $O(n\log^2n)$ time, sharply outperforming the benchmark method of semidefinite program… ▽ More This paper proposes a Generalized Power Method (GPM) to tackle the problem of community detection and group synchronization simultaneously in a direct non-convex manner. Under the stochastic group block model (SGBM), theoretical analysis indicates that the algorithm is able to exactly recover the ground truth in $O(n\log^2n)$ time, sharply outperforming the benchmark method of semidefinite programming (SDP) in $O(n^{3.5})$ time. Moreover, a lower bound of parameters is given as a necessary condition for exact recovery of GPM. The new bound breaches the information-theoretic threshold for pure community detection under the stochastic block model (SBM), thus demonstrating the superiority of our simultaneous optimization algorithm over the trivial two-stage method which performs the two tasks in succession. We also conduct numerical experiments on GPM and SDP to evidence and complement our theoretical analysis. △ Less

Submitted 28 December, 2021; originally announced December 2021.

Comments: 29 pages

arXiv:2109.15292 [pdf, other]

Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

Authors: Kaiwen Zhou, Anthony Man-Cho So, James Cheng

Abstract: We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives. We prove that our new accelerated method requires the same linear speed-up condition as the existing non-accelerated methods. Our core algorithmic discovery is a… ▽ More We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives. We prove that our new accelerated method requires the same linear speed-up condition as the existing non-accelerated methods. Our core algorithmic discovery is a new accelerated SVRG variant with sparse updates. Empirical results are presented to verify our theoretical findings. △ Less

Submitted 30 September, 2021; originally announced September 2021.

Comments: 21 pages, 22 figures

arXiv:2105.12062 [pdf, other]

Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums

Authors: Kaiwen Zhou, Lai Tian, Anthony Man-Cho So, James Cheng

Abstract: In convex optimization, the problem of finding near-stationary points has not been adequately studied yet, unlike other optimality measures such as the function value. Even in the deterministic case, the optimal method (OGM-G, due to Kim and Fessler (2021)) has just been discovered recently. In this work, we conduct a systematic study of algorithmic techniques for finding near-stationary points of… ▽ More In convex optimization, the problem of finding near-stationary points has not been adequately studied yet, unlike other optimality measures such as the function value. Even in the deterministic case, the optimal method (OGM-G, due to Kim and Fessler (2021)) has just been discovered recently. In this work, we conduct a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums. Our main contributions are several algorithmic discoveries: (1) we discover a memory-saving variant of OGM-G based on the performance estimation problem approach (Drori and Teboulle, 2014); (2) we design a new accelerated SVRG variant that can simultaneously achieve fast rates for minimizing both the gradient norm and function value; (3) we propose an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities. We put an emphasis on the simplicity and practicality of the new schemes, which could facilitate future work. △ Less

Submitted 22 February, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: 29 pages, 4 figures

arXiv:2105.05458 [pdf, other]

doi 10.1109/TSP.2022.3229950

Distributionally Robust Graph Learning from Smooth Signals under Moment Uncertainty

Authors: Xiaolu Wang, Yuen-Man Pun, Anthony Man-Cho So

Abstract: We consider the problem of learning a graph from a finite set of noisy graph signal observations, the goal of which is to find a smooth representation of the graph signal. Such a problem is motivated by the desire to infer relational structure in large datasets and has been extensively studied in recent years. Most existing approaches focus on learning a graph on which the observed signals are smo… ▽ More We consider the problem of learning a graph from a finite set of noisy graph signal observations, the goal of which is to find a smooth representation of the graph signal. Such a problem is motivated by the desire to infer relational structure in large datasets and has been extensively studied in recent years. Most existing approaches focus on learning a graph on which the observed signals are smooth. However, the learned graph is prone to overfitting, as it does not take the unobserved signals into account. To address this issue, we propose a novel graph learning model based on the distributionally robust optimization methodology, which aims to identify a graph that not only provides a smooth representation of but is also robust against uncertainties in the observed signals. On the statistics side, we establish out-of-sample performance guarantees for our proposed model. On the optimization side, we show that under a mild assumption on the graph signal distribution, our proposed model admits a smooth non-convex optimization formulation. We then develop a projected gradient method to tackle this formulation and establish its convergence guarantees. Our formulation provides a new perspective on regularization in the graph learning setting. Moreover, extensive numerical experiments on both synthetic and real-world data show that our model has comparable yet more robust performance across different populations of observed signals than existing non-robust models according to various metrics. △ Less

Submitted 24 November, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

arXiv:2101.06907 [pdf, other]

Quartic Perturbation-based Outage-constrained Robust Design in Two-hop One-way Relay Networks

Authors: Sissi Xiaoxiao Wu, Sherry Xue-Ying Ni, Jiaying Li, Anthony Man-Cho So

Abstract: In this work, we study a classic robust design problem in two-hop one-way relay system. We are particularly interested in the scenario where channel uncertainty exists in both the transmitter-to-relay and relay-to-receiver links. By considering the problem design that minimizes the average amplify-and-forward power budget at the relay side while satisfying SNR outage requirements, an outage-constr… ▽ More In this work, we study a classic robust design problem in two-hop one-way relay system. We are particularly interested in the scenario where channel uncertainty exists in both the transmitter-to-relay and relay-to-receiver links. By considering the problem design that minimizes the average amplify-and-forward power budget at the relay side while satisfying SNR outage requirements, an outage-constrained robust design problem involving quartic perturbations is formulated to guarantee the robustness during transmission. This problem is in general difficult as it involves constraints on the tail probability of a high-order polynomial. Herein, we resort to moment inequality and Bernstein-type inequality to tackle this problem, which provide convex restrictions, or safe approximations, of the original design. We also analyze the relative tightness of the two safe approximations for a quadratic perturbation-based outage constrained problem. Our analysis shows that the Bernstein-type inequality approach is less conservative than the moment inequality approach when the outage rate is within some prescribed regime. To our best knowledge, this is the first provable tightness result for these two safe approximations. Our numerical simulations verify the superiority of the robust design and corroborate the tightness results. △ Less

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2012.14660 [pdf, other]

A Theoretical Analysis of the Repetition Problem in Text Generation

Authors: Zihao Fu, Wai Lam, Anthony Man-Cho So, Bei Shi

Abstract: Text generation tasks, including translation, summarization, language models, and etc. see rapid growth during recent years. Despite the remarkable achievements, the repetition problem has been observed in nearly all text generation models undermining the generation performance extensively. To solve the repetition problem, many methods have been proposed, but there is no existing theoretical analy… ▽ More Text generation tasks, including translation, summarization, language models, and etc. see rapid growth during recent years. Despite the remarkable achievements, the repetition problem has been observed in nearly all text generation models undermining the generation performance extensively. To solve the repetition problem, many methods have been proposed, but there is no existing theoretical analysis to show why this problem happens and how it is resolved. In this paper, we propose a new framework for theoretical analysis for the repetition problem. We first define the Average Repetition Probability (ARP) to characterize the repetition problem quantitatively. Then, we conduct an extensive analysis of the Markov generation model and derive several upper bounds of the average repetition probability with intuitive understanding. We show that most of the existing methods are essentially minimizing the upper bounds explicitly or implicitly. Grounded on our theory, we show that the repetition problem is, unfortunately, caused by the traits of our language itself. One major reason is attributed to the fact that there exist too many words predicting the same word as the subsequent word with high probability. Consequently, it is easy to go back to that word and form repetitions and we dub it as the high inflow problem. Furthermore, we derive a concentration bound of the average repetition probability for a general generation model. Finally, based on the theoretical upper bounds, we propose a novel rebalanced encoding approach to alleviate the high inflow problem. The experimental results show that our theoretical framework is applicable in general generation models and our proposed rebalanced encoding approach alleviates the repetition problem significantly. The source code of this paper can be obtained from https://github.com/fuzihaofzh/repetition-problem-nlg. △ Less

Submitted 21 March, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Comments: AAAI 21 Paper with Appendix

arXiv:2010.12865 [pdf, other]

Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

Authors: Jia** Li, Caihua Chen, Anthony Man-Cho So

Abstract: Wasserstein \textbf{D}istributionally \textbf{R}obust \textbf{O}ptimization (DRO) is concerned with finding decisions that perform well on data that are drawn from the worst-case probability distribution within a Wasserstein ball centered at a certain nominal distribution. In recent years, it has been shown that various DRO formulations of learning models admit tractable convex reformulations. How… ▽ More Wasserstein \textbf{D}istributionally \textbf{R}obust \textbf{O}ptimization (DRO) is concerned with finding decisions that perform well on data that are drawn from the worst-case probability distribution within a Wasserstein ball centered at a certain nominal distribution. In recent years, it has been shown that various DRO formulations of learning models admit tractable convex reformulations. However, most existing works propose to solve these convex reformulations by general-purpose solvers, which are not well-suited for tackling large-scale problems. In this paper, we focus on a family of Wasserstein distributionally robust support vector machine (DRSVM) problems and propose two novel epigraphical projection-based incremental algorithms to solve them. The updates in each iteration of these algorithms can be computed in a highly efficient manner. Moreover, we show that the DRSVM problems considered in this paper satisfy a Hölderian growth condition with explicitly determined growth exponents. Consequently, we are able to establish the convergence rates of the proposed incremental algorithms. Our numerical results indicate that the proposed methods are orders of magnitude faster than the state-of-the-art, and the performance gap grows considerably as the problem size increases. △ Less

Submitted 24 October, 2020; originally announced October 2020.

Comments: Accepted by NeurIPS 2020

arXiv:2009.07514 [pdf, other]

A Unified Approach to Synchronization Problems over Subgroups of the Orthogonal Group

Authors: Huikang Liu, Man-Chung Yue, Anthony Man-Cho So

Abstract: The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consid… ▽ More The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consider the class of synchronization problems in which the group is a closed subgroup of the orthogonal group. This class covers many group synchronization problems that arise in practice. Our contribution is fivefold. First, we propose a unified approach for solving this class of group synchronization problems, which consists of a suitable initialization step and an iterative refinement step based on the generalized power method, and show that it enjoys a strong theoretical guarantee on the estimation error under certain assumptions on the group, measurement graph, noise, and initialization. Second, we formulate two geometric conditions that are required by our approach and show that they hold for various practically relevant subgroups of the orthogonal group. The conditions are closely related to the error-bound geometry of the subgroup -- an important notion in optimization. Third, we verify the assumptions on the measurement graph and noise for standard random graph and random matrix models. Fourth, based on the classic notion of metric entropy, we develop and analyze a novel spectral-type estimator. Finally, we show via extensive numerical experiments that our proposed non-convex approach outperforms existing approaches in terms of computational speed, scalability, and/or estimation error. △ Less

Submitted 16 June, 2023; v1 submitted 16 September, 2020; originally announced September 2020.

arXiv:2006.14901 [pdf, other]

Understanding Notions of Stationarity in Non-Smooth Optimization

Authors: Jia** Li, Anthony Man-Cho So, Wing-Kin Ma

Abstract: Many contemporary applications in signal processing and machine learning give rise to structured non-convex non-smooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon---and, in fact, one of the very difficult conundrums even for experts---lie in the study of "stationary points" of the problem in quest… ▽ More Many contemporary applications in signal processing and machine learning give rise to structured non-convex non-smooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon---and, in fact, one of the very difficult conundrums even for experts---lie in the study of "stationary points" of the problem in question. Unlike smooth optimization, for which the definition of a stationary point is rather standard, there is a myriad of definitions of stationarity in non-smooth optimization. In this article, we give an introduction to different stationarity concepts for several important classes of non-convex non-smooth functions and discuss the geometric interpretations and further clarify the relationship among these different concepts. We then demonstrate the relevance of these constructions in some representative applications and how they could affect the performance of iterative methods for tackling these applications. △ Less

Submitted 26 June, 2020; originally announced June 2020.

Comments: Accepted for publication in IEEE Signal Processing Magazine, 2020

arXiv:2005.12061 [pdf, other]

Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates

Authors: Kaiwen Zhou, Anthony Man-Cho So, James Cheng

Abstract: We propose a new methodology to design first-order methods for unconstrained strongly convex problems. Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition. We then propose an algor… ▽ More We propose a new methodology to design first-order methods for unconstrained strongly convex problems. Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition. We then propose an algorithmic template for tackling the shifted objective, which can exploit such a condition. Following this template, we derive several new accelerated schemes for problems that are equipped with various first-order oracles and show that the interpolation condition allows us to vastly simplify and tighten the analysis of the derived methods. In particular, all the derived methods have faster worst-case convergence rates than their existing counterparts. Experiments on machine learning tasks are conducted to evaluate the new methods. △ Less

Submitted 21 October, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: NeurIPS 2020, 29 pages, 7 figures

arXiv:2005.02356 [pdf, other]

doi 10.1109/TSP.2021.3099643

Manifold Proximal Point Algorithms for Dual Principal Component Pursuit and Orthogonal Dictionary Learning

Authors: Shixiang Chen, Zengde Deng, Shiqian Ma, Anthony Man-Cho So

Abstract: We consider the problem of maximizing the $\ell_1$ norm of a linear map over the sphere, which arises in various machine learning applications such as orthogonal dictionary learning (ODL) and robust subspace recovery (RSR). The problem is numerically challenging due to its nonsmooth objective and nonconvex constraint, and its algorithmic aspects have not been well explored. In this paper, we show… ▽ More We consider the problem of maximizing the $\ell_1$ norm of a linear map over the sphere, which arises in various machine learning applications such as orthogonal dictionary learning (ODL) and robust subspace recovery (RSR). The problem is numerically challenging due to its nonsmooth objective and nonconvex constraint, and its algorithmic aspects have not been well explored. In this paper, we show how the manifold structure of the sphere can be exploited to design fast algorithms for tackling this problem. Specifically, our contribution is threefold. First, we present a manifold proximal point algorithm (ManPPA) for the problem and show that it converges at a sublinear rate. Furthermore, we show that ManPPA can achieve a quadratic convergence rate when applied to the ODL and RSR problems. Second, we propose a stochastic variant of ManPPA called StManPPA, which is well suited for large-scale computation, and establish its sublinear convergence rate. Both ManPPA and StManPPA have provably faster convergence rates than existing subgradient-type methods. Third, using ManPPA as a building block, we propose a new approach to solving a matrix analog of the problem, in which the sphere is replaced by the Stiefel manifold. The results from our extensive numerical experiments on the ODL and RSR problems demonstrate the efficiency and efficacy of our proposed methods. △ Less

Submitted 21 July, 2021; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: Accepted in IEEE Transactions on Signal Processing

arXiv:1911.05047 [pdf, other]

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods

Authors: Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man Cho So

Abstract: We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods -- namely Riemannain subgradient, incremental subgradient, and stochastic subgradient metho… ▽ More We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods -- namely Riemannain subgradient, incremental subgradient, and stochastic subgradient methods -- to solve these problems and show that they all have an iteration complexity of ${\cal O}(\varepsilon^{-4})$ for driving a natural stationarity measure below $\varepsilon$. In addition, we establish the local linear convergence of the Riemannian subgradient and incremental subgradient methods when the problem at hand further satisfies a sharpness property and the algorithms are properly initialized and use geometrically diminishing stepsizes. To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold. The fundamental ingredient in the proof of the aforementioned convergence results is a new Riemannian subgradient inequality for restrictions of weakly convex functions on the Stiefel manifold, which could be of independent interest. We also show that our convergence results can be extended to handle a class of compact embedded submanifolds of the Euclidean space. Finally, we discuss the sharpness properties of various formulations of the robust subspace recovery and orthogonal dictionary learning problems and demonstrate the convergence performance of the algorithms on both problems via numerical simulations. △ Less

Submitted 24 March, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 30 pages. Accepted to SIAM Journal on Optimization

MSC Class: 68Q25; 65K10; 90C90; 90C26; 90C06

arXiv:1910.12778 [pdf, other]

A First-Order Algorithmic Framework for Wasserstein Distributionally Robust Logistic Regression

Authors: Jia** Li, Sen Huang, Anthony Man-Cho So

Abstract: Wasserstein distance-based distributionally robust optimization (DRO) has received much attention lately due to its ability to provide a robustness interpretation of various learning models. Moreover, many of the DRO problems that arise in the learning context admits exact convex reformulations and hence can be tackled by off-the-shelf solvers. Nevertheless, the use of such solvers severely limits… ▽ More Wasserstein distance-based distributionally robust optimization (DRO) has received much attention lately due to its ability to provide a robustness interpretation of various learning models. Moreover, many of the DRO problems that arise in the learning context admits exact convex reformulations and hence can be tackled by off-the-shelf solvers. Nevertheless, the use of such solvers severely limits the applicability of DRO in large-scale learning problems, as they often rely on general purpose interior-point algorithms. On the other hand, there are very few works that attempt to develop fast iterative methods to solve these DRO problems, which typically possess complicated structures. In this paper, we take a first step towards resolving the above difficulty by develo** a first-order algorithmic framework for tackling a class of Wasserstein distance-based distributionally robust logistic regression (DRLR) problem. Specifically, we propose a novel linearized proximal ADMM to solve the DRLR problem, whose objective is convex but consists of a smooth term plus two non-separable non-smooth terms. We prove that our method enjoys a sublinear convergence rate. Furthermore, we conduct three different experiments to show its superb performance on both synthetic and real-world datasets. In particular, our method can achieve the same accuracy up to 800+ times faster than the standard off-the-shelf solver. △ Less

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1907.11687 [pdf, other]

Incremental Methods for Weakly Convex Optimization

Authors: Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Jason D Lee

Abstract: Incremental methods are widely utilized for solving finite-sum optimization problems in machine learning and signal processing. In this paper, we study a family of incremental methods -- including incremental subgradient, incremental proximal point, and incremental prox-linear methods -- for solving weakly convex optimization problems. Such a problem class covers many nonsmooth nonconvex instances… ▽ More Incremental methods are widely utilized for solving finite-sum optimization problems in machine learning and signal processing. In this paper, we study a family of incremental methods -- including incremental subgradient, incremental proximal point, and incremental prox-linear methods -- for solving weakly convex optimization problems. Such a problem class covers many nonsmooth nonconvex instances that arise in engineering fields. We show that the three said incremental methods have an iteration complexity of $O(\varepsilon^{-4})$ for driving a natural stationarity measure to below $\varepsilon$. Moreover, we show that if the weakly convex function satisfies a sharpness condition, then all three incremental methods, when properly initialized and equipped with geometrically diminishing stepsizes, can achieve a local linear rate of convergence. Our work is the first to extend the convergence rate analysis of incremental methods from the nonsmooth convex regime to the weakly convex regime. Lastly, we conduct numerical experiments on the robust matrix sensing problem to illustrate the convergence performance of the three incremental methods. △ Less

Submitted 23 December, 2022; v1 submitted 26 July, 2019; originally announced July 2019.

Comments: 26 pages

MSC Class: 68Q25; 65K10; 90C90; 90C26; 90C06

arXiv:1907.01385 [pdf, other]

Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT

Authors: Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui

Abstract: The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this e… ▽ More The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a distributed primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which the distributed learning achieves the same sublinear convergence rate as centralized learning. In other words, the distributed decision making does not slow down the process of achieving global consensus on optimality. Lastly, we verify the convergence of our proposed algorithm with numerical simulations and conduct case studies in practical multi-agent IoT systems. △ Less

Submitted 29 August, 2020; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: Published at IEEE Internet of Things Journal

arXiv:1903.05006 [pdf, other]

An Efficient Augmented Lagrangian Based Method for Constrained Lasso

Authors: Zengde Deng, Anthony Man-Cho So

Abstract: Variable selection is one of the most important tasks in statistics and machine learning. To incorporate more prior information about the regression coefficients, the constrained Lasso model has been proposed in the literature. In this paper, we present an inexact augmented Lagrangian method to solve the Lasso problem with linear equality constraints. By fully exploiting second-order sparsity of t… ▽ More Variable selection is one of the most important tasks in statistics and machine learning. To incorporate more prior information about the regression coefficients, the constrained Lasso model has been proposed in the literature. In this paper, we present an inexact augmented Lagrangian method to solve the Lasso problem with linear equality constraints. By fully exploiting second-order sparsity of the problem, we are able to greatly reduce the computational cost and obtain highly efficient implementations. Furthermore, numerical results on both synthetic data and real data show that our algorithm is superior to existing first-order methods in terms of both running time and solution accuracy. △ Less

Submitted 12 March, 2019; originally announced March 2019.

arXiv:1810.03159 [pdf, other]

doi 10.1109/TSP.2019.2937280

A Framework for One-Bit and Constant-Envelope Precoding over Multiuser Massive MISO Channels

Authors: Mingjie Shao, Qiang Li, Wing-Kin Ma, Anthony Man-Cho So

Abstract: Consider the following problem: A multi-antenna base station (BS) sends multiple symbol streams to multiple single-antenna users via precoding. However, unlike conventional multiuser precoding, the transmitted signals are subjected to binary, unit-modulus, or even discrete unit-modulus constraints. Such constraints arise in the one-bit and constant-envelope (CE) massive MIMO scenarios, wherein hig… ▽ More Consider the following problem: A multi-antenna base station (BS) sends multiple symbol streams to multiple single-antenna users via precoding. However, unlike conventional multiuser precoding, the transmitted signals are subjected to binary, unit-modulus, or even discrete unit-modulus constraints. Such constraints arise in the one-bit and constant-envelope (CE) massive MIMO scenarios, wherein high-resolution digital-to-analog converters (DACs) are replaced by one-bit DACs and phase shifters, respectively, for cutting down hardware cost and energy consumption. Multiuser precoding under one-bit and CE restrictions poses significant design difficulty. In this paper we establish a framework for designing multiuser precoding under the one-bit, continuous CE and discrete CE scenarios---all within one theme. We first formulate a precoding design that focuses on minimizations of symbol-error probabilities (SEPs), assuming quadrature amplitude modulation (QAM) constellations. We then devise an algorithm for our SEP-based design. The algorithm combines i) a novel penalty method for handling binary, unit-modulus and discrete unit-modulus constraints; and ii) a first-order non-convex optimization recipe custom-built for the design. Specifically, the latter is an inexact majorization-minimization method via accelerated projected gradient, which, as shown by simulations, runs very fast and can handle a large number of decision variables. Simulation results indicate that the proposed design offers significantly better bit-error rate performance than the existing designs. △ Less

Submitted 17 August, 2019; v1 submitted 7 October, 2018; originally announced October 2018.

Comments: To appear in IEEE Trans. Signal Processing

arXiv:1809.09237 [pdf, other]

Nonconvex Robust Low-rank Matrix Recovery

Authors: Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal

Abstract: In this paper we study the problem of recovering a low-rank matrix from a number of random linear measurements that are corrupted by outliers taking arbitrary values. We consider a nonsmooth nonconvex formulation of the problem, in which we explicitly enforce the low-rank property of the solution by using a factored representation of the matrix variable and employ an $\ell_1$-loss function to robu… ▽ More In this paper we study the problem of recovering a low-rank matrix from a number of random linear measurements that are corrupted by outliers taking arbitrary values. We consider a nonsmooth nonconvex formulation of the problem, in which we explicitly enforce the low-rank property of the solution by using a factored representation of the matrix variable and employ an $\ell_1$-loss function to robustify the solution against outliers. We show that even when a constant fraction (which can be up to almost half) of the measurements are arbitrarily corrupted, as long as certain measurement operators arising from the measurement model satisfy the so-called $\ell_1/\ell_2$-restricted isometry property, the ground-truth matrix can be exactly recovered from any global minimum of the resulting optimization problem. Furthermore, we show that the objective function of the optimization problem is sharp and weakly convex. Consequently, a subgradient Method (SubGM) with geometrically diminishing step sizes will converge linearly to the ground-truth matrix when suitably initialized. We demonstrate the efficacy of the SubGM for the nonconvex robust low-rank matrix recovery problem with various numerical experiments. △ Less

Submitted 14 July, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

arXiv:1806.03150 [pdf, other]

A User Evaluation of Automated Process Discovery Algorithms

Authors: Fabrizio Maria Maggi, Andrea Marrella, Fredrik Milani, Allar Soo, Silva Kasela

Abstract: Process mining methods allow analysts to use logs of historical executions of business processes in order to gain knowledge about the actual behavior of these processes. One of the most widely studied process mining operations is automated process discovery. An event log is taken as input by an automated process discovery method and produces a business process model as output that captures the con… ▽ More Process mining methods allow analysts to use logs of historical executions of business processes in order to gain knowledge about the actual behavior of these processes. One of the most widely studied process mining operations is automated process discovery. An event log is taken as input by an automated process discovery method and produces a business process model as output that captures the control-flow relations between tasks that are described by the event log. In this setting, this paper provides a systematic comparative evaluation of existing implementations of automated process discovery methods with domain experts by using a real-life event log extracted from an international software engineering company and four quality metrics. The evaluation results highlight gaps and unexplored trade-offs in the field and allow researchers to improve the lacks in the automated process discovery methods in terms of usability of process discovery techniques in industry. △ Less

Submitted 8 June, 2018; originally announced June 2018.

arXiv:1804.08305 [pdf, other]

Minimum Symbol Error Rate-Based Constant Envelope Precoding for Multiuser Massive MISO Downlink

Authors: Mingjie Shao, Qiang Li, Wing-Kin Ma, Anthony Man-Cho So

Abstract: This paper considers the problem of constant envelope (CE) precoder designs for multiuser massive MISO downlink channels. The use of CE signals allows one to employ low-cost radio frequency chains and thereby facilitates the implementation of massive MIMO. However, the subsequent CE precoder designs are usually challenging owing to the non-convex CE constraints. The existing CE precoder designs co… ▽ More This paper considers the problem of constant envelope (CE) precoder designs for multiuser massive MISO downlink channels. The use of CE signals allows one to employ low-cost radio frequency chains and thereby facilitates the implementation of massive MIMO. However, the subsequent CE precoder designs are usually challenging owing to the non-convex CE constraints. The existing CE precoder designs consider minimization of some measures on the distortion levels of the received symbols, and they usually aim at improving the symbol-error rate (SER) performances. In this paper we formulate a minimum SER-based design for CE precoding. The design formulation is non-convex and we propose two low-complexity first-order algorithms using gradient projection. Curiously, our simulation results show that the proposed designs can achieve bit-error rate performance close to that of zero-forcing beamforming without CE signaling restrictions. △ Less

Submitted 23 April, 2018; originally announced April 2018.

Comments: To appear in IEEE Statistical Signal Processing Workshop 2018

arXiv:1705.02288 [pdf, other]

Automated Discovery of Process Models from Event Logs: Review and Benchmark

Authors: Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Andrea Marrella, Massimo Mecella, Allar Soo

Abstract: Process mining allows analysts to exploit logs of historical executions of business processes to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flo… ▽ More Process mining allows analysts to exploit logs of historical executions of business processes to extract insights regarding the actual performance of these processes. One of the most widely studied process mining operations is automated process discovery. An automated process discovery method takes as input an event log, and produces as output a business process model that captures the control-flow relations between tasks that are observed in or implied by the event log. Various automated process discovery methods have been proposed in the past two decades, striking different tradeoffs between scalability, accuracy and complexity of the resulting models. However, these methods have been evaluated in an ad-hoc manner, employing different datasets, experimental setups, evaluation measures and baselines, often leading to incomparable conclusions and sometimes unreproducible results due to the use of closed datasets. This article provides a systematic review and comparative evaluation of automated process discovery methods, using an open-source benchmark and covering twelve publicly-available real-life event logs, twelve proprietary real-life event logs, and nine quality metrics. The results highlight gaps and unexplored tradeoffs in the field, including the lack of scalability of some methods and a strong divergence in their performance with respect to the different quality metrics used. △ Less

Submitted 29 January, 2018; v1 submitted 5 May, 2017; originally announced May 2017.

arXiv:1603.05680 [pdf, other]

Semidefinite Relaxation and Approximation Analysis of a Beamformed Alamouti Scheme for Relay Beamforming Networks

Authors: Sissi Xiaoxiao Wu, Anthony Man-Cho So, Jiaxian Pan, Wing-Kin Ma

Abstract: In this paper, we study the amplify-and-forward (AF) schemes in two-hop one-way relay networks. In particular, we consider the multigroup multicast transmission between long-distance users. Given that perfect channel state information is perceived, our goal is to design the AF process so that the max-min-fair (MMF) signal-to-interference-plus-noise ratio (SINR) is optimized subject to generalized… ▽ More In this paper, we study the amplify-and-forward (AF) schemes in two-hop one-way relay networks. In particular, we consider the multigroup multicast transmission between long-distance users. Given that perfect channel state information is perceived, our goal is to design the AF process so that the max-min-fair (MMF) signal-to-interference-plus-noise ratio (SINR) is optimized subject to generalized power constraints. We propose a rank-two beamformed Alamouti (BFA) AF scheme and formulate the corresponding AF design problem as a \emph{two-variable} fractional quadratically-constrained quadratic program (QCQP), which is further tackled by the semidefinite relaxation (SDR) technique. We analyze the approximation quality of two-variable fractional SDRs under the Gaussian randomization algorithm. These results are fundamentally new and reveal that the proposed BFA AF scheme can outperform the traditional BF AF scheme, especially when there are many users in the system or many generalized power constraints in the problem formulation. From a practical perspective, the BFA AF scheme offers two degrees of freedom (DoFs) in beamformer design, as opposed to the one DoF offered by the BF AF scheme, to improve the receivers' SINR. In the latter part of this paper, we demonstrate how this extra DoF leads to provable performance gains by considering two special cases of multicasting, where the AF process is shown to employ a special structure. The numerical simulations further validate that the proposed BFA AF scheme outperforms the BF AF scheme and works well for large-scale relay systems. △ Less

Submitted 17 March, 2016; originally announced March 2016.

arXiv:1602.06500 [pdf, other]

Some Proof Derivations and Further Simulation Results for "Semidefinite Relaxation and Approximation Analysis of a Beamformed Alamouti Scheme for Relay Beamforming Networks"

Authors: Sissi Xiaoxiao Wu, Anthony Man-Cho So, Jiaxian Pan, Wing-Kin Ma

Abstract: This is a companion technical report of the main manuscript "Semidefinite Relaxation and Approximation Analysis of a Beamformed Alamouti Scheme for Relay Beamforming Networks". The report serves to give detailed derivations of Lemma 1-2 in the main manuscript, which are too long to be included in the latter. In addition, more simulation results are presented to verify the viability of the BF Alamo… ▽ More This is a companion technical report of the main manuscript "Semidefinite Relaxation and Approximation Analysis of a Beamformed Alamouti Scheme for Relay Beamforming Networks". The report serves to give detailed derivations of Lemma 1-2 in the main manuscript, which are too long to be included in the latter. In addition, more simulation results are presented to verify the viability of the BF Alamouti AF schemes developed in the main manuscript. △ Less

Submitted 16 March, 2016; v1 submitted 21 February, 2016; originally announced February 2016.

arXiv:1602.01569 [pdf, ps, other]

doi 10.1109/TSP.2017.2649488

Unraveling the Rank-One Solution Mystery of Robust MISO Downlink Transmit Optimization: A Verifiable Sufficient Condition via a New Duality Result

Authors: Wing-Kin Ma, Jiaxian Pan, Anthony Man-Cho So, Tsung-Hui Chang

Abstract: This paper concentrates on a robust transmit optimization problem for the multiuser multi-input single-output (MISO) downlink scenario and under inaccurate channel state information (CSI). This robust problem deals with a general-rank transmit covariance design, and it follows a safe rate-constrained formulation under spherically bounded CSI uncertainties. Curiously, simulation results in previous… ▽ More This paper concentrates on a robust transmit optimization problem for the multiuser multi-input single-output (MISO) downlink scenario and under inaccurate channel state information (CSI). This robust problem deals with a general-rank transmit covariance design, and it follows a safe rate-constrained formulation under spherically bounded CSI uncertainties. Curiously, simulation results in previous works suggested that the robust problem admits rank-one optimal transmit covariances in most cases. Such a numerical finding is appealing because transmission with rank-one covariances can be easily realized by single-stream transmit beamforming. This gives rise to a fundamentally important question, namely, whether we can theoretically identify conditions under which the robust problem admits a rank-one solution. In this paper, we identify one such condition. Simply speaking, we show that the robust problem is guaranteed to admit a rank-one solution if the CSI uncertainties are not too large and the multiuser channel is not too poorly conditioned. To establish the aforementioned condition, we develop a novel duality framework, through which an intimate relationship between the robust problem and a related maximin problem is revealed. Our condition involves only a simple expression with respect to the multiuser channel and other system parameters. In particular, unlike other sufficient rank-one conditions that have appeared in the literature, ours is verifiable. The application of our analysis framework to several other CSI uncertainty models is also discussed. △ Less

Submitted 30 December, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

Comments: To appear in IEEE Trans. Signal Process. This version combines the main manuscript and its accompanied supplementary report into one single article

arXiv:1512.03518 [pdf, ps, other]

A Unified Approach to Error Bounds for Structured Convex Optimization Problems

Authors: Zirui Zhou, Anthony Man-Cho So

Abstract: Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in whi… ▽ More Error bounds, which refer to inequalities that bound the distance of vectors in a test set to a given set by a residual function, have proven to be extremely useful in analyzing the convergence rates of a host of iterative methods for solving optimization problems. In this paper, we present a new framework for establishing error bounds for a class of structured convex optimization problems, in which the objective function is the sum of a smooth convex function and a general closed proper convex function. Such a class encapsulates not only fairly general constrained minimization problems but also various regularized loss minimization formulations in machine learning, signal processing, and statistics. Using our framework, we show that a number of existing error bound results can be recovered in a unified and transparent manner. To further demonstrate the power of our framework, we apply it to a class of nuclear-norm regularized loss minimization problems and establish a new error bound for this class under a strict complementarity-type regularity condition. We then complement this result by constructing an example to show that the said error bound could fail to hold without the regularity condition. Consequently, we obtain a rather complete answer to a question raised by Tseng. We believe that our approach will find further applications in the study of error bounds for structured convex optimization problems. △ Less

Submitted 10 December, 2015; originally announced December 2015.

Comments: 32 pages

arXiv:1511.04244 [pdf, other]

doi 10.1109/LSP.2016.2569600

A Robust Design for MISO Physical-Layer Multicasting over Line-of-Sight Channels

Authors: Man-Chung Yue, Sissi Xiaoxiao Wu, Anthony Man-Cho So

Abstract: This paper studies a robust design problem for far-field line-of-sight (LOS) channels where phase errors are present. Compared with the commonly used additive error model, the phase error model is more suitable for capturing the uncertainty in an LOS channel, as the dominant source of uncertainty lies in the phase. We consider a multiple-input single-output (MISO) multicast scenario, in which our… ▽ More This paper studies a robust design problem for far-field line-of-sight (LOS) channels where phase errors are present. Compared with the commonly used additive error model, the phase error model is more suitable for capturing the uncertainty in an LOS channel, as the dominant source of uncertainty lies in the phase. We consider a multiple-input single-output (MISO) multicast scenario, in which our goal is to design a beamformer that minimizes the transmit power while satisfying probabilistic signal-to-noise ratio (SNR) constraints. The probabilistic constraints give rise to a new computational challenge, as they involve random trigonometric forms. In this work, we propose to first approximate the random trigonometric form by its second-order Taylor expansion and then tackle the resulting random quadratic form using a Bernstein-type inequality. The advantage of such an approach is that an approximately optimal beamformer can be obtained using the standard semidefinite relaxation technique. In the simulations, we first show that if a non-robust design (i.e., one that does not take phase errors into account) is used, then the whole system may collapse. We then show that our proposed method is less conservative than the existing robust design based on Gaussian approximation and thus requires a lower power budget. △ Less

Submitted 13 November, 2015; originally announced November 2015.

Comments: This manuscript is submitted for possible journal publication on 13-Nov-2015

arXiv:1510.02448 [pdf, other]

A Stochastic Beamformed Amplify-and-Forward Scheme in a Multigroup Multicast MIMO Relay Network with Per-Antenna Power Constraints

Authors: Sissi Xiaoxiao Wu, Qiang Li, Anthony Man-Cho So, Wing-Kin Ma

Abstract: In this paper, we consider a two-hop one-way relay network for multigroup multicast transmission between long-distance users, in which the relay is equipped with multiple antennas, while the transmitters and receivers are all with a single antenna. Assuming that perfect channel state information is available, we study amplify-and-forward (AF) schemes that aim at optimizing the max-min-fair (MMF) r… ▽ More In this paper, we consider a two-hop one-way relay network for multigroup multicast transmission between long-distance users, in which the relay is equipped with multiple antennas, while the transmitters and receivers are all with a single antenna. Assuming that perfect channel state information is available, we study amplify-and-forward (AF) schemes that aim at optimizing the max-min-fair (MMF) rate. We begin by considering the classic beamformed AF (BF-AF) scheme, whose corresponding MMF design problem can be formulated as a rank-constrained fractional semidefinite program (SDP). We show that the gap between the BF-AF rate and the SDR rate associated with an optimal SDP solution is sensitive to the number of users as well as the number of power constraints in the relay system. This reveals that the BF-AF scheme may not be well suited for large-scale systems. We therefore propose the stochastic beamformed AF (SBF-AF) schemes, which differ from the BF-AF scheme in that time-varying AF weights are used. We prove that the MMF rates of the proposed SBF-AF schemes are at most $0.8317$ bits/s/Hz less than the SDR rate, irrespective of the number of users or power constraints. Thus, SBF-AF can outperform BF-AF especially in large-scale systems. Finally, we present numerical results to demonstrate the viability of our proposed schemes. △ Less

Submitted 16 March, 2016; v1 submitted 8 October, 2015; originally announced October 2015.

Comments: This paper was first submitted for possible journal publication on 24-May-2015. A preliminary version of this paper has been published in IEEE SPAWC'15

arXiv:1510.01025 [pdf, ps, other]

Quadratic Optimization with Orthogonality Constraints: Explicit Lojasiewicz Exponent and Linear Convergence of Line-Search Methods

Authors: Huikang Liu, Weijie Wu, Anthony Man-Cho So

Abstract: A fundamental class of matrix optimization problems that arise in many areas of science and engineering is that of quadratic optimization with orthogonality constraints. Such problems can be solved using line-search methods on the Stiefel manifold, which are known to converge globally under mild conditions. To determine the convergence rate of these methods, we give an explicit estimate of the exp… ▽ More A fundamental class of matrix optimization problems that arise in many areas of science and engineering is that of quadratic optimization with orthogonality constraints. Such problems can be solved using line-search methods on the Stiefel manifold, which are known to converge globally under mild conditions. To determine the convergence rate of these methods, we give an explicit estimate of the exponent in a Lojasiewicz inequality for the (non-convex) set of critical points of the aforementioned class of problems. By combining such an estimate with known arguments, we are able to establish the linear convergence of a large class of line-search methods. A key step in our proof is to establish a local error bound for the set of critical points, which may be of independent interest. △ Less

Submitted 5 October, 2015; originally announced October 2015.

MSC Class: 90C26; 90C46; 90C52

arXiv:1309.0113 [pdf, ps, other]

Non-Asymptotic Convergence Analysis of Inexact Gradient Methods for Machine Learning Without Strong Convexity

Authors: Anthony Man-Cho So

Abstract: Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact gradient computations and hence can be inefficient when the problem size is large or the gradient is difficult to evaluate. Therefore, there has been much interest… ▽ More Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact gradient computations and hence can be inefficient when the problem size is large or the gradient is difficult to evaluate. Therefore, there has been much interest in inexact gradient methods (IGMs), in which an efficiently computable approximate gradient is used to perform the update in each iteration. Currently, non-asymptotic linear convergence results for IGMs are typically established under the assumption that the objective function is strongly convex, which is not satisfied in many applications of interest; while linear convergence results that do not require the strong convexity assumption are usually asymptotic in nature. In this paper, we combine the best of these two types of results and establish---under the standard assumption that the gradient approximation errors decrease linearly to zero---the non-asymptotic linear convergence of IGMs when applied to a class of structured convex optimization problems. Such a class covers settings where the objective function is not necessarily strongly convex and includes the least squares and logistic regression problems. We believe that our techniques will find further applications in the non-asymptotic convergence analysis of other first-order methods. △ Less

Submitted 31 August, 2013; originally announced September 2013.

arXiv:1305.2123 [pdf, ps, other]

doi 10.1109/TSP.2013.2263500

Physical-Layer Multicasting by Stochastic Transmit Beamforming and Alamouti Space-Time Coding

Authors: Xiaoxiao Wu, Wing-Kin Ma, Anthony Man-Cho So

Abstract: Consider transceiver designs in a multiuser multi-input single-output (MISO) downlink channel, where the users are to receive the same data stream simultaneously. This problem, known as physical-layer multicasting, has drawn much interest. Presently, a popularized approach is transmit beamforming, in which the beamforming optimization is handled by a rank-one approximation method called semidefini… ▽ More Consider transceiver designs in a multiuser multi-input single-output (MISO) downlink channel, where the users are to receive the same data stream simultaneously. This problem, known as physical-layer multicasting, has drawn much interest. Presently, a popularized approach is transmit beamforming, in which the beamforming optimization is handled by a rank-one approximation method called semidefinite relaxation (SDR). SDR-based beamforming has been shown to be promising for a small or moderate number of users. This paper describes two new transceiver strategies for physical-layer multicasting. The first strategy, called stochastic beamforming (SBF), randomizes the beamformer in a per-symbol time-varying manner, so that the rank-one approximation in SDR can be bypassed. We propose several efficiently realizable SBF schemes, and prove that their multicast achievable rate gaps with respect to the MISO multicast capacity must be no worse than 0.8314 bits/s/Hz, irrespective of any other factors such as the number of users. The use of channel coding and the assumption of sufficiently long code lengths play a crucial role in achieving the above result. The second strategy combines transmit beamforming and the Alamouti space-time code. The result is a rank-two generalization of SDR-based beamforming. We show by analysis that this SDR-based beamformed Alamouti scheme has a better worst-case effective signal-to-noise ratio (SNR) scaling, and hence a better multicast rate scaling, than SDR-based beamforming. We further the work by combining SBF and the beamformed Alamouti scheme, wherein an improved constant rate gap of 0.39 bits/s/Hz is proven. Simulation results show that under a channel-coded, many-user setting, the proposed multicast transceiver schemes yield significant SNR gains over SDR-based beamforming at the same bit error rate level. △ Less

Submitted 16 July, 2013; v1 submitted 9 May, 2013; originally announced May 2013.

Comments: To appear in IEEE Transactions on Signal Processing

Showing 1–50 of 57 results for author: Soo, A