Search | arXiv e-print repository

NonlinearSolve.jl: High-Performance and Robust Solvers for Systems of Nonlinear Equations in Julia

Authors: Avik Pal, Flemming Holtorf, Axel Larsson, Torkel Loman, Utkarsh, Frank Schäefer, Qingyu Qu, Alan Edelman, Chris Rackauckas

Abstract: Efficiently solving nonlinear equations underpins numerous scientific and engineering disciplines, yet scaling these solutions for complex system models remains a challenge. This paper presents NonlinearSolve.jl - a suite of high-performance open-source nonlinear equation solvers implemented natively in the Julia programming language. NonlinearSolve.jl distinguishes itself by offering a unified AP… ▽ More Efficiently solving nonlinear equations underpins numerous scientific and engineering disciplines, yet scaling these solutions for complex system models remains a challenge. This paper presents NonlinearSolve.jl - a suite of high-performance open-source nonlinear equation solvers implemented natively in the Julia programming language. NonlinearSolve.jl distinguishes itself by offering a unified API that accommodates a diverse range of solver specifications alongside features such as automatic algorithm selection based on runtime analysis, support for GPU-accelerated computation through static array kernels, and the utilization of sparse automatic differentiation and Jacobian-free Krylov methods for large-scale problem-solving. Through rigorous comparison with established tools such as Sundials and MINPACK, NonlinearSolve.jl demonstrates unparalleled robustness and efficiency, achieving significant advancements in solving benchmark problems and challenging real-world applications. The capabilities of NonlinearSolve.jl unlock new potentials in modeling and simulation across various domains, making it a valuable addition to the computational toolkit of researchers and practitioners alike. △ Less

Submitted 28 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2311.15210 [pdf, other]

Topology combined machine learning for consonant recognition

Authors: **yao Feng, Siheng Yi, Qingrui Qu, Zhiwang Yu, Yifei Zhu

Abstract: In artificial-intelligence-aided signal processing, existing deep learning models often exhibit a black-box structure, and their validity and comprehensibility remain elusive. The integration of topological methods, despite its relatively nascent application, serves a dual purpose of making models more interpretable as well as extracting structural information from time-dependent data for smarter… ▽ More In artificial-intelligence-aided signal processing, existing deep learning models often exhibit a black-box structure, and their validity and comprehensibility remain elusive. The integration of topological methods, despite its relatively nascent application, serves a dual purpose of making models more interpretable as well as extracting structural information from time-dependent data for smarter learning. Here, we provide a transparent and broadly applicable methodology, TopCap, to capture the most salient topological features inherent in time series for machine learning. Rooted in high-dimensional ambient spaces, TopCap is capable of capturing features rarely detected in datasets with low intrinsic dimensionality. Applying time-delay embedding and persistent homology, we obtain descriptors which encapsulate information such as the vibration of a time series, in terms of its variability of frequency, amplitude, and average line, demonstrated with simulated data. This information is then vectorised and fed into multiple machine learning algorithms such as k-nearest neighbours and support vector machine. Notably, in classifying voiced and voiceless consonants, TopCap achieves an accuracy exceeding 96% and is geared towards designing topological convolutional layers for deep learning of speech and audio signals. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.02960 [pdf, other]

Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination

Authors: Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu

Abstract: Over the past decade, deep learning has proven to be a highly effective tool for learning meaningful features from raw data. However, it remains an open question how deep networks perform hierarchical feature learning across layers. In this work, we attempt to unveil this mystery by investigating the structures of intermediate features. Motivated by our empirical findings that linear layers mimic… ▽ More Over the past decade, deep learning has proven to be a highly effective tool for learning meaningful features from raw data. However, it remains an open question how deep networks perform hierarchical feature learning across layers. In this work, we attempt to unveil this mystery by investigating the structures of intermediate features. Motivated by our empirical findings that linear layers mimic the roles of deep layers in nonlinear networks for feature learning, we explore how deep linear networks transform input data into output by investigating the output (i.e., features) of each layer after training in the context of multi-class classification problems. Toward this goal, we first define metrics to measure within-class compression and between-class discrimination of intermediate features, respectively. Through theoretical analysis of these two metrics, we show that the evolution of features follows a simple and quantitative pattern from shallow to deep layers when the input data is nearly orthogonal and the network weights are minimum-norm, balanced, and approximate low-rank: Each layer of the linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate with respect to the number of layers that data have passed through. To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks. Empirically, our extensive experiments not only validate our theoretical results numerically but also reveal a similar pattern in deep nonlinear networks which aligns well with recent empirical studies. Moreover, we demonstrate the practical implications of our results in transfer learning. Our code is available at \url{https://github.com/Heimine/PNC_DLN}. △ Less

Submitted 9 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: 61 pages, 14 figures

arXiv:2210.02192 [pdf, other]

Are All Losses Created Equal: A Neural Collapse Perspective

Authors: **xin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

Abstract: While cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Among them, which one is the best to use is still a mystery, because there seem to be multiple factors affecting the answer, such as properties of the dataset, the choice of network architecture, and so o… ▽ More While cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Among them, which one is the best to use is still a mystery, because there seem to be multiple factors affecting the answer, such as properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line work showing that the global optimal solution of CE and mean-square-error (MSE) losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend such results and show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence, all relevant losses(i.e., CE, LS, FL, MSE) produce equivalent features on training data. Based on the unconstrained feature model assumption, we provide either the global landscape analysis for LS loss or the local landscape analysis for FL loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessian exhibit negative curvature directions either in the global scope for LS loss or in the local scope for FL loss near the optimal solution. The experiments further show that Neural Collapse features obtained from all relevant losses lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence. △ Less

Submitted 8 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 32 page, 10 figures, NeurIPS 2022

arXiv:2203.01238 [pdf, other]

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

Authors: **xin Zhou, Xiao Li, Tianyu Ding, Chong You, Qing Qu, Zhihui Zhu

Abstract: When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero… ▽ More When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), which seems to take place regardless of the choice of loss functions. In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably or even better than the de-facto cross-entropy loss. Under a simplified unconstrained feature model, we provide the first global landscape analysis for vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessian exhibit negative curvature directions. Furthermore, we justify the usage of rescaled MSE loss by probing the optimization landscape around the NC solutions, showing that the landscape can be improved by tuning the rescaling hyperparameters. Finally, our theoretical findings are experimentally verified on practical network architectures. △ Less

Submitted 12 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.01896 [pdf, other]

Yordle: An Efficient Imitation Learning for Branch and Bound

Authors: Qingyu Qu, Xijun Li, Yunfan Zhou

Abstract: Combinatorial optimization problems have aroused extensive research interests due to its huge application potential. In practice, there are highly redundant patterns and characteristics during solving the combinatorial optimization problem, which can be captured by machine learning models. Thus, the 2021 NeurIPS Machine Learning for Combinatorial Optimization (ML4CO) competition is proposed with t… ▽ More Combinatorial optimization problems have aroused extensive research interests due to its huge application potential. In practice, there are highly redundant patterns and characteristics during solving the combinatorial optimization problem, which can be captured by machine learning models. Thus, the 2021 NeurIPS Machine Learning for Combinatorial Optimization (ML4CO) competition is proposed with the goal of improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning techniques. This work presents our solution and insights gained by team qqy in the dual task of the competition. Our solution is a highly efficient imitation learning framework for performance improvement of Branch and Bound (B&B), named Yordle. It employs a hybrid sampling method and an efficient data selection method, which not only accelerates the model training but also improves the decision quality during branching variable selection. In our experiments, Yordle greatly outperforms the baseline algorithm adopted by the competition while requiring significantly less time and amounts of data to train the decision model. Specifically, we use only 1/4 of the amount of data compared to that required for the baseline algorithm, to achieve around 50% higher score than baseline algorithm. The proposed framework Yordle won the championship of the student leaderboard. △ Less

Submitted 19 February, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2201.06213

arXiv:2201.06216 [pdf, other]

Learning to Reformulate for Linear Programming

Authors: Xijun Li, Qingyu Qu, Fangzhou Zhu, Jia Zeng, Mingxuan Yuan, Kun Mao, Jie Wang

Abstract: It has been verified that the linear programming (LP) is able to formulate many real-life optimization problems, which can obtain the optimum by resorting to corresponding solvers such as OptVerse, Gurobi and CPLEX. In the past decades, a serial of traditional operation research algorithms have been proposed to obtain the optimum of a given LP in a fewer solving time. Recently, there is a trend of… ▽ More It has been verified that the linear programming (LP) is able to formulate many real-life optimization problems, which can obtain the optimum by resorting to corresponding solvers such as OptVerse, Gurobi and CPLEX. In the past decades, a serial of traditional operation research algorithms have been proposed to obtain the optimum of a given LP in a fewer solving time. Recently, there is a trend of using machine learning (ML) techniques to improve the performance of above solvers. However, almost no previous work takes advantage of ML techniques to improve the performance of solver from the front end, i.e., the modeling (or formulation). In this paper, we are the first to propose a reinforcement learning-based reformulation method for LP to improve the performance of solving process. Using an open-source solver COIN-OR LP (CLP) as an environment, we implement the proposed method over two public research LP datasets and one large-scale LP dataset collected from practical production planning scenario. The evaluation results suggest that the proposed method can effectively reduce both the solving iteration number ($25\%\downarrow$) and the solving time ($15\%\downarrow$) over above datasets in average, compared to directly solving the original LP instances. △ Less

Submitted 16 January, 2022; originally announced January 2022.

arXiv:2201.06213 [pdf, other]

An Improved Reinforcement Learning Algorithm for Learning to Branch

Authors: Qingyu Qu, Xijun Li, Yunfan Zhou, Jia Zeng, Mingxuan Yuan, Jie Wang, **hu Lv, Kexin Liu, Kun Mao

Abstract: Most combinatorial optimization problems can be formulated as mixed integer linear programming (MILP), in which branch-and-bound (B\&B) is a general and widely used method. Recently, learning to branch has become a hot research topic in the intersection of machine learning and combinatorial optimization. In this paper, we propose a novel reinforcement learning-based B\&B algorithm. Similar to offl… ▽ More Most combinatorial optimization problems can be formulated as mixed integer linear programming (MILP), in which branch-and-bound (B\&B) is a general and widely used method. Recently, learning to branch has become a hot research topic in the intersection of machine learning and combinatorial optimization. In this paper, we propose a novel reinforcement learning-based B\&B algorithm. Similar to offline reinforcement learning, we initially train on the demonstration data to accelerate learning massively. With the improvement of the training effect, the agent starts to interact with the environment with its learned policy gradually. It is critical to improve the performance of the algorithm by determining the mixing ratio between demonstration and self-generated data. Thus, we propose a prioritized storage mechanism to control this ratio automatically. In order to improve the robustness of the training process, a superior network is additionally introduced based on Double DQN, which always serves as a Q-network with competitive performance. We evaluate the performance of the proposed algorithm over three public research benchmarks and compare it against strong baselines, including three classical heuristics and one state-of-the-art imitation learning-based branching algorithm. The results show that the proposed algorithm achieves the best performance among compared algorithms and possesses the potential to improve B\&B algorithm performance continuously. △ Less

Submitted 16 January, 2022; originally announced January 2022.

arXiv:2109.11154 [pdf, other]

Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery

Authors: Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu

Abstract: We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank. We consider the robust matrix factorization approach. We employ a robust $\ell_1$ loss function and deal with the challenge of the unknown rank by using an overspecified factored representation of the matrix variable. We then solve the associat… ▽ More We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank. We consider the robust matrix factorization approach. We employ a robust $\ell_1$ loss function and deal with the challenge of the unknown rank by using an overspecified factored representation of the matrix variable. We then solve the associated nonconvex nonsmooth problem using a subgradient method with diminishing stepsizes. We show that under a regularity condition on the sensing matrices and corruption, which we call restricted direction preserving property (RDPP), even with rank overspecified, the subgradient method converges to the exact low-rank solution at a sublinear rate. Moreover, our result is more general in the sense that it automatically speeds up to a linear rate once the factor rank matches the unknown rank. On the other hand, we show that the RDPP condition holds under generic settings, such as Gaussian measurements under independent or adversarial sparse corruptions, where the result could be of independent interest. Both the exact recovery and the convergence rate of the proposed subgradient method are numerically verified in the overspecified regime. Moreover, our experiment further shows that our particular design of diminishing stepsize effectively prevents overfitting for robust recovery under overparameterized models, such as robust matrix sensing and learning robust deep image prior. This regularization effect is worth further investigation. △ Less

Submitted 26 October, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: 75 pages, 3 figures

arXiv:2105.02375 [pdf, other]

A Geometric Analysis of Neural Collapse with Unconstrained Features

Authors: Zhihui Zhu, Tianyu Ding, **xin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

Abstract: We provide the first global optimization landscape analysis of $Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that ($i$) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equi… ▽ More We provide the first global optimization landscape analysis of $Neural\;Collapse$ -- an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that ($i$) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and ($ii$) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified $unconstrained\;feature\;model$, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs while all other critical points are strict saddles whose Hessian exhibit negative curvature directions. In contrast to existing landscape analysis for deep neural networks which is often disconnected from practice, our analysis of the simplified model not only does it explain what kind of features are learned in the last layer, but it also shows why they can be efficiently optimized in the simplified settings, matching the empirical observations in practical deep network architectures. These findings could have profound implications for optimization, generalization, and robustness of broad interests. For example, our experiments demonstrate that one may set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF for network training, which reduces memory cost by over $20\%$ on ResNet18 without sacrificing the generalization performance. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: 42 pages, 8 figures, 1 table; the first two authors contributed to this work equally

arXiv:2007.06753 [pdf, other]

From Symmetry to Geometry: Tractable Nonconvex Problems

Authors: Yuqian Zhang, Qing Qu, John Wright

Abstract: As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the da… ▽ More As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes, with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems, which can be understood through the lens of symmetries. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single "ground truth" solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth, and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in sha** the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research. △ Less

Submitted 8 July, 2022; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: review paper, 38 pages, 10 figures, revision: correction of typos, adding more discussion on recent advances on deep learning

arXiv:2006.10557 [pdf, ps, other]

The navigation problems and the curvature properties on conic Kropina manifolds

Authors: Xinyue Cheng, Qiuhong Qu, Suiyun Xu

Abstract: In this paper, we study navigation problems on conic Kropina manifolds. Let $F(x, y)$ be a conic Kropina metric on an $n$-dimensional manifold $M$ and $V$ be a conformal vector field on $(M, F)$ with $F(x, - V_{x})\leq 1$. Let $\widetilde{F}= \widetilde{F} (x,y)$ be the solution of the navigation problem with navigation data $(F, V)$. We prove that $\widetilde{F}$ must be either a Randers metric o… ▽ More In this paper, we study navigation problems on conic Kropina manifolds. Let $F(x, y)$ be a conic Kropina metric on an $n$-dimensional manifold $M$ and $V$ be a conformal vector field on $(M, F)$ with $F(x, - V_{x})\leq 1$. Let $\widetilde{F}= \widetilde{F} (x,y)$ be the solution of the navigation problem with navigation data $(F, V)$. We prove that $\widetilde{F}$ must be either a Randers metric or a Kropina metric. Then we establish the relationships between some curvature properties of $F$ and the corresponding properties of the new metric $\widetilde{F}$, which involve S-curvature, flag curvature and Ricci curvature. △ Less

Submitted 18 June, 2020; originally announced June 2020.

MSC Class: 53B40; 53C60

arXiv:2006.08857 [pdf, other]

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

Authors: Chong You, Zhihui Zhu, Qing Qu, Yi Ma

Abstract: Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and s… ▽ More Recent advances have shown that implicit bias of gradient descent on over-parameterized models enables the recovery of low-rank matrices from linear measurements, even with no prior knowledge on the intrinsic rank. In contrast, for robust low-rank matrix recovery from grossly corrupted measurements, over-parameterization leads to overfitting without prior knowledge on both the intrinsic rank and sparsity of corruption. This paper shows that with a double over-parameterization for both the low-rank matrix and sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge on neither rank of the matrix nor sparsity of the corruption. We further extend our approach for the robust recovery of natural images by over-parameterizing images with deep convolutional networks. Experiments show that our method handles different test images and varying corruption levels with a single learning pipeline where the network width and termination conditions do not need to be adjusted on a case-by-case basis. Underlying the success is again the implicit bias with discrepant learning rates on different over-parameterized parameters, which may bear on broader applications. △ Less

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2001.06970 [pdf, other]

Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications

Authors: Qing Qu, Zhihui Zhu, Xiao Li, Manolis C. Tsakiris, John Wright, René Vidal

Abstract: The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning. However, in contrast to the classical sparse recovery problem, the mos… ▽ More The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning. However, in contrast to the classical sparse recovery problem, the most natural formulation for finding the sparsest vector in a subspace is usually nonconvex. In this paper, we overview recent advances on global nonconvex optimization theory for solving this problem, ranging from geometric analysis of its optimization landscapes, to efficient optimization algorithms for solving the associated nonconvex optimization problem, to applications in machine intelligence, representation learning, and imaging sciences. Finally, we conclude this review by pointing out several interesting open problems for future research. △ Less

Submitted 19 January, 2020; originally announced January 2020.

Comments: QQ and ZZ contributed equally to the work. Invited review paper for IEEE Signal Processing Magazine Special Issue on non-convex optimization for signal processing and machine learning. This article contains 26 pages with 11 figures

arXiv:1912.02427 [pdf, other]

Analysis of the Optimization Landscapes for Overcomplete Representation Learning

Authors: Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu

Abstract: We study nonconvex optimization landscapes for learning overcomplete representations, including learning (i) sparsely used overcomplete dictionaries and (ii) convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods wor… ▽ More We study nonconvex optimization landscapes for learning overcomplete representations, including learning (i) sparsely used overcomplete dictionaries and (ii) convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods work so well are far from satisfactory. In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objectives have benign (global) geometric structures, in the sense that every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature. This discovery enables the development of guaranteed global optimization methods using simple initializations. For both problems, we show the nonconvex objectives have benign geometric structures -- every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature -- either in the entire space or within a sufficiently large region. This discovery ensures local search algorithms (such as Riemannian gradient descent) with simple initializations approximately find the target solutions. Finally, numerical experiments justify our theoretical discoveries. △ Less

Submitted 10 December, 2019; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 68 pages, 5 figures

arXiv:1911.05047 [pdf, other]

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods

Authors: Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man Cho So

Abstract: We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods -- namely Riemannain subgradient, incremental subgradient, and stochastic subgradient metho… ▽ More We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods -- namely Riemannain subgradient, incremental subgradient, and stochastic subgradient methods -- to solve these problems and show that they all have an iteration complexity of ${\cal O}(\varepsilon^{-4})$ for driving a natural stationarity measure below $\varepsilon$. In addition, we establish the local linear convergence of the Riemannian subgradient and incremental subgradient methods when the problem at hand further satisfies a sharpness property and the algorithms are properly initialized and use geometrically diminishing stepsizes. To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold. The fundamental ingredient in the proof of the aforementioned convergence results is a new Riemannian subgradient inequality for restrictions of weakly convex functions on the Stiefel manifold, which could be of independent interest. We also show that our convergence results can be extended to handle a class of compact embedded submanifolds of the Euclidean space. Finally, we discuss the sharpness properties of various formulations of the robust subspace recovery and orthogonal dictionary learning problems and demonstrate the convergence performance of the algorithms on both problems via numerical simulations. △ Less

Submitted 24 March, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 30 pages. Accepted to SIAM Journal on Optimization

MSC Class: 68Q25; 65K10; 90C90; 90C26; 90C06

arXiv:1908.10959 [pdf, other]

Short-and-Sparse Deconvolution -- A Geometric Approach

Authors: Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright

Abstract: Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution… ▽ More Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution problems involve smooth motifs (kernels) whose spectra decay rapidly, resulting in poor conditioning and numerical challenges. This paper is motivated by recent theoretical advances, which characterize the optimization landscape of a particular nonconvex formulation of SaSD. This is used to derive a $provable$ algorithm which exactly solves certain non-practical instances of the SaSD problem. We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a $practical$ algorithm, which performs well on data arising from a range of application areas. We highlight key additional challenges posed by the ill-conditioning of real SaSD problems, and suggest heuristics (acceleration, continuation, reweighting) to mitigate them. Experiments demonstrate both the performance and generality of the proposed method. △ Less

Submitted 1 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: *YL and QQ contributed equally to this work; 30 figures, 45 pages; This version: added an experiment comparing with other methods, corrected typos and added references

arXiv:1908.10776 [pdf, ps, other]

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution

Authors: Qing Qu, Xiao Li, Zhihui Zhu

Abstract: We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i $ ($i=1,\cdots,p$). We formulate the task as a nonconvex optimization problem over the sphere. Under mild statistical assumptions of… ▽ More We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i $ ($i=1,\cdots,p$). We formulate the task as a nonconvex optimization problem over the sphere. Under mild statistical assumptions of the data, we prove that the vanilla Riemannian gradient descent (RGD) method, with random initializations, provably recovers both the kernel $\mathbf a$ and the signals $\{\mathbf x_i\}_{i=1}^p$ up to a signed shift ambiguity. In comparison with state-of-the-art results, our work shows significant improvements in terms of sample complexity and computational efficiency. Our theoretical results are corroborated by numerical experiments, which demonstrate superior performance of the proposed approach over the previous methods on both synthetic and real datasets. △ Less

Submitted 29 February, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: 62 pages, 6 figures; short version accepted as a spotlight paper at NeurIPS'19 (https://papers.nips.cc/paper/8656-a-nonconvex-approach-for-exact-and-efficient-multichannel-sparse-blind-deconvolution) ; A long journal version is under revision at SIIMS

arXiv:1712.00716 [pdf, other]

Convolutional Phase Retrieval via Gradient Descent

Authors: Qing Qu, Yuqian Zhang, Yonina C. Eldar, John Wright

Abstract: We study the convolutional phase retrieval problem, of recovering an unknown signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb C^m $. This model is motivated by applications such as channel estimation, optics, and underwater acoustic communication, where the signal of interest is acted on by a… ▽ More We study the convolutional phase retrieval problem, of recovering an unknown signal $\mathbf x \in \mathbb C^n $ from $m$ measurements consisting of the magnitude of its cyclic convolution with a given kernel $\mathbf a \in \mathbb C^m $. This model is motivated by applications such as channel estimation, optics, and underwater acoustic communication, where the signal of interest is acted on by a given channel/filter, and phase information is difficult or impossible to acquire. We show that when $\mathbf a$ is random and the number of observations $m$ is sufficiently large, with high probability $\mathbf x$ can be efficiently recovered up to a global phase shift using a combination of spectral initialization and generalized gradient descent. The main challenge is co** with dependencies in the measurement operator. We overcome this challenge by using ideas from decoupling theory, suprema of chaos processes and the restricted isometry property of random circulant matrices, and recent analysis of alternating minimization methods. △ Less

Submitted 5 October, 2019; v1 submitted 3 December, 2017; originally announced December 2017.

Comments: 64 pages , 9 figures, appeared in NeurIPS 2017. Accepted at IEEE Transactions on Information Theory. This is the final (minor) update: fixed typos and grammar issues

arXiv:1602.06664 [pdf, other]

doi 10.1007/s10208-017-9365-9

A Geometric Analysis of Phase Retrieval

Authors: Ju Sun, Qing Qu, John Wright

Abstract: Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent invest… ▽ More Can we recover a complex signal from its Fourier magnitudes? More generally, given a set of $m$ measurements, $y_k = |\mathbf a_k^* \mathbf x|$ for $k = 1, \dots, m$, is it possible to recover $\mathbf x \in \mathbb{C}^n$ (i.e., length-$n$ complex vector)? This **generalized phase retrieval** (GPR) problem is a fundamental task in various disciplines, and has been the subject of much recent investigation. Natural nonconvex heuristics often work remarkably well for GPR in practice, but lack clear theoretical explanations. In this paper, we take a step towards bridging this gap. We prove that when the measurement vectors $\mathbf a_k$'s are generic (i.i.d. complex Gaussian) and the number of measurements is large enough ($m \ge C n \log^3 n$), with high probability, a natural least-squares formulation for GPR has the following benign geometric structure: (1) there are no spurious local minimizers, and all global minimizers are equal to the target signal $\mathbf x$, up to a global phase; and (2) the objective function has a negative curvature around each saddle point. This structure allows a number of iterative optimization methods to efficiently find a global minimizer, without special initialization. To corroborate the claim, we describe and analyze a second-order trust-region algorithm. △ Less

Submitted 1 January, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

Comments: 61 pages, 5 figures. A short version can be found here http://sunju.org/docs/PR_G4_16.pdf . Revised according to reviewers' feedback

Journal ref: Foundations of Computational Mathematics, 18(5):1131--1198, 2018

arXiv:1511.04777 [pdf, other]

doi 10.1109/TIT.2016.2632149

Complete Dictionary Recovery over the Sphere II: Recovery by Riemannian Trust-region Method

Authors: Ju Sun, Qing Qu, John Wright

Abstract: We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and fin… ▽ More We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. In a companion paper (arXiv:1511.03607), we have showed that with high probability our nonconvex formulation has no "spurious" local minimizers and around any saddle point the objective function has a negative directional curvature. In this paper, we take advantage of the particular geometric structure, and describe a Riemannian trust region algorithm that provably converges to a local minimizer with from arbitrary initializations. Such minimizers give excellent approximations to rows of $\mathbf X_0$. The rows are then recovered by linear programming rounding and deflation. △ Less

Submitted 1 September, 2016; v1 submitted 15 November, 2015; originally announced November 2015.

Comments: The second of two papers based on the report arXiv:1504.06785. Accepted by IEEE Transaction on Information Theory; revised according to the reviewers' comments

Journal ref: IEEE Trans. Information Theory, 63(2): 885 - 914 (2017)

arXiv:1511.03607 [pdf, other]

doi 10.1109/TIT.2016.2632162

Complete Dictionary Recovery over the Sphere I: Overview and the Geometric Picture

Authors: Ju Sun, Qing Qu, John Wright

Abstract: We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and fin… ▽ More We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb{R}^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms either only guarantee recovery when $\mathbf X_0$ has $O(\sqrt{n})$ zeros per column, or require multiple rounds of SDP relaxation to work when $\mathbf X_0$ has $O(n^{1-δ})$ nonzeros per column (for any constant $δ\in (0, 1)$). } Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint. In this paper, we provide a geometric characterization of the objective landscape. In particular, we show that the problem is highly structured: with high probability, (1) there are no "spurious" local minimizers; and (2) around all saddle points the objective has a negative directional curvature. This distinctive structure makes the problem amenable to efficient optimization algorithms. In a companion paper (arXiv:1511.04777), we design a second-order trust-region algorithm over the sphere that provably converges to a local minimizer from arbitrary initializations, despite the presence of saddle points. △ Less

Submitted 1 September, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

Comments: Accepted by IEEE Transaction on Information Theory; revised according to the reviewers' comments

Journal ref: IEEE Trans. Information Theory, 63(2): 853 - 884 (2017)

arXiv:1510.06096 [pdf, other]

When Are Nonconvex Problems Not Scary?

Authors: Ju Sun, Qing Qu, John Wright

Abstract: In this note, we focus on smooth nonconvex optimization problems that obey: (1) all local minimizers are also global; and (2) around any saddle point or local maximizer, the objective has a negative directional curvature. Concrete applications such as dictionary learning, generalized phase retrieval, and orthogonal tensor decomposition are known to induce such structures. We describe a second-orde… ▽ More In this note, we focus on smooth nonconvex optimization problems that obey: (1) all local minimizers are also global; and (2) around any saddle point or local maximizer, the objective has a negative directional curvature. Concrete applications such as dictionary learning, generalized phase retrieval, and orthogonal tensor decomposition are known to induce such structures. We describe a second-order trust-region algorithm that provably converges to a global minimizer efficiently, without special initializations. Finally we highlight alternatives, and open problems in this direction. △ Less

Submitted 22 April, 2016; v1 submitted 20 October, 2015; originally announced October 2015.

Comments: 6 pages, 3 figures. New examples on phase synchronization and community detection added; emphasis on all local minimizers being global added; exposition is polished. This is a concise expository article that avoids much technical rigor. We will make a separate submission with full technical details in future

arXiv:1505.03627 [pdf, ps, other]

Killing Vector Fields on Multiply Warped Products with a Semi-symmetric Metric Connection

Authors: Quan Qu

Abstract: In this paper, we define a semi-symmetric metric Killing vector field, then study semi-symmetric metric Killing vector fields on warped and multiply warped products with a semi-symmetric metric connection. We also study Killing and 2-Killing vector fields on multiply warped products. In this paper, we define a semi-symmetric metric Killing vector field, then study semi-symmetric metric Killing vector fields on warped and multiply warped products with a semi-symmetric metric connection. We also study Killing and 2-Killing vector fields on multiply warped products. △ Less

Submitted 14 May, 2015; originally announced May 2015.

Comments: 13 pages

MSC Class: 53B05; 53C21; 53C25; 53C50; 53C80

arXiv:1505.03319 [pdf, ps, other]

Quasi-Einstein and Generalized Quasi-Einstein Warped Products with an Affine Connection

Authors: Quan Qu

Abstract: In this paper, we study the quasi-Einstein and generalized quasi-Einstein warped products with a semi-symmetric non-metric connection. We give the expressions of the Ricci tensors and scalar curvatures for the bases and fibres. In some cases we give some obstructions to the existence of the quasi-Einstein and generalized quasi-Einstein warped products with a semi-symmetric non-metric connection. In this paper, we study the quasi-Einstein and generalized quasi-Einstein warped products with a semi-symmetric non-metric connection. We give the expressions of the Ricci tensors and scalar curvatures for the bases and fibres. In some cases we give some obstructions to the existence of the quasi-Einstein and generalized quasi-Einstein warped products with a semi-symmetric non-metric connection. △ Less

Submitted 13 May, 2015; originally announced May 2015.

Comments: 16 pages

MSC Class: 53B05; 53C25

arXiv:1504.06785 [pdf, other]

Complete Dictionary Recovery over the Sphere

Authors: Ju Sun, Qing Qu, John Wright

Abstract: We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb R^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals, and… ▽ More We consider the problem of recovering a complete (i.e., square and invertible) matrix $\mathbf A_0$, from $\mathbf Y \in \mathbb R^{n \times p}$ with $\mathbf Y = \mathbf A_0 \mathbf X_0$, provided $\mathbf X_0$ is sufficiently sparse. This recovery problem is central to the theoretical understanding of dictionary learning, which seeks a sparse representation for a collection of input signals, and finds numerous applications in modern signal processing and machine learning. We give the first efficient algorithm that provably recovers $\mathbf A_0$ when $\mathbf X_0$ has $O(n)$ nonzeros per column, under suitable probability model for $\mathbf X_0$. In contrast, prior results based on efficient algorithms provide recovery guarantees when $\mathbf X_0$ has only $O(n^{1-δ})$ nonzeros per column for any constant $δ\in (0, 1)$. Our algorithmic pipeline centers around solving a certain nonconvex optimization problem with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. To show this apparently hard problem is tractable, we first provide a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no "spurious" local minima. This particular geometric structure allows us to design a Riemannian trust region algorithm over the sphere that provably converges to one local minimizer with an arbitrary initialization, despite the presence of saddle points. The geometric approach we develop here may also shed light on other problems arising from nonconvex recovery of structured signals. △ Less

Submitted 17 November, 2015; v1 submitted 26 April, 2015; originally announced April 2015.

Comments: 104 pages, 5 figures. Due to length constraint of publication, this long paper are subsequently divided into two papers (arXiv:1511.03607 and arXiv:1511.04777). Further updates will be made only to the two papers

MSC Class: 68P30; 58C05; 94A12; 94A08; 68T05; 90C26; 90C48; 90C55

arXiv:1412.4659 [pdf, other]

doi 10.1109/TIT.2016.2601599

Finding a sparse vector in a subspace: Linear sparsity using alternating directions

Authors: Qing Qu, Ju Sun, John Wright

Abstract: Is it possible to find the sparsest vector (direction) in a generic subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\mathrm{dim}(\mathcal{S})= n < p$? This problem can be considered a homogeneous variant of the sparse recovery problem, and finds connections to sparse dictionary learning, sparse PCA, and many other problems in signal processing and machine learning. In this paper, we focus on a… ▽ More Is it possible to find the sparsest vector (direction) in a generic subspace $\mathcal{S} \subseteq \mathbb{R}^p$ with $\mathrm{dim}(\mathcal{S})= n < p$? This problem can be considered a homogeneous variant of the sparse recovery problem, and finds connections to sparse dictionary learning, sparse PCA, and many other problems in signal processing and machine learning. In this paper, we focus on a **planted sparse model** for the subspace: the target sparse vector is embedded in an otherwise random subspace. Simple convex heuristics for this planted recovery problem provably break down when the fraction of nonzero entries in the target sparse vector substantially exceeds $O(1/\sqrt{n})$. In contrast, we exhibit a relatively simple nonconvex approach based on alternating directions, which provably succeeds even when the fraction of nonzero entries is $Ω(1)$. To the best of our knowledge, this is the first practical algorithm to achieve linear scaling under the planted sparse model. Empirically, our proposed algorithm also succeeds in more challenging data models, e.g., sparse dictionary learning. △ Less

Submitted 19 July, 2016; v1 submitted 15 December, 2014; originally announced December 2014.

Comments: Accepted by IEEE Trans. Information Theory. The paper has been revised by the reviewers' comments. The proofs have been streamlined

Journal ref: IEEE Transaction on Information Theory, 62(10):5855 - 5880, 2016

arXiv:1410.0170 [pdf, ps, other]

Multiply Warped Products with a Quarter-symmetric Connection

Authors: Quan Qu, Yong Wang

Abstract: In this paper, we study the Einstein warped products and multiply warped products with a quarter-symmetric connection. We also study warped products and multiply warped products with a quarter-symmetric connection with constant scalar curvature. Then apply our results to generalized Robertson-Walker spacetimes with a quarter-symmetric connection and generalized Kasner space-times with a quarter-sy… ▽ More In this paper, we study the Einstein warped products and multiply warped products with a quarter-symmetric connection. We also study warped products and multiply warped products with a quarter-symmetric connection with constant scalar curvature. Then apply our results to generalized Robertson-Walker spacetimes with a quarter-symmetric connection and generalized Kasner space-times with a quarter-symmetric connection. △ Less

Submitted 1 October, 2014; originally announced October 2014.

Comments: 41 pages. arXiv admin note: text overlap with arXiv:1207.5092

Showing 1–28 of 28 results for author: Qu, Q