Search | arXiv e-print repository

Optimization over bounded-rank matrices through a desingularization enables joint global and local guarantees

Authors: Quentin Rebjock, Nicolas Boumal

Abstract: Convergence guarantees for optimization over bounded-rank matrices are delicate to obtain because the feasible set is a non-smooth and non-convex algebraic variety. Existing techniques include projected gradient descent, fixed-rank optimization (over the maximal-rank stratum), and the LR parameterization. They all lack either global guarantees (the ability to accumulate only at critical points) or fast local convergence (e.g., if the limit has non-maximal rank). We seek optimization algorithms that enjoy both. Khrulkov and Oseledets [2018] parameterize the bounded-rank variety via a desingularization to recast the optimization problem onto a smooth manifold. Building on their ideas, we develop a Riemannian geometry for this desingularization, also with care for numerical considerations. We use it to secure global convergence to critical points with fast local rates, for a large range of algorithms. On matrix completion tasks, we find that this approach is comparable to others, while enjoying better general-purpose theoretical guarantees.
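To make the projected gradient descent baseline named above concrete, here is a minimal NumPy sketch for matrix completion. It is an illustration of that existing technique, not the authors' desingularization method: each step takes a gradient step on the observed-entry least-squares cost, then projects onto matrices of rank at most $r$ with a truncated SVD. The helper names `project_rank`, `completion_cost` and `pgd_completion`, the problem sizes, and the unit step size are hypothetical choices. Because the gradient of this cost is 1-Lipschitz, a unit step cannot increase the cost when starting from a feasible point.

```python
import numpy as np

# Illustrative sketch; helper names and sizes are hypothetical.

def project_rank(X, r):
    """Metric projection onto {rank <= r} via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def completion_cost(X, M, mask):
    """f(X) = 0.5 * ||mask * (X - M)||_F^2, a least-squares fit on observed entries."""
    return 0.5 * np.sum((mask * (X - M)) ** 2)

def pgd_completion(M, mask, r, step=1.0, iters=200):
    """Projected gradient descent on the bounded-rank variety."""
    X = np.zeros_like(M)                      # feasible start (rank 0)
    costs = [completion_cost(X, M, mask)]
    for _ in range(iters):
        grad = mask * (X - M)                 # Euclidean gradient of f
        X = project_rank(X - step * grad, r)  # gradient step, then project to rank <= r
        costs.append(completion_cost(X, M, mask))
    return X, costs

rng = np.random.default_rng(0)
m, n, r = 20, 15, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 ground truth
mask = (rng.random((m, n)) < 0.6).astype(float)                 # about 60% observed
X, costs = pgd_completion(M, mask, r)
```

The projection is where the non-smoothness of the variety shows up: the SVD truncation is discontinuous where singular values $r$ and $r+1$ coincide, which is part of why such methods are hard to analyze.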

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2405.18273 [pdf, other]

Synchronization on circles and spheres with nonlinear interactions

Authors: Christopher Criscitiello, Quentin Rebjock, Andrew D. McRae, Nicolas Boumal

Abstract: We consider the dynamics of $n$ points on a sphere in $\mathbb{R}^d$ ($d \geq 2$) which attract each other according to a function $\varphi$ of their inner products. When $\varphi$ is linear ($\varphi(t) = t$), the points converge to a common value (i.e., synchronize) in various connectivity scenarios: this is part of classical work on Kuramoto oscillator networks. When $\varphi$ is exponential ($\varphi(t) = e^{βt}$), these dynamics correspond to a limit of how idealized transformers process data, as described by Geshkovski et al. (2024). Accordingly, they ask whether synchronization occurs for exponential $\varphi$. In the context of consensus for multi-agent control, Markdahl et al. (2018) show that for $d \geq 3$ (spheres), if the interaction graph is connected and $\varphi$ is increasing and convex, then the system synchronizes. What is the situation on circles ($d=2$)? First, we show that $\varphi$ being increasing and convex is no longer sufficient. Then we identify a new condition (that the Taylor coefficients of $\varphi'$ are decreasing) under which we do have synchronization on the circle. In so doing, we provide some answers to the open problems posed by Geshkovski et al. (2024).
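The dynamics above can be simulated in a few lines. The sketch below rests on assumptions not stated in the abstract: a complete interaction graph, and the flow $\dot x_i = \mathrm{P}_{x_i}\big(\sum_j \varphi'(\langle x_i, x_j\rangle)\, x_j\big)$ (the gradient ascent flow of $\sum_{i,j}\varphi(\langle x_i, x_j\rangle)$ up to a constant factor), discretized with explicit Euler steps followed by renormalization. The helper name `simulate_sphere_sync`, the step size, and the iteration count are hypothetical.

```python
import numpy as np

# Illustrative sketch; the helper name and all parameters are hypothetical.

def simulate_sphere_sync(X0, dphi, step=0.01, iters=5000):
    """Euler + renormalization for dx_i/dt = P_{x_i}(sum_j dphi(<x_i, x_j>) x_j).

    X0: (n, d) array whose rows are the initial points; dphi: derivative of phi,
    applied entrywise. Every pair of points interacts (complete graph).
    """
    X = X0 / np.linalg.norm(X0, axis=1, keepdims=True)
    for _ in range(iters):
        W = dphi(X @ X.T)                               # W[i, j] = phi'(<x_i, x_j>)
        V = W @ X                                       # row i: sum_j W[i, j] x_j
        V -= np.sum(V * X, axis=1, keepdims=True) * X   # project onto tangent space at x_i
        X = X + step * V
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # map back to the unit sphere
    return X

rng = np.random.default_rng(1)
X0 = rng.standard_normal((8, 2))                               # 8 points on the circle (d = 2)
X_lin = simulate_sphere_sync(X0, lambda T: np.ones_like(T))    # phi(t) = t
beta = 1.0
X_exp = simulate_sphere_sync(X0, lambda T: beta * np.exp(beta * T))  # phi(t) = exp(beta t)
```

In the linear, all-to-all case every row of `X_lin` should end up at (numerically) the same point; the exponential run is included only to show how another $\varphi$ plugs in, and no claim is made here about its outcome.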

Submitted 28 May, 2024; originally announced May 2024.

Comments: 28 pages, 1 figure

arXiv:2311.07404 [pdf, other]

Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices

Authors: Quentin Rebjock, Nicolas Boumal

Abstract: Trust-region methods (TR) can converge quadratically to minima where the Hessian is positive definite. However, if the minima are not isolated, then the Hessian there cannot be positive definite. The weaker Polyak–Łojasiewicz (PŁ) condition is compatible with non-isolated minima, and it is enough for many algorithms to preserve good local behavior. Yet, TR with an $\textit{exact}$ subproblem solver lacks even basic features such as a capture theorem under PŁ. In practice, a popular $\textit{inexact}$ subproblem solver is the truncated conjugate gradient method (tCG). Empirically, TR-tCG exhibits super-linear convergence under PŁ. We confirm this theoretically. The main mathematical obstacle is that, under PŁ, at points arbitrarily close to minima, the Hessian has vanishingly small, possibly negative eigenvalues. Thus, tCG is applied to ill-conditioned, indefinite systems. Yet, the core theory underlying tCG is that of CG, which assumes a positive definite operator. Accordingly, we develop new tools to analyze the dynamics of CG in the presence of small eigenvalues of any sign, for the regime of interest to TR-tCG.
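For readers unfamiliar with tCG, the following is a minimal textbook-style sketch of truncated CG in the Steihaug–Toint form for the trust-region subproblem $\min_{\|s\| \le \Delta} \langle g, s\rangle + \tfrac12\langle s, Hs\rangle$ in $\mathbb{R}^n$. It is not the paper's implementation or analysis; the names `truncated_cg` and `_to_boundary` and the tolerance are assumptions. It shows the two exits that matter when $H$ is indefinite: CG stops on the trust-region boundary when it meets non-positive curvature or when its next iterate would leave the region.

```python
import numpy as np

# Illustrative sketch; function names and tolerances are hypothetical.

def _to_boundary(s, p, delta):
    """Return tau >= 0 with ||s + tau * p|| = delta (assumes ||s|| <= delta)."""
    a, b, c = p @ p, 2.0 * (s @ p), s @ s - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def truncated_cg(H, g, delta, tol=1e-10, max_iter=None):
    """Steihaug-Toint truncated CG for min <g, s> + 0.5 <s, H s> s.t. ||s|| <= delta.

    H may be indefinite. Returns the CG iterate once the residual is small, or a
    point on the trust-region boundary when CG meets non-positive curvature or
    would leave the region.
    """
    n = g.size
    max_iter = n if max_iter is None else max_iter
    s = np.zeros(n)
    r = g.astype(float).copy()          # residual H s + g (model gradient at s)
    if np.linalg.norm(r) <= tol:
        return s
    p = -r
    for _ in range(max_iter):
        Hp = H @ p
        curv = p @ Hp
        if curv <= 0:                                   # non-positive curvature
            return s + _to_boundary(s, p, delta) * p
        alpha = (r @ r) / curv
        s_next = s + alpha * p
        if np.linalg.norm(s_next) >= delta:             # step would leave the region
            return s + _to_boundary(s, p, delta) * p
        r_next = r + alpha * Hp
        if np.linalg.norm(r_next) <= tol:
            return s_next
        p = -r_next + ((r_next @ r_next) / (r @ r)) * p
        s, r = s_next, r_next
    return s

H = np.diag([-1.0, 2.0])                # indefinite, as in the regime discussed above
s = truncated_cg(H, np.array([1.0, 1.0]), delta=0.5)
```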

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2307.12743 [pdf, ps, other]

Open Problem: Polynomial linearly-convergent method for geodesically convex optimization?

Authors: Christopher Criscitiello, David Martínez-Rubio, Nicolas Boumal

Abstract: Let $f \colon \mathcal{M} \to \mathbb{R}$ be a Lipschitz and geodesically convex function defined on a $d$-dimensional Riemannian manifold $\mathcal{M}$. Does there exist a first-order deterministic algorithm which (a) uses at most $O(\mathrm{poly}(d) \log(ε^{-1}))$ subgradient queries to find a point with target accuracy $ε$, and (b) requires only $O(\mathrm{poly}(d))$ arithmetic operations per query? In convex optimization, the classical ellipsoid method achieves this. After detailing related work, we provide an ellipsoid-like algorithm with query complexity $O(d^2 \log^2(ε^{-1}))$ and per-query complexity $O(d^2)$ for the limited case where $\mathcal{M}$ has constant curvature (hemisphere or hyperbolic space). We then detail possible approaches and corresponding obstacles for designing an ellipsoid-like method for general Riemannian manifolds.
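As a Euclidean reference point for the question above, here is a minimal NumPy sketch of the classical central-cut ellipsoid method for minimizing a convex function on $\mathbb{R}^d$ ($d \ge 2$) from subgradients. It is the textbook Euclidean method the abstract cites as the benchmark, not the paper's ellipsoid-like algorithm for constant-curvature manifolds; the function name, starting ball, test function, and iteration count are assumptions. Each iteration uses one subgradient query and $O(d^2)$ arithmetic operations.

```python
import numpy as np

# Illustrative sketch; the function name and all parameters are hypothetical.

def ellipsoid_method(f, subgrad, x0, radius, iters):
    """Central-cut ellipsoid method for convex f on R^d, d >= 2.

    The current ellipsoid is {y : (y - x)^T P^{-1} (y - x) <= 1}; it starts as
    the ball of the given radius around x0, assumed to contain a minimizer.
    """
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    P = radius**2 * np.eye(d)
    best_x, best_f = x.copy(), f(x)
    for _ in range(iters):
        g = subgrad(x)
        if not np.any(g):              # zero subgradient: x is a minimizer
            return x
        g_t = g / np.sqrt(g @ P @ g)
        Pg = P @ g_t
        # Keep the half-ellipsoid {y : g^T (y - x) <= 0} and enclose it in the
        # minimum-volume ellipsoid; O(d^2) operations per query.
        x = x - Pg / (d + 1)
        P = (d**2 / (d**2 - 1.0)) * (P - (2.0 / (d + 1)) * np.outer(Pg, Pg))
        fx = f(x)
        if fx < best_f:                # centers need not improve monotonically
            best_x, best_f = x.copy(), fx
    return best_x

c = np.array([0.3, -0.2])
f = lambda x: float(np.sum((x - c) ** 2))
subgrad = lambda x: 2.0 * (x - c)
x_best = ellipsoid_method(f, subgrad, np.zeros(2), radius=2.0, iters=200)
```

The ellipsoid volume shrinks by a fixed factor per query (roughly $e^{-1/(2(d+1))}$), which is what yields the $O(\mathrm{poly}(d)\log(ε^{-1}))$ query count in the Euclidean case.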

Submitted 24 July, 2023; originally announced July 2023.

Journal ref: Proceedings of Thirty Sixth Conference on Learning Theory (COLT 2023): https://proceedings.mlr.press/v195/criscitiello23b.html

arXiv:2307.02941 [pdf, other]

doi 10.1137/23M1584642

Benign landscapes of low-dimensional relaxations for orthogonal synchronization on general graphs

Authors: Andrew D. McRae, Nicolas Boumal

Abstract: Orthogonal group synchronization is the problem of estimating $n$ elements $Z_1, \ldots, Z_n$ from the $r \times r$ orthogonal group given some relative measurements $R_{ij} \approx Z_i Z_j^{-1}$. The least-squares formulation is nonconvex. To avoid its local minima, a Shor-type convex relaxation squares the dimension of the optimization problem from $O(n)$ to $O(n^2)$. Alternatively, Burer--Monteiro-type nonconvex relaxations have generic landscape guarantees at dimension $O(n^{3/2})$. For smaller relaxations, the problem structure matters. It has been observed in the robotics literature that, for SLAM problems, it seems sufficient to increase the dimension by a small constant multiple over the original. We partially explain this. This also has implications for Kuramoto oscillators. Specifically, we minimize the least-squares cost function in terms of estimators $Y_1, \ldots, Y_n$. For $p \geq r$, each $Y_i$ is relaxed to the Stiefel manifold $\mathrm{St}(r, p)$ of $r \times p$ matrices with orthonormal rows. The available measurements implicitly define a (connected) graph $G$ on $n$ vertices. In the noiseless case, we show that, for all connected graphs $G$, second-order critical points are globally optimal as soon as $p \geq r+2$. (This implies that Kuramoto oscillators on $\mathrm{St}(r, p)$ synchronize for all $p \geq r + 2$.) This result is the best possible for general graphs; the previous best known result requires $2p \geq 3(r + 1)$. For $p > r + 2$, our result is robust to modest amounts of noise (depending on $p$ and $G$).
Our proof uses a novel randomized choice of tangent direction to prove (near-)optimality of second-order critical points. Finally, we partially extend our noiseless landscape results to the complex case (unitary group); we show that there are no spurious local minima when $2p \geq 3r$.
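A small NumPy sketch of the relaxed least-squares cost may help fix ideas. It assumes the common edge-wise form $\sum_{(i,j)\in E}\|Y_i - R_{ij}Y_j\|_F^2$ over the edges of $G$ (an assumption; the abstract does not spell out the cost) and checks that, with noiseless measurements $R_{ij} = Z_i Z_j^\top$, lifting the ground truth as $Y_i = Z_i W$ for any $W \in \mathrm{St}(r, p)$ gives zero cost, so the relaxation contains the original solutions. The helper names, the cycle graph, and the sizes are hypothetical.

```python
import numpy as np

# Illustrative sketch; helper names, graph, and sizes are hypothetical.

def random_orthogonal(r, rng):
    """Random r x r orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((r, r)))
    return Q * np.sign(np.diag(R))

def random_stiefel(r, p, rng):
    """Random r x p matrix with orthonormal rows, i.e., a point of St(r, p)."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, r)))   # p x r, orthonormal columns
    return Q.T

def sync_cost(Y, R, edges):
    """Assumed least-squares cost sum_{(i, j) in edges} ||Y_i - R_ij Y_j||_F^2."""
    return sum(np.linalg.norm(Y[i] - R[(i, j)] @ Y[j]) ** 2 for i, j in edges)

rng = np.random.default_rng(0)
n, r, p = 5, 3, 5                                   # p >= r + 2, as in the abstract
edges = [(i, (i + 1) % n) for i in range(n)]        # a cycle: a connected graph G
Z = [random_orthogonal(r, rng) for _ in range(n)]
R = {(i, j): Z[i] @ Z[j].T for i, j in edges}       # noiseless R_ij = Z_i Z_j^{-1}
W = random_stiefel(r, p, rng)
Y_lift = [Zi @ W for Zi in Z]                       # lifted ground truth in St(r, p)^n
Y_rand = [random_stiefel(r, p, rng) for _ in range(n)]
```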

Submitted 8 February, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

MSC Class: 90C26; 90C30; 90C35; 90C46

arXiv:2306.02959 [pdf, ps, other]

Curvature and complexity: Better lower bounds for geodesically convex optimization

Authors: Christopher Criscitiello, Nicolas Boumal

Abstract: We study the query complexity of geodesically convex (g-convex) optimization on a manifold. To isolate the effect of that manifold's curvature, we primarily focus on hyperbolic spaces. In a variety of settings (smooth or not; strongly g-convex or not; high- or low-dimensional), known upper bounds worsen with curvature. It is natural to ask whether this is warranted, or an artifact. For many such settings, we propose a first set of lower bounds which indeed confirm that (negative) curvature is detrimental to complexity. To do so, we build on recent lower bounds (Hamilton and Moitra, 2021; Criscitiello and Boumal, 2022) for the particular case of smooth, strongly g-convex optimization. Using a number of techniques, we also secure lower bounds which capture dependence on condition number and optimality gap, which was not previously the case. We suspect these bounds are not optimal. We conjecture optimal ones, and support them with a matching lower bound for a class of algorithms which includes subgradient descent, and a lower bound for a related game. Lastly, to pinpoint the difficulty of proving lower bounds, we study how negative curvature influences (and sometimes obstructs) interpolation with g-convex functions.

Submitted 24 July, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: v1 to v2: Renamed the method of Rusciano 2019 from "center-of-gravity method" to "centerpoint method"

Journal ref: Proceedings of Thirty Sixth Conference on Learning Theory (COLT 2023): https://proceedings.mlr.press/v195/criscitiello23a.html

arXiv:2303.00096 [pdf, other]

Fast convergence to non-isolated minima: four equivalent conditions for $\mathrm{C}^2$ functions

Authors: Quentin Rebjock, Nicolas Boumal

Abstract: Optimization algorithms can see their local convergence rates deteriorate when the Hessian at the optimum is singular. These singularities are inescapable when the optima are non-isolated. Yet, under the right circumstances, several algorithms preserve their favorable rates even when optima form a continuum (e.g., due to over-parameterization). This has been explained under various structural assumptions, including the Polyak--Łojasiewicz inequality, Quadratic Growth and the Error Bound. We show that, for cost functions which are twice continuously differentiable ($\mathrm{C}^2$), those three (local) properties are equivalent. Moreover, we show they are equivalent to the Morse--Bott property, that is, local minima form differentiable submanifolds, and the Hessian of the cost function is positive definite along its normal directions. We leverage this insight to improve local convergence guarantees for safe-guarded Newton-type methods under any (hence all) of the above assumptions. First, for adaptive cubic regularization, we secure quadratic convergence even with inexact subproblem solvers. Second, for trust-region methods, we argue convergence can fail with an exact subproblem solver, then proceed to show linear convergence with an inexact one (Cauchy steps).
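A toy example helps connect these properties. For $f(x) = \tfrac12(\langle a, x\rangle - b)^2$ on $\mathbb{R}^2$, the minimizers form a line (non-isolated), the Hessian $aa^\top$ is singular, and yet $\tfrac12\|\nabla f(x)\|^2 = \|a\|^2 f(x)$, so PŁ holds with $\mu = \|a\|^2$ and $f^* = 0$. The plain-Python sketch below, with hypothetical names and data, checks this identity and shows gradient descent converging linearly to the minimizer closest to the starting point.

```python
# Illustrative sketch; the function, data, and names are hypothetical.

def residual(x, a, b):
    """r(x) = <a, x> - b, so that f(x) = 0.5 * r(x)**2."""
    return sum(ai * xi for ai, xi in zip(a, x)) - b

def f(x, a, b):
    return 0.5 * residual(x, a, b) ** 2

def grad_f(x, a, b):
    r = residual(x, a, b)
    return [r * ai for ai in a]

def gradient_descent(x, a, b, step, iters):
    history = [f(x, a, b)]
    for _ in range(iters):
        g = grad_f(x, a, b)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        history.append(f(x, a, b))
    return x, history

a, b = [1.0, 2.0], 1.0             # minimizers: the whole line x1 + 2*x2 = 1
mu = sum(ai * ai for ai in a)      # PL constant ||a||^2 = 5
x0 = [2.0, 2.0]

# PL holds with equality here: 0.5 * ||grad f||^2 = mu * (f - f*), with f* = 0.
g0 = grad_f(x0, a, b)
pl_lhs, pl_rhs = 0.5 * sum(gi * gi for gi in g0), mu * f(x0, a, b)

# Step 0.1 = 0.5 / mu halves the residual each iteration, so f shrinks 4x per step.
x_final, hist = gradient_descent(x0, a, b, step=0.1, iters=60)
```

Gradient descent here moves only along $a$, so it lands on the orthogonal projection of $x_0$ onto the solution line, $(1, 0)$, despite the singular Hessian.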

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2207.03512 [pdf, ps, other]

doi 10.1007/s10107-024-02058-3

The effect of smooth parametrizations on nonconvex optimization landscapes

Authors: Eitan Levin, Joe Kileel, Nicolas Boumal

Abstract: We develop new tools to study landscapes in nonconvex optimization. Given one optimization problem, we pair it with another by smoothly parametrizing the domain. This is either for practical purposes (e.g., to use smooth optimization algorithms with good guarantees) or for theoretical purposes (e.g., to reveal that the landscape satisfies a strict saddle property). In both cases, the central question is: how do the landscapes of the two problems relate? More precisely: how do desirable points such as local minima and critical points in one problem relate to those in the other problem? A key finding in this paper is that these relations are often determined by the parametrization itself, and are almost entirely independent of the cost function. Accordingly, we introduce a general framework to study parametrizations by their effect on landscapes. The framework enables us to obtain new guarantees for an array of problems, some of which were previously treated on a case-by-case basis in the literature. Applications include: optimizing low-rank matrices and tensors through factorizations; solving semidefinite programs via the Burer-Monteiro approach; training neural networks by optimizing their weights and biases; and quotienting out symmetries.
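To see how a parametrization alone can create critical points, consider the factorization $\varphi(L, R) = LR^\top$ and the lifted cost $g = f \circ \varphi$. By the chain rule, $\nabla g(L, R) = (\nabla f(X) R, \nabla f(X)^\top L)$ with $X = LR^\top$, so $(L, R) = (0, 0)$ is a critical point of $g$ whatever $f$ is, even when $X = 0$ is not critical for $f$. The NumPy sketch below checks this for the hypothetical choice $f(X) = \tfrac12\|X - A\|_F^2$ and verifies the chain-rule gradient by central finite differences; all names and sizes are assumptions.

```python
import numpy as np

# Illustrative sketch; the cost f, names, and sizes are hypothetical.

def f(X, A):
    return 0.5 * np.sum((X - A) ** 2)

def grad_f(X, A):
    return X - A

def g(L, R, A):
    return f(L @ R.T, A)          # lifted cost g = f o phi, with phi(L, R) = L R^T

def grad_g(L, R, A):
    G = grad_f(L @ R.T, A)
    return G @ R, G.T @ L         # chain rule: (grad f(X) R, grad f(X)^T L)

rng = np.random.default_rng(0)
m, n, k = 4, 3, 2
A = rng.standard_normal((m, n))

# At the origin, g is critical although X = 0 is not critical for f.
gL0, gR0 = grad_g(np.zeros((m, k)), np.zeros((n, k)), A)
grad_f_at_0 = grad_f(np.zeros((m, n)), A)

# Central finite-difference check of the chain rule at a random point.
L, R = rng.standard_normal((m, k)), rng.standard_normal((n, k))
dL, dR = rng.standard_normal((m, k)), rng.standard_normal((n, k))
eps = 1e-6
fd = (g(L + eps * dL, R + eps * dR, A) - g(L - eps * dL, R - eps * dR, A)) / (2 * eps)
gL, gR = grad_g(L, R, A)
analytic = np.sum(gL * dL) + np.sum(gR * dR)
```

This spurious critical point is a property of $\varphi$ alone, which is the kind of cost-independent relation the framework above studies.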

Submitted 4 March, 2024; v1 submitted 7 July, 2022; originally announced July 2022.

MSC Class: 53Z50; 49J53; 90C26; 90C46

Journal ref: Math. Program. (2024)

arXiv:2204.01448 [pdf, ps, other]

Computing second-order points under equality constraints: revisiting Fletcher's augmented Lagrangian

Authors: Florentin Goyens, Armin Eftekhari, Nicolas Boumal

Abstract: We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher's augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches $\varepsilon$-approximate second-order critical points of the original optimization problem in at most $\mathcal{O}(\varepsilon^{-3})$ iterations. This improves on current best theoretical bounds. Along the way, we show new properties of Fletcher's augmented Lagrangian, which may be of independent interest.
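For a single constraint $h(x) = 0$, Fletcher's augmented Lagrangian is commonly written as $g(x) = f(x) - \lambda(x)\, h(x) + \tfrac{\beta}{2} h(x)^2$, where $\lambda(x) = \langle \nabla h(x), \nabla f(x)\rangle / \|\nabla h(x)\|^2$ is the least-squares multiplier estimate. The plain-Python sketch below evaluates it on a hypothetical toy problem (minimize $x_1 + x_2$ on the unit circle, with the assumed penalty weight $\beta = 10$) and checks that the constrained minimizer $x^* = -(1, 1)/\sqrt{2}$ behaves like an unconstrained local minimizer of $g$. It illustrates the penalty only, not the paper's algorithm.

```python
import math

# Illustrative sketch; the toy problem and BETA are hypothetical choices.
BETA = 10.0  # penalty parameter, assumed large enough for this toy problem

def f(x):      return x[0] + x[1]
def grad_f(x): return (1.0, 1.0)
def h(x):      return x[0] ** 2 + x[1] ** 2 - 1.0   # unit-circle constraint h(x) = 0
def grad_h(x): return (2.0 * x[0], 2.0 * x[1])

def multiplier(x):
    """Least-squares multiplier: argmin_lam ||grad f(x) - lam * grad h(x)||."""
    gf, gh = grad_f(x), grad_h(x)
    return (gh[0] * gf[0] + gh[1] * gf[1]) / (gh[0] ** 2 + gh[1] ** 2)

def fletcher(x, beta=BETA):
    """Fletcher's augmented Lagrangian g(x) = f - lam(x) h + (beta / 2) h^2."""
    return f(x) - multiplier(x) * h(x) + 0.5 * beta * h(x) ** 2

x_star = (-1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0))
lam_star = multiplier(x_star)     # approximately -1/sqrt(2)
g_star = fletcher(x_star)         # equals f(x_star) = -sqrt(2) since h(x_star) = 0

# Probe g in a few directions around x_star: all values should be larger.
eps = 1e-3
directions = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (1, -1), (-1, -1), (-1, 1)]
probes = [fletcher((x_star[0] + eps * d0, x_star[1] + eps * d1)) for d0, d1 in directions]
```

The point of the construction is that the constrained problem is replaced by an unconstrained, smooth one whose local minimizers include the constrained minimizer, without driving $\beta$ to infinity.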

Submitted 16 January, 2024; v1 submitted 4 April, 2022; originally announced April 2022.

arXiv:2111.13263 [pdf, other]

Negative curvature obstructs acceleration for strongly geodesically convex optimization, even with exact first-order oracles

Authors: Christopher Criscitiello, Nicolas Boumal

Abstract: Hamilton and Moitra (2021) showed that, in certain regimes, it is not possible to accelerate Riemannian gradient descent in the hyperbolic plane if we restrict ourselves to algorithms which make queries in a (large) bounded domain and which receive gradients and function values corrupted by a (small) amount of noise. We show that acceleration remains unachievable for any deterministic algorithm which receives exact gradient and function-value information (unbounded queries, no noise). Our results hold for the classes of strongly and nonstrongly geodesically convex functions, and for a large class of Hadamard manifolds including hyperbolic spaces and the symmetric space $\mathrm{SL}(n) / \mathrm{SO}(n)$ of positive definite $n \times n$ matrices of determinant one. This cements a surprising gap between the complexity of convex optimization and geodesically convex optimization: for hyperbolic spaces, Riemannian gradient descent is optimal on the class of smooth and strongly geodesically convex functions, in the regime where the condition number scales with the radius of the optimization domain. The key idea for proving the lower bound consists of perturbing the hard functions of Hamilton and Moitra (2021) with sums of bump functions chosen by a resisting oracle.

Submitted 8 June, 2023; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: v2 to v3: Updated and shortened to reflect COLT 2022 version. Results on nonstrongly g-convex case (former Sec. 5) and reduction to Euclidean convexity (former Sec. 6) are now in Sec. 3 and App. D of "Curvature and Complexity: Better lower bounds for geodesically convex optimization", COLT 2023 (arxiv.longhoe.net/abs/2306.02959). v3 to v4: Added word "strongly" to title to match COLT 2022 version; Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:496-542, 2022, https://proceedings.mlr.press/v178/criscitiello22a

arXiv:2107.03877 [pdf, ps, other]

doi 10.1007/s10107-022-01851-2

Finding stationary points on bounded-rank matrices: A geometric hurdle and a smooth remedy

Authors: Eitan Levin, Joe Kileel, Nicolas Boumal

Abstract: We consider the problem of provably finding a stationary point of a smooth function to be minimized on the variety of bounded-rank matrices. This turns out to be unexpectedly delicate. We trace the difficulty back to a geometric obstacle: On a nonsmooth set, there may be sequences of points along which standard measures of stationarity tend to zero, but whose limit points are not stationary. We name such events apocalypses, as they can cause optimization algorithms to converge to non-stationary points. We illustrate this explicitly for an existing optimization algorithm on bounded-rank matrices. To provably find stationary points, we modify a trust-region method on a standard smooth parameterization of the variety. The method relies on the known fact that second-order stationary points on the parameter space map to stationary points on the variety. Our geometric observations and proposed algorithm generalize beyond bounded-rank matrices. We give a geometric characterization of apocalypses on general constraint sets, which implies that Clarke-regular sets do not admit apocalypses. Such sets include smooth manifolds, manifolds with boundaries, and convex sets. Our trust-region method supports parameterization by any complete Riemannian manifold.

Submitted 7 July, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: The arXiv version contains a few details omitted in the published version Math. Program. (2022)

MSC Class: 65K10; 49J53; 90C26; 90C46

arXiv:2101.03500 [pdf, other]

Random Conical Tilt Reconstruction without Particle Picking in Cryo-electron Microscopy

Authors: Ti-Yen Lan, Nicolas Boumal, Amit Singer

Abstract: We propose a method to reconstruct the 3-D molecular structure from micrographs collected at just one sample tilt angle in the random conical tilt scheme in cryo-electron microscopy. Our method uses autocorrelation analysis on the micrographs to estimate features of the molecule which are invariant under certain nuisance parameters such as the positions of molecular projections in the micrographs. This enables us to reconstruct the molecular structure directly from micrographs, completely circumventing the need for particle picking. We demonstrate reconstructions with simulated data and investigate the effect of the missing-cone region. These results show promise to reduce the size limit for single particle reconstruction in cryo-electron microscopy.

Submitted 10 January, 2021; originally announced January 2021.

Comments: 22 pages, 8 figures

arXiv:2011.13395 [pdf, other]

Second-order optimization for tensors with fixed tensor-train rank

Authors: Michael Psenka, Nicolas Boumal

Abstract: There are several different notions of "low rank" for tensors, associated to different formats. Among them, the Tensor Train (TT) format is particularly well suited for tensors of high order, as it circumvents the curse of dimensionality: an appreciable property for certain high-dimensional applications. It is often convenient to model such applications as optimization over the set of tensors with fixed (and low) TT rank. That set is a smooth manifold. Exploiting this fact, others have shown that Riemannian optimization techniques can perform particularly well on tasks such as tensor completion and special large-scale linear systems from PDEs. So far, however, these optimization techniques have been limited to first-order methods, likely because of the technical hurdles in deriving exact expressions for the Riemannian Hessian. In this paper, we derive a formula and efficient algorithm to compute the Riemannian Hessian on this manifold. This allows us to implement second-order optimization algorithms (namely, the Riemannian trust-region method) and to analyze the conditioning of optimization problems over the fixed TT rank manifold. In settings of interest, we show improved optimization performance on tensor completion compared to first-order methods and alternating least squares (ALS). Our work could have applications in training of neural networks with tensor layers. Our code is freely available.

Submitted 26 November, 2020; originally announced November 2020.

Comments: To appear in the NeurIPS workshop, OPT2020

arXiv:2011.03358 [pdf, ps, other]

Generalization of Quasi-Newton Methods: Application to Robust Symmetric Multisecant Updates

Authors: Damien Scieur, Lewis Liu, Thomas Pumir, Nicolas Boumal

Abstract: Quasi-Newton techniques approximate the Newton step by estimating the Hessian using the so-called secant equations. Some of these methods compute the Hessian using several secant equations but produce non-symmetric updates. Other quasi-Newton schemes, such as BFGS, enforce symmetry but cannot satisfy more than one secant equation. We propose a new type of quasi-Newton symmetric update using several secant equations in a least-squares sense. Our approach generalizes and unifies the design of quasi-Newton updates and satisfies provable robustness guarantees.
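The tension described above is easy to observe numerically. The NumPy sketch below (hypothetical helper names; not the paper's proposed update) uses a quadratic with symmetric positive definite Hessian $A$, so that $Y = AS$ exactly. It builds the multisecant update $B_+ = B + (Y - BS)(S^\top S)^{-1}S^\top$, which satisfies all secant equations $B_+ S = Y$ but is generally non-symmetric, and the classical BFGS update, which is symmetric but here satisfies only the most recent secant equation.

```python
import numpy as np

# Illustrative sketch; helper names and problem sizes are hypothetical.

def multisecant_update(B, S, Y):
    """B+ = B + (Y - B S)(S^T S)^{-1} S^T: satisfies B+ S = Y, usually non-symmetric."""
    return B + (Y - B @ S) @ np.linalg.solve(S.T @ S, S.T)

def bfgs_update(B, s, y):
    """Classical BFGS update: symmetric, enforces only the secant equation B+ s = y."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(0)
n, k = 5, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD Hessian of a quadratic model
S = rng.standard_normal((n, k))      # k steps s_1, ..., s_k as columns
Y = A @ S                            # gradient differences y_i = A s_i

B_multi = multisecant_update(np.eye(n), S, Y)
B_bfgs = np.eye(n)
for i in range(k):                   # apply BFGS with each secant pair in turn
    B_bfgs = bfgs_update(B_bfgs, S[:, i], Y[:, i])
```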

Submitted 8 February, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: AISTATS 2021

arXiv:2008.02252 [pdf, ps, other]

An accelerated first-order method for non-convex optimization on manifolds

Authors: Christopher Criscitiello, Nicolas Boumal

Abstract: We describe the first gradient methods on Riemannian manifolds to achieve accelerated rates in the non-convex case. Under Lipschitz assumptions on the Riemannian gradient and Hessian of the cost function, these methods find approximate first-order critical points faster than regular gradient descent. A randomized version also finds approximate second-order critical points. Both the algorithms and their analyses build extensively on existing work in the Euclidean case. The basic operation consists in running the Euclidean accelerated gradient descent method (appropriately safe-guarded against non-convexity) in the current tangent space, then moving back to the manifold and repeating. This requires lifting the cost function from the manifold to the tangent space, which can be done for example through the Riemannian exponential map. For this approach to succeed, the lifted cost function (called the pullback) must retain certain Lipschitz properties. As a contribution of independent interest, we prove precise claims to that effect, with explicit constants. Those claims are affected by the Riemannian curvature of the manifold, which in turn affects the worst-case complexity bounds for our optimization algorithms.

Submitted 25 November, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: 77 pages. Updated for clarity and included additional remarks/secondary theorems

arXiv:1907.01145 [pdf, other]

doi 10.1093/imaiai/iaaa035

The generalized orthogonal Procrustes problem in the high noise regime

Authors: Thomas Pumir, Amit Singer, Nicolas Boumal

Abstract: We consider the problem of estimating a cloud of points from numerous noisy observations of that cloud after unknown rotations, and possibly reflections. This is an instance of the general problem of estimation under group action, originally inspired by applications in 3-D imaging and computer vision. We focus on a regime where the noise level is larger than the magnitude of the signal, so much so that the rotations cannot be estimated reliably. We propose a simple and efficient procedure based on invariant polynomials (effectively: the Gram matrices) to recover the signal, and we assess it against fundamental limits of the problem that we derive. We show our approach adapts to the noise level and is statistically optimal (up to constants) for both the low and high noise regimes. In studying the variance of our estimator, we encounter the question of the sensitivity of a type of thin Cholesky factorization, for which we provide an improved bound which may be of independent interest.
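The invariance behind a Gram-matrix approach is easy to check: if an observation is $Y = QX$ with $Q$ orthogonal and $X \in \mathbb{R}^{d \times n}$ holding the points as columns, then $Y^\top Y = X^\top X$. The NumPy sketch below is a simplified illustration under assumed i.i.d. Gaussian noise, not the paper's estimator or its analysis: it averages the observed Gram matrices, subtracts the noise bias $\sigma^2 d\, I_n$ (since $\mathbb{E}[E^\top E] = d\, I_n$ for a $d \times n$ standard Gaussian $E$), and recovers the cloud up to an orthogonal transformation from the top $d$ eigenpairs. All names and sizes are hypothetical.

```python
import numpy as np

# Illustrative sketch; the noise model, names, and sizes are hypothetical.

def random_orthogonal(d, rng):
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))

def estimate_cloud(observations, sigma):
    """Estimate X (d x n) up to an orthogonal transform from Y_i = Q_i X + sigma * E_i."""
    d, n = observations[0].shape
    G = np.mean([Y.T @ Y for Y in observations], axis=0)  # rotation-invariant Gram estimate
    G -= sigma**2 * d * np.eye(n)                          # remove noise bias E[E^T E] = d I
    evals, evecs = np.linalg.eigh(G)                       # ascending eigenvalues
    top = slice(n - d, n)                                  # keep the d largest
    return np.sqrt(np.clip(evals[top], 0, None))[:, None] * evecs[:, top].T

rng = np.random.default_rng(0)
d, n, num_obs = 3, 6, 50
X = rng.standard_normal((d, n))
obs_clean = [random_orthogonal(d, rng) @ X for _ in range(num_obs)]
X_hat = estimate_cloud(obs_clean, sigma=0.0)
```

With $\sigma > 0$, the averaged Gram matrix is only an estimate and its accuracy depends on the number of observations; the sketch makes no claim about the statistical rates established in the paper.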

Submitted 23 May, 2021; v1 submitted 1 July, 2019; originally announced July 2019.

MSC Class: 34K30; 35K57; 35Q80; 92D25

Journal ref: Information and Inference: A Journal of the IMA, iaaa035, 2021

arXiv:1906.04321 [pdf, ps, other]

Efficiently escaping saddle points on manifolds

Authors: Chris Criscitiello, Nicolas Boumal

Abstract: Smooth, non-convex optimization problems on Riemannian manifolds occur in machine learning as a result of orthonormality, rank or positivity constraints. First- and second-order necessary optimality conditions state that the Riemannian gradient must be zero, and the Riemannian Hessian must be positive semidefinite. Generalizing Jin et al.'s recent work on perturbed gradient descent (PGD) for optimization on linear spaces [How to Escape Saddle Points Efficiently (2017), Stochastic Gradient Descent Escapes Saddle Points Efficiently (2019)], we propose a version of perturbed Riemannian gradient descent (PRGD) to show that necessary optimality conditions can be met approximately with high probability, without evaluating the Hessian. Specifically, for an arbitrary Riemannian manifold $\mathcal{M}$ of dimension $d$, a sufficiently smooth (possibly non-convex) objective function $f$, and under weak conditions on the retraction chosen to move on the manifold, with high probability, our version of PRGD produces a point with gradient smaller than $ε$ and Hessian within $\sqrt{ε}$ of being positive semidefinite in $O((\log{d})^4 / ε^{2})$ gradient queries. This matches the complexity of PGD in the Euclidean case. Crucially, the dependence on dimension is low. This matters for large-scale applications including PCA and low-rank matrix completion, which both admit natural formulations on manifolds.
The key technical idea is to generalize PRGD with a distinction between two types of gradient steps: "steps on the manifold" and "perturbed steps in a tangent space of the manifold." Ultimately, this distinction makes it possible to extend Jin et al.'s analysis seamlessly.
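The two kinds of steps can be illustrated on a toy problem. The sketch below is a simplified, hypothetical version of perturbed Riemannian gradient descent, not the authors' algorithm or parameter choices: it assumes the cost $f(x) = x^\top A x$ on the unit sphere, the retraction $R_x(s) = (x+s)/\|x+s\|$, and arbitrary values for the step size `eta`, the perturbation `radius` and the number of tangent-space steps. Ordinary iterations are steps on the manifold; when the gradient is small, the sketch perturbs in the tangent space $T_x$ and runs gradient descent on the pullback $f \circ R_x$ inside that fixed tangent space before retracting.

```python
import numpy as np

def retract(x, s):
    # Metric-projection retraction on the unit sphere: R_x(s) = (x + s) / ||x + s||
    v = x + s
    return v / np.linalg.norm(v)

def rgrad(A, x):
    # Riemannian gradient of f(x) = x^T A x: project the Euclidean gradient 2Ax onto T_x
    g = 2 * A @ x
    return g - (x @ g) * x

def pullback_grad(A, x, s):
    # Gradient of the pullback f_hat(s) = f(R_x(s)) for s in the tangent space T_x
    v = x + s
    nv = np.linalg.norm(v)
    y = v / nv
    g = 2 * A @ y
    g = (g - (y @ g) * y) / nv    # adjoint of the differential of R_x at s
    return g - (x @ g) * x        # keep the result in T_x

def prgd_sketch(A, x, eta=0.1, eps=1e-6, radius=1e-2, tangent_steps=50,
                iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    f = lambda v: v @ A @ v
    for _ in range(iters):
        g = rgrad(A, x)
        if np.linalg.norm(g) > eps:
            x = retract(x, -eta * g)     # a "step on the manifold"
            continue
        # Small gradient: random tangent perturbation of norm `radius`, then
        # gradient steps on the pullback, all within the fixed tangent space T_x.
        xi = rng.standard_normal(x.shape)
        xi -= (x @ xi) * x
        s = radius * xi / np.linalg.norm(xi)
        for _ in range(tangent_steps):
            s = s - eta * pullback_grad(A, x, s)
        x_new = retract(x, s)
        if f(x_new) >= f(x) - 1e-12:
            return x                     # no decrease: treat x as an approximate second-order point
        x = x_new
    return x

A = np.diag([1.0, 2.0, 3.0])
x_saddle = np.array([0.0, 1.0, 0.0])    # eigenvector of the middle eigenvalue: a saddle point
x = prgd_sketch(A, x_saddle)
print(x @ A @ x)   # a value very close to the smallest eigenvalue, 1.0
```

Started exactly at the saddle point $e_2$, where the Riemannian gradient vanishes, plain Riemannian gradient descent would never move; the perturbation lets the sketch escape towards a minimizer $\pm e_1$.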

Submitted 22 October, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: 18 pages, NeurIPS 2019

arXiv:1905.03176 [pdf, other]

doi 10.1109/TSP.2020.2975943

Multi-target Detection with an Arbitrary Spacing Distribution

Authors: Ti-Yen Lan, Tamir Bendory, Nicolas Boumal, Amit Singer

Abstract: Motivated by the structure reconstruction problem in single-particle cryo-electron microscopy, we consider the multi-target detection model, where multiple copies of a target signal occur at unknown locations in a long measurement, further corrupted by additive Gaussian noise. At low noise levels, one can easily detect the signal occurrences and estimate the signal by averaging. However, in the presence of high noise, which is the focus of this paper, detection is impossible. Here, we propose two approaches---autocorrelation analysis and an approximate expectation maximization algorithm---to reconstruct the signal without the need to detect signal occurrences in the measurement. In particular, our methods apply to an arbitrary spacing distribution of signal occurrences. We demonstrate reconstructions with synthetic data and empirically show that the sample complexity of both methods scales as 1/SNR^3 in the low SNR regime.
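A first-order illustration of why the spacing need not be known: as long as occurrences do not overlap, the mean of the measurement is $\frac{M}{N}\sum_i x_i$ plus the average of the noise, whatever the spacing. The sketch below covers only this first-moment step, not the autocorrelation analysis or the EM algorithm of the paper; it assumes the number of occurrences `M` is known, and the generator `nonoverlapping_starts` and the sizes used are hypothetical.

```python
import numpy as np

def nonoverlapping_starts(M, L, N, rng):
    # Hypothetical generator: M start indices such that length-L copies never
    # overlap and fit in [0, N); the spacing is otherwise random and arbitrary.
    free = N - M * L
    offsets = np.sort(rng.integers(0, free + 1, size=M))
    return offsets + L * np.arange(M)

def make_measurement(x, starts, N, sigma, rng):
    y = np.zeros(N)
    for s in starts:
        y[s:s + len(x)] += x
    return y + sigma * rng.standard_normal(N)

def estimate_signal_sum(y, M):
    # E[mean(y)] = (M / N) * sum(x) for any non-overlapping placement.
    # Assumes the number of occurrences M is known.
    return len(y) * y.mean() / M

rng = np.random.default_rng(1)
x = rng.standard_normal(20)
N, M = 10**6, 20_000
starts = nonoverlapping_starts(M, len(x), N, rng)
clean = estimate_signal_sum(make_measurement(x, starts, N, 0.0, rng), M)
noisy = estimate_signal_sum(make_measurement(x, starts, N, 1.0, rng), M)
```

Without noise the estimate matches $\sum_i x_i$ up to rounding; with noise its error shrinks as the measurement gets longer, which is the sense in which such moments "can be estimated at any noise level".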

Submitted 22 January, 2020; v1 submitted 8 May, 2019; originally announced May 2019.

Comments: 13 pages, 8 figures

arXiv:1903.06022 [pdf, other]

doi 10.1088/1361-6420/ab2aec

Multi-target detection with application to cryo-electron microscopy

Authors: Tamir Bendory, Nicolas Boumal, William Leeb, Eitan Levin, Amit Singer

Abstract: We consider the multi-target detection problem of recovering a set of signals that appear multiple times at unknown locations in a noisy measurement. In the low noise regime, one can estimate the signals by first detecting occurrences, then clustering and averaging them. In the high noise regime, however, neither detection nor clustering can be performed reliably, so that strategies along these lines are destined to fail. Notwithstanding, using autocorrelation analysis, we show that the impossibility to detect and cluster signal occurrences in the presence of high noise does not necessarily preclude signal estimation. Specifically, to estimate the signals, we derive simple relations between the autocorrelations of the observation and those of the signals. These autocorrelations can be estimated accurately at any noise level given a sufficiently long measurement. To recover the signals from the observed autocorrelations, we solve a set of polynomial equations through nonlinear least-squares. We provide analysis regarding well-posedness of the task, and demonstrate numerically the effectiveness of the method in a variety of settings. The main goal of this work is to provide theoretical and numerical support for a recently proposed framework to image 3-D structures of biological macromolecules using cryo-electron microscopy in extreme noise levels.
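The "simple relations" between autocorrelations can be checked numerically in a one-dimensional toy setting. Assuming (for this sketch) that consecutive occurrences are separated by at least $L-1$ zeros, the measurement autocorrelation $a_y[\ell] = \frac{1}{N}\sum_i y_i y_{i+\ell}$ equals $\frac{M}{N} a_x[\ell] + \sigma^2 \delta_{\ell 0}$ in expectation for $\ell < L$, where $a_x[\ell] = \sum_i x_i x_{i+\ell}$. The helper names and sizes are hypothetical, and $M$ and $\sigma$ are assumed known; recovering the signal from these autocorrelations (the nonlinear least-squares step) is not shown.

```python
import numpy as np

def autocorr(v, max_lag):
    # a[l] = sum_i v[i] * v[i + l], for l = 0, ..., max_lag - 1
    n = len(v)
    return np.array([v[:n - l] @ v[l:] for l in range(max_lag)])

def well_separated_starts(M, L, N, rng):
    # Hypothetical generator: consecutive starts differ by at least 2L - 1,
    # i.e., at least L - 1 zeros separate neighboring copies.
    gap = 2 * L - 1
    free = N - L - (M - 1) * gap
    offsets = np.sort(rng.integers(0, free + 1, size=M))
    return offsets + gap * np.arange(M)

def estimate_signal_autocorr(y, M, L, sigma):
    # Invert E[a_y[l]] = (M / N) * a_x[l] + sigma^2 * [l == 0], with M, sigma assumed known.
    N = len(y)
    a_y = autocorr(y, L) / N
    a_y[0] -= sigma ** 2
    return N * a_y / M

rng = np.random.default_rng(2)
L, N, M = 10, 5000, 50
x = rng.standard_normal(L)
y = np.zeros(N)
for s in well_separated_starts(M, L, N, rng):
    y[s:s + L] += x
a_hat = estimate_signal_autocorr(y, M, L, sigma=0.0)   # noiseless: exact up to rounding
```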

Submitted 3 June, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

Comments: arXiv admin note: text overlap with arXiv:1810.00226

arXiv:1901.10000 [pdf, other]

Simple algorithms for optimization on Riemannian manifolds with constraints

Authors: Changshuo Liu, Nicolas Boumal

Abstract: We consider optimization problems on manifolds with equality and inequality constraints. A large body of work treats constrained optimization in Euclidean spaces. In this work, we consider extensions of existing algorithms from the Euclidean case to the Riemannian case. Thus, the variable lives on a known smooth manifold and is further constrained. In doing so, we exploit the growing literature on unconstrained Riemannian optimization. For the special case where the manifold is itself described by equality constraints, one could in principle treat the whole problem as a constrained problem in a Euclidean space. The main hypothesis we test here is whether it is sometimes better to exploit the geometry of the constraints, even if only for a subset of them. Specifically, this paper extends an augmented Lagrangian method and smoothed versions of an exact penalty method to the Riemannian case, together with some fundamental convergence results. Numerical experiments indicate some gains in computational efficiency and accuracy in some regimes for minimum balanced cut, non-negative PCA and $k$-means, especially in high dimensions.
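A minimal sketch of the augmented Lagrangian idea on a manifold, under assumptions of our own: the manifold is the unit sphere, the cost is $f(x) = x^\top A x$, there is a single equality constraint $h(x) = c^\top x = 0$, the inner solver is Riemannian gradient descent with a fixed step, and the penalty `rho` is held fixed. It is not the authors' implementation, which also handles inequality constraints and parameter updates. Each outer iteration approximately minimizes $\mathcal{L}_\rho(x, \lambda) = f(x) + \lambda h(x) + \frac{\rho}{2} h(x)^2$ over the sphere, then updates $\lambda \leftarrow \lambda + \rho\, h(x)$.

```python
import numpy as np

def riemannian_alm(A, c, x0, rho=10.0, outer=30, inner=500, step=0.05):
    """Sketch: minimize x^T A x on the unit sphere subject to h(x) = c^T x = 0."""
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(outer):
        # Inner loop: Riemannian gradient descent on the augmented Lagrangian
        # L(x) = x^T A x + lam * h(x) + (rho / 2) * h(x)**2 over the sphere.
        for _ in range(inner):
            h = c @ x
            egrad = 2 * A @ x + (lam + rho * h) * c   # Euclidean gradient of L
            rgrad = egrad - (x @ egrad) * x           # projection onto the tangent space at x
            x = x - step * rgrad
            x = x / np.linalg.norm(x)                 # retraction back onto the sphere
        # Outer loop: first-order multiplier update
        lam = lam + rho * (c @ x)
    return x, lam

A = np.diag([1.0, 2.0, 3.0])
c = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
x, lam = riemannian_alm(A, c, x0=np.array([1.0, -1.0, 0.5]))
```

For this data, the constrained minimizer near the starting point is $x^\star = (1, -1, 0)/\sqrt{2}$ with $f(x^\star) = 1.5$, and its multiplier solves $\nabla f(x^\star) + \lambda c = \mu x^\star$, giving $\lambda^\star = 1$ (and $\mu = 3$); the sketch should recover both.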

Submitted 24 April, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

Comments: 32 pages, 2 figures

MSC Class: 65K05; 90C30 (primary) 53A99 (secondary)

arXiv:1811.10382 [pdf, other]

Heterogeneous multireference alignment for images with application to 2-D classification in single particle reconstruction

Authors: Chao Ma, Tamir Bendory, Nicolas Boumal, Fred Sigworth, Amit Singer

Abstract: Motivated by the task of 2-D classification in single particle reconstruction by cryo-electron microscopy (cryo-EM), we consider the problem of heterogeneous multireference alignment of images. In this problem, the goal is to estimate a (typically small) set of target images from a (typically large) collection of observations. Each observation is a rotated, noisy version of one of the target images. For each individual observation, neither the rotation nor which target image has been rotated are known. As the noise level in cryo-EM data is high, clustering the observations and estimating individual rotations is challenging. We propose a framework to estimate the target images directly from the observations, completely bypassing the need to cluster or register the images. The framework consists of two steps. First, we estimate rotation-invariant features of the images, such as the bispectrum. These features can be estimated to any desired accuracy, at any noise level, provided sufficiently many observations are collected. Then, we estimate the images from the invariant features. Numerical experiments on synthetic cryo-EM datasets demonstrate the effectiveness of the method. Ultimately, we outline future developments required to apply this method to experimental data.

Submitted 1 October, 2019; v1 submitted 11 October, 2018; originally announced November 2018.

arXiv:1810.00226 [pdf, other]

Toward single particle reconstruction without particle picking: Breaking the detection limit

Authors: Tamir Bendory, Nicolas Boumal, William Leeb, Eitan Levin, Amit Singer

Abstract: Single-particle cryo-electron microscopy (cryo-EM) has recently joined X-ray crystallography and NMR spectroscopy as a high-resolution structural method to resolve biological macromolecules. In a cryo-EM experiment, the microscope produces images called micrographs. Projections of the molecule of interest are embedded in the micrographs at unknown locations, and under unknown viewing directions. Standard imaging techniques first locate these projections (detection) and then reconstruct the 3-D structure from them. Unfortunately, high noise levels hinder detection. When reliable detection is rendered impossible, the standard techniques fail. This is a problem, especially for small molecules. In this paper, we pursue a radically different approach: we contend that the structure could, in principle, be reconstructed directly from the micrographs, without intermediate detection. The aim is to bring small molecules within reach for cryo-EM. To this end, we design an autocorrelation analysis technique that allows one to go directly from the micrographs to the sought structures. This involves only one pass over the micrographs, allowing online, streaming processing for large experiments. We show numerical results and discuss challenges that lie ahead to turn this proof-of-concept into a complementary approach to state-of-the-art algorithms.

Submitted 27 October, 2022; v1 submitted 29 September, 2018; originally announced October 2018.

Comments: Older citations to this paper refer to version arXiv:1810.00226v1, parts of which now appear in: Tamir Bendory, Nicolas Boumal, William Leeb, Eitan Levin, and Amit Singer. "Multi-target detection with application to cryo-electron microscopy." Inverse Problems 35, no. 10 (2019): 104003

arXiv:1806.03763 [pdf, other]

Smoothed analysis of the low-rank approach for smooth semidefinite programs

Authors: Thomas Pumir, Samy Jelassi, Nicolas Boumal

Abstract: We consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n$ by $k$ such that $X = YY^*$ is the SDP variable. The advantages of such a formulation are twofold: the dimension of the optimization variable is reduced and positive semidefiniteness is naturally enforced. However, the problem in $Y$ is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided $k$ is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with $k$ scaling like the square root of the number of constraints (up to log factors). Moreover, we bound the optimality gap at the approximate solution of the perturbed problem with respect to the original problem. We particularize our results to an SDP relaxation of phase retrieval.

Submitted 27 November, 2018; v1 submitted 10 June, 2018; originally announced June 2018.

arXiv:1806.00065 [pdf, other]

doi 10.1007/s10107-020-01505-1

Adaptive regularization with cubics on manifolds

Authors: Naman Agarwal, Nicolas Boumal, Brian Bullins, Coralia Cartis

Abstract: Adaptive regularization with cubics (ARC) is an algorithm for unconstrained, non-convex optimization. Akin to the popular trust-region method, its iterations can be thought of as approximate, safe-guarded Newton steps. For cost functions with Lipschitz continuous Hessian, ARC has optimal iteration complexity, in the sense that it produces an iterate with gradient smaller than $\varepsilon$ in $O(1/\varepsilon^{1.5})$ iterations. For the same price, it can also guarantee a Hessian with smallest eigenvalue larger than $-\varepsilon^{1/2}$. In this paper, we study a generalization of ARC to optimization on Riemannian manifolds. In particular, we generalize the iteration complexity results to this richer framework. Our central contribution lies in the identification of appropriate manifold-specific assumptions that allow us to secure these complexity guarantees both when using the exponential map and when using a general retraction. A substantial part of the paper is devoted to studying these assumptions---relevant beyond ARC---and providing user-friendly sufficient conditions for them. Numerical experiments are encouraging.

Submitted 16 May, 2020; v1 submitted 31 May, 2018; originally announced June 2018.

Comments: 48 pages, 3 figures

arXiv:1804.02008 [pdf, ps, other]

doi 10.1002/cpa.21830

Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programs

Authors: Nicolas Boumal, Vladislav Voroninski, Afonso S. Bandeira

Abstract: We consider semidefinite programs (SDPs) with equality constraints. The variable to be optimized is a positive semidefinite matrix $X$ of size $n$. Following the Burer--Monteiro approach, we optimize a factor $Y$ of size $n \times p$ instead, such that $X = YY^T$. This ensures positive semidefiniteness at no cost and can reduce the dimension of the problem if $p$ is small, but results in a non-convex optimization problem with a quadratic cost function and quadratic equality constraints in $Y$. In this paper, we show that if the set of constraints on $Y$ regularly defines a smooth manifold, then, despite non-convexity, first- and second-order necessary optimality conditions are also sufficient, provided $p$ is large enough. For smaller values of $p$, we show a similar result holds for almost all (linear) cost functions. Under those conditions, a global optimum $Y$ maps to a global optimum $X = YY^T$ of the SDP. We deduce old and new consequences for SDP relaxations of the generalized eigenvector problem, the trust-region subproblem and quadratic optimization over several spheres, as well as for the Max-Cut and Orthogonal-Cut SDPs which are common relaxations in stochastic block modeling and synchronization of rotations.

Submitted 28 May, 2019; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 28 pages, Communications on Pure and Applied Mathematics: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpa.21830

arXiv:1803.00186 [pdf, ps, other]

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form

Authors: Srinadh Bhojanapalli, Nicolas Boumal, Prateek Jain, Praneeth Netrapalli

Abstract: Semidefinite programs (SDP) are important in learning and combinatorial optimization with numerous applications. In pursuit of low-rank solutions and low complexity algorithms, we consider the Burer--Monteiro factorization approach for solving SDPs. We show that all approximate local optima are global optima for the penalty formulation of appropriately rank-constrained SDPs as long as the number of constraints scales sub-quadratically with the desired rank of the optimal solution. Our result is based on a simple penalty function formulation of the rank-constrained SDP along with a smoothed analysis to avoid worst-case cost matrices. We particularize our results to two applications, namely, Max-Cut and matrix completion.

Submitted 28 February, 2018; originally announced March 2018.

Comments: 24 pages

arXiv:1710.08076 [pdf, other]

3D ab initio modeling in cryo-EM by autocorrelation analysis

Authors: Eitan Levin, Tamir Bendory, Nicolas Boumal, Joe Kileel, Amit Singer

Abstract: Single-Particle Reconstruction (SPR) in Cryo-Electron Microscopy (cryo-EM) is the task of estimating the 3D structure of a molecule from a set of noisy 2D projections, taken from unknown viewing directions. Many algorithms for SPR start from an initial reference molecule, and alternate between refining the estimated viewing angles given the molecule, and refining the molecule given the viewing angles. This scheme is called iterative refinement. Reliance on an initial, user-chosen reference introduces model bias, and poor initialization can lead to slow convergence. Furthermore, since no ground truth is available for an unsolved molecule, it is difficult to validate the obtained results. This creates the need for high quality ab initio models that can be quickly obtained from experimental data with minimal priors, and which can also be used for validation. We propose a procedure to obtain such an ab initio model directly from raw data using Kam's autocorrelation method. Kam's method has been known since 1980, but it leads to an underdetermined system, with missing orthogonal matrices. Until now, this system has been solved only for special cases, such as highly symmetric molecules or molecules for which a homologous structure was already available. In this paper, we show that knowledge of just two clean projections is sufficient to guarantee a unique solution to the system. This system is solved by an optimization-based heuristic.
For the first time, we are then able to obtain a low-resolution ab initio model of an asymmetric molecule directly from raw data, without 2D class averaging and without tilting. Numerical results are presented on both synthetic and experimental data.

Submitted 7 January, 2018; v1 submitted 22 October, 2017; originally announced October 2017.

arXiv:1710.02590 [pdf, other]

Heterogeneous multireference alignment: a single pass approach

Authors: Nicolas Boumal, Tamir Bendory, Roy R. Lederman, Amit Singer

Abstract: Multireference alignment (MRA) is the problem of estimating a signal from many noisy and cyclically shifted copies of itself. In this paper, we consider an extension called heterogeneous MRA, where $K$ signals must be estimated, and each observation comes from one of those signals, unknown to us. This is a simplified model for the heterogeneity problem notably arising in cryo-electron microscopy. We propose an algorithm which estimates the $K$ signals without estimating either the shifts or the classes of the observations. It requires only one pass over the data and is based on low-order moments that are invariant under cyclic shifts. Given sufficiently many measurements, one can estimate these invariant features averaged over the $K$ signals. We then design a smooth, non-convex optimization problem to compute a set of signals which are consistent with the estimated averaged features. We find that, in many cases, the proposed approach estimates the set of signals accurately despite non-convexity, and conjecture that the number of signals $K$ that can be resolved as a function of the signal length $L$ is on the order of $\sqrt{L}$.
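The averaged invariant features can be formed in a single pass over the data. This sketch shows only the second-order feature (the power spectrum), assuming i.i.d. Gaussian noise of known variance $\sigma^2$ and the unnormalized DFT, for which $\mathbb{E}|\hat{\varepsilon}_k|^2 = L\sigma^2$; the function name is hypothetical. The result is the average of the $K$ power spectra weighted by how often each signal occurs. The paper also uses higher-order invariants and an optimization step to recover the $K$ signals, neither of which is shown.

```python
import numpy as np

def averaged_power_spectrum(observations, sigma):
    # observations: shape (num_obs, L); each row is a cyclic shift of one of the
    # K unknown signals plus i.i.d. N(0, sigma^2) noise (sigma assumed known).
    L = observations.shape[1]
    P = np.abs(np.fft.fft(observations, axis=1)) ** 2   # invariant to cyclic shifts
    return P.mean(axis=0) - L * sigma ** 2               # remove the noise bias L * sigma^2

rng = np.random.default_rng(0)
L, K = 8, 2
signals = rng.standard_normal((K, L))
# Noiseless toy data: 50 randomly shifted copies of each signal
obs = np.array([np.roll(signals[k], rng.integers(L)) for k in range(K) for _ in range(50)])
estimate = averaged_power_spectrum(obs, sigma=0.0)
target = np.mean(np.abs(np.fft.fft(signals, axis=1)) ** 2, axis=0)
```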

Submitted 31 January, 2018; v1 submitted 6 October, 2017; originally announced October 2017.

Comments: 6 pages, 3 figures

arXiv:1705.00641 [pdf, other]

doi 10.1109/TSP.2017.2775591

Bispectrum Inversion with Application to Multireference Alignment

Authors: Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, Amit Singer

Abstract: We consider the problem of estimating a signal from noisy circularly-translated versions of itself, called multireference alignment (MRA). One natural approach to MRA could be to estimate the shifts of the observations first, and infer the signal by aligning and averaging the data. In contrast, we consider a method based on estimating the signal directly, using features of the signal that are invariant under translations. Specifically, we estimate the power spectrum and the bispectrum of the signal from the observations. Under mild assumptions, these invariant features contain enough information to infer the signal. In particular, the bispectrum can be used to estimate the Fourier phases. To this end, we propose and analyze a few algorithms. Our main methods consist of non-convex optimization over the smooth manifold of phases. Empirically, in the absence of noise, these non-convex algorithms appear to converge to the target signal with random initialization. The algorithms are also robust to noise. We then suggest three additional methods. These methods are based on frequency marching, semidefinite relaxation and integer programming. The first two methods provably recover the phases exactly in the absence of noise. In the high noise level regime, the invariant features approach for MRA results in stable estimation if the number of measurements scales like the cube of the noise variance, which is the information-theoretic rate.
Additionally, it requires only one pass over the data, which is important at low signal-to-noise ratio when the number of observations must be large.
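The shift invariance of the bispectrum $B[k_1, k_2] = X[k_1]\,X[k_2]\,\overline{X[k_1 + k_2]}$ (indices modulo $L$, with $X$ the DFT of $x$) is easy to check: a cyclic shift by $s$ multiplies $X[k]$ by $e^{-2\pi i k s / L}$, and these phase factors cancel in the product. The sketch below, with a hypothetical helper name, only computes the feature; it does not implement any of the phase-recovery algorithms described above. Its first row, $B[0, k] = X[0]\,|X[k]|^2$, also carries the power spectrum.

```python
import numpy as np

def bispectrum(x):
    # B[k1, k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod L]), with X the DFT of x
    X = np.fft.fft(x)
    L = len(x)
    k = np.arange(L)
    return X[:, None] * X[None, :] * np.conj(X[(k[:, None] + k[None, :]) % L])

rng = np.random.default_rng(0)
x = rng.standard_normal(11)
B = bispectrum(x)
B_shifted = bispectrum(np.roll(x, 3))   # same features for a cyclically shifted copy
```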

Submitted 6 October, 2017; v1 submitted 1 May, 2017; originally announced May 2017.

arXiv:1703.06605 [pdf, other]

doi 10.1137/17M1122025

Near-optimal bounds for phase synchronization

Authors: Yiqiao Zhong, Nicolas Boumal

Abstract: The problem of phase synchronization is to estimate the phases (angles) of a complex unit-modulus vector $z$ from their noisy pairwise relative measurements $C = zz^* + σW$, where $W$ is a complex-valued Gaussian random matrix. The maximum likelihood estimator (MLE) is a solution to a unit-modulus constrained quadratic programming problem, which is nonconvex. Existing works have proposed polynomial-time algorithms such as a semidefinite relaxation (SDP) approach or the generalized power method (GPM) to solve it. Numerical experiments suggest both of these methods succeed with high probability for $σ$ up to $\tilde{\mathcal{O}}(n^{1/2})$, yet, existing analyses only confirm this observation for $σ$ up to $\mathcal{O}(n^{1/4})$. In this paper, we bridge the gap, by proving SDP is tight for $σ= \mathcal{O}(\sqrt{n /\log n})$, and GPM converges to the global optimum under the same regime. Moreover, we establish a linear convergence rate for GPM, and derive a tighter $\ell_\infty$ bound for the MLE. A novel technique we develop in this paper is to track (theoretically) $n$ closely related sequences of iterates, in addition to the sequence of iterates GPM actually produces. As a by-product, we obtain an $\ell_\infty$ perturbation bound for leading eigenvectors. Our result also confirms intuitions that use techniques from statistical mechanics.
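A plain version of the generalized power method fits in a few lines: start from the leading eigenvector of $C$, then repeatedly multiply by $C$ and project each entry back onto the unit circle. This is a sketch under our own choices (number of iterations, no step-size or shift parameter), not necessarily the exact variant analyzed in the paper; the noise model below, a Hermitian $W$ with unit-variance entries and $σ = 1$, is an assumption for illustration.

```python
import numpy as np

def generalized_power_method(C, iters=100):
    """Sketch of GPM for phase synchronization (no step-size parameter)."""
    _, V = np.linalg.eigh(C)          # C is Hermitian; eigenvalues in ascending order
    z = V[:, -1] / np.abs(V[:, -1])   # spectral initialization, projected entrywise to |z_i| = 1
    for _ in range(iters):
        w = C @ z
        z = w / np.abs(w)             # power step, then entrywise projection onto the unit circle
    return z

rng = np.random.default_rng(0)
n, sigma = 200, 1.0
z_true = np.exp(1j * rng.uniform(0, 2 * np.pi, n))
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
W = (G + G.conj().T) / np.sqrt(2)    # Hermitian noise
C = np.outer(z_true, z_true.conj()) + sigma * W

z_hat = generalized_power_method(C)
# Phases are only identifiable up to a global rotation: align before comparing.
phase = np.vdot(z_hat, z_true)
phase /= abs(phase)
rms_error = np.linalg.norm(z_hat * phase - z_true) / np.sqrt(n)
```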

Submitted 20 March, 2017; originally announced March 2017.

Comments: 34 pages, 1 figure

arXiv:1607.08218 [pdf, other]

Non-Convex Phase Retrieval from STFT Measurements

Authors: Tamir Bendory, Yonina C. Eldar, Nicolas Boumal

Abstract: The problem of recovering a one-dimensional signal from its Fourier transform magnitude, called Fourier phase retrieval, is ill-posed in most cases. We consider the closely-related problem of recovering a signal from its phaseless short-time Fourier transform (STFT) measurements. This problem arises naturally in several applications, such as ultra-short laser pulse characterization and ptychography. The redundancy offered by the STFT enables unique recovery under mild conditions. We show that in some cases the unique solution can be obtained by the principal eigenvector of a matrix, constructed as the solution of a simple least-squares problem. When these conditions are not met, we suggest using the principal eigenvector of this matrix to initialize non-convex local optimization algorithms and propose two such methods. The first is based on minimizing the empirical risk loss function, while the second maximizes a quadratic function on the manifold of phases. We prove that under appropriate conditions, the proposed initialization is close to the underlying signal. We then analyze the geometry of the empirical risk loss function and show numerically that both gradient algorithms converge to the underlying signal even with small redundancy in the measurements. In addition, the algorithms are robust to noise.

Submitted 24 July, 2017; v1 submitted 27 July, 2016; originally announced July 2016.

arXiv:1606.04970 [pdf, ps, other]

The non-convex Burer-Monteiro approach works on smooth semidefinite programs

Authors: Nicolas Boumal, Vladislav Voroninski, Afonso S. Bandeira

Abstract: Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some theory supports this empirical success, a complete explanation of it remains an open question. In this paper, we consider a class of SDPs which includes applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. We show that the low-rank Burer--Monteiro formulation of SDPs in that class almost never has any spurious local optima.
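The rank threshold that typically appears in this line of work is $p(p+1)/2 > m$ for $m$ equality constraints, which makes $p$ of order $\sqrt{2m}$. The abstract above does not spell the condition out, so it is stated here as an assumption; the hypothetical helper below only does the arithmetic.

```python
def smallest_rank(m):
    """Smallest integer p >= 1 with p * (p + 1) / 2 > m (assumed rank condition)."""
    p = 1
    while p * (p + 1) // 2 <= m:
        p += 1
    return p

# Max-Cut on n = 1000 nodes has m = 1000 constraints (diag(X) = 1).
print(smallest_rank(1000))  # 45
```

For Max-Cut this means optimizing over $1000 \times 45$ factors instead of $1000 \times 1000$ positive semidefinite matrices.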

Submitted 10 April, 2018; v1 submitted 15 June, 2016; originally announced June 2016.

Comments: 19 pages, in the proceedings of NIPS 2016

Journal ref: In proceedings of NIPS 2016: https://papers.nips.cc/paper/6517-the-non-convex-burer-monteiro-approach-works-on-smooth-semidefinite-programs

arXiv:1605.08101 [pdf, ps, other]

doi 10.1093/imanum/drx080

Global rates of convergence for nonconvex optimization on manifolds

Authors: Nicolas Boumal, P.-A. Absil, Coralia Cartis

Abstract: We consider the minimization of a cost function $f$ on a manifold $M$ using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance $\varepsilon$. Specifically, we show that, under Lipschitz-type assumptions on the pullbacks of $f$ to the tangent spaces of $M$, both of these algorithms produce points with Riemannian gradient smaller than $\varepsilon$ in $O(1/\varepsilon^2)$ iterations. Furthermore, RTR returns a point where also the Riemannian Hessian's least eigenvalue is larger than $-\varepsilon$ in $O(1/\varepsilon^3)$ iterations. There are no assumptions on initialization. The rates match their (sharp) unconstrained counterparts as a function of the accuracy $\varepsilon$ (up to constants) and hence are sharp in that sense. These are the first deterministic results for global rates of convergence to approximate first- and second-order Karush-Kuhn-Tucker points on manifolds. They apply in particular for optimization constrained to compact submanifolds of $\mathbb{R}^n$, under simpler assumptions.
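For intuition about what "Riemannian gradient smaller than $\varepsilon$" means in practice, here is a sketch of Riemannian gradient descent on the unit sphere that stops at the first iterate with $\|\mathrm{grad} f(x)\| \le \varepsilon$ and reports the iteration count. The test cost $f(x) = x^\top A x$, the retraction $R_x(s) = (x+s)/\|x+s\|$, and the Armijo backtracking constants are assumptions for illustration; the sketch does not compute the constants in the $O(1/\varepsilon^2)$ bound.

```python
import numpy as np

def rgd_to_tolerance(A, x0, eps=1e-6, max_iters=10_000):
    """Riemannian gradient descent for f(x) = x^T A x on the unit sphere.

    Stops once the Riemannian gradient norm is at most eps and returns
    (x, iterations used, final gradient norm).
    """
    f = lambda v: v @ A @ v
    retract = lambda x, s: (x + s) / np.linalg.norm(x + s)
    x = x0 / np.linalg.norm(x0)
    gnorm = np.inf
    for k in range(max_iters):
        g = 2 * A @ x
        g = g - (x @ g) * x                 # Riemannian gradient: tangent projection
        gnorm = np.linalg.norm(g)
        if gnorm <= eps:
            return x, k, gnorm
        t = 1.0
        # Armijo backtracking guarantees a sufficient decrease of f at every step
        while t > 1e-10 and f(retract(x, -t * g)) > f(x) - 1e-4 * t * gnorm ** 2:
            t /= 2
        x = retract(x, -t * g)
    return x, max_iters, gnorm

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
x0 = np.random.default_rng(0).standard_normal(5)
x, k, gnorm = rgd_to_tolerance(A, x0)
```

Decreasing `eps` and recording `k` gives an empirical view of how the iteration count grows with the target accuracy on this particular problem.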

Submitted 28 April, 2018; v1 submitted 25 May, 2016; originally announced May 2016.

Comments: 33 pages, IMA Journal of Numerical Analysis, 2018
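The complexity statement above concerns Riemannian gradient descent stopped once the Riemannian gradient norm drops below $\varepsilon$. A minimal Python sketch of that loop follows, under assumptions that are not taken from the paper: the manifold is the unit sphere in $\mathbb{R}^n$ (a compact submanifold), the cost is the Rayleigh quotient $f(x) = x^\top A x$, the retraction is normalization, and the step size is a fixed, hand-picked constant. The names `rgd_sphere` and `norm` are illustrative only.

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def rgd_sphere(A, x0, step=0.1, eps=1e-8, max_iter=10_000):
    """Riemannian gradient descent for f(x) = x^T A x on the unit sphere.

    Returns (x, f(x), k) at the first iterate k whose Riemannian gradient
    norm is below eps (an approximate first-order critical point),
    or after max_iter steps.
    """
    n = len(x0)
    n0 = norm(x0)
    x = [t / n0 for t in x0]
    for k in range(max_iter + 1):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        fx = sum(x[i] * Ax[i] for i in range(n))
        # Riemannian gradient: project the Euclidean gradient 2Ax onto the
        # tangent space {v : x^T v = 0} of the sphere at x.
        rgrad = [2 * (Ax[i] - fx * x[i]) for i in range(n)]
        if norm(rgrad) < eps or k == max_iter:
            return x, fx, k
        # Step along the negative Riemannian gradient, then retract by normalizing.
        y = [x[i] - step * rgrad[i] for i in range(n)]
        ny = norm(y)
        x = [t / ny for t in y]

# Toy example: a diagonal matrix whose smallest eigenvalue (1) sits in position 1.
A = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
x, fx, iters = rgd_sphere(A, [1.0, 1.0, 1.0])
print(fx, iters)
```

For this diagonal `A`, the iterates approach $e_2$, the eigenvector of the smallest eigenvalue, and `fx` approaches 1; the stopping test mirrors the $\varepsilon$-approximate first-order points counted in the abstract.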

arXiv:1602.04426 [pdf, ps, other]

On the low-rank approach for semidefinite programs arising in synchronization and community detection

Authors: Afonso S. Bandeira, Nicolas Boumal, Vladislav Voroninski

Abstract: To address difficult optimization problems, convex relaxations based on semidefinite programming are now commonplace in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic.

Submitted 27 May, 2016; v1 submitted 14 February, 2016; originally announced February 2016.

Comments: 22 pages, Proceedings of The 29th Conference on Learning Theory (COLT), New York, NY, June 23-26, 2016

arXiv:1601.06114 [pdf, other]

doi 10.1137/16M105808X

Nonconvex phase synchronization

Authors: Nicolas Boumal

Abstract: We estimate $n$ phases (angles) from noisy pairwise relative phase measurements. The task is modeled as a nonconvex least-squares optimization problem. It was recently shown that this problem can be solved in polynomial time via convex relaxation, under some conditions on the noise. In this paper, under similar but more restrictive conditions, we show that a modified version of the power method converges to the global optimum. This is simpler and (empirically) faster than convex approaches. Empirically, they both succeed in the same regime. Further analysis shows that, in the same noise regime as previously studied, second-order necessary optimality conditions for this quadratically constrained quadratic program are also sufficient, despite nonconvexity.

Submitted 22 August, 2016; v1 submitted 22 January, 2016; originally announced January 2016.

Comments: 29 pages, 7 figures, to appear in SIAM Journal on Optimization (2016)
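A modified power method for this problem can be sketched as the entrywise iteration $z \leftarrow \mathrm{phase}(Cz)$, where $\mathrm{phase}$ projects each entry onto the unit circle. The sketch below assumes that form; the paper's exact variant and its initialization may differ. The matrix `C` is a hypothetical noiseless instance $C_{ij} = z_i \bar{z}_j$, and since phases are only recoverable up to a global rotation, `align` removes that ambiguity before comparing.

```python
import cmath

def phase(w, tol=1e-15):
    """Project a complex number onto the unit circle; 0 is mapped to 1 by convention."""
    r = abs(w)
    return w / r if r > tol else complex(1.0)

def generalized_power_method(C, z0, iters=50):
    """Iterate z <- phase(C z) entrywise, starting from z0."""
    n = len(z0)
    z = [phase(v) for v in z0]
    for _ in range(iters):
        Cz = [sum(C[i][j] * z[j] for j in range(n)) for i in range(n)]
        z = [phase(v) for v in Cz]
    return z

def align(z, z_ref):
    """Rotate z by the global phase that best matches z_ref."""
    rot = phase(sum(a * b.conjugate() for a, b in zip(z_ref, z)))
    return [rot * v for v in z]

# Hypothetical ground-truth phases and noiseless measurements C_ij = z_i * conj(z_j).
z_true = [cmath.exp(1j * t) for t in (0.0, 1.0, 2.5, -0.7)]
C = [[a * b.conjugate() for b in z_true] for a in z_true]
z_hat = align(generalized_power_method(C, [1.0] * len(z_true)), z_true)
error = max(abs(a - b) for a, b in zip(z_hat, z_true))
print(error)
```

With noiseless data the iteration lands on the true phases (up to rotation) after one step; the abstract's guarantees concern the harder noisy regime, which this toy instance does not exercise.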

arXiv:1506.03295 [pdf, ps, other]

Computational Complexity versus Statistical Performance on Sparse Recovery Problems

Authors: Vincent Roulet, Nicolas Boumal, Alexandre d'Aspremont

Abstract: We show that several classical quantities controlling compressed sensing performance directly match classical parameters controlling algorithmic complexity. We first describe linearly convergent restart schemes on first-order methods solving a broad range of compressed sensing problems, where sharpness at the optimum controls convergence speed. We show that for sparse recovery problems, this sharpness can be written as a condition number, given by the ratio between true signal sparsity and the largest signal size that can be recovered by the observation matrix. In a similar vein, Renegar's condition number is a data-driven complexity measure for convex programs, generalizing classical condition numbers for linear systems. We show that for a broad class of compressed sensing problems, the worst-case value of this algorithmic complexity measure taken over all signals matches the restricted singular value of the observation matrix which controls robust recovery performance. Overall, this means in both cases that, in compressed sensing problems, a single parameter directly controls both computational complexity and recovery performance. Numerical experiments illustrate these points using several classical algorithms.

Submitted 2 November, 2018; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: Final version, to appear in Information and Inference

MSC Class: 68U10; 49K40; 90C25

arXiv:1506.00575 [pdf, other]

A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints

Authors: Nicolas Boumal

Abstract: We propose a new algorithm to solve optimization problems of the form $\min f(X)$ for a smooth function $f$ under the constraints that $X$ is positive semidefinite and the diagonal blocks of $X$ are small identity matrices. Such problems often arise as the result of relaxing a rank constraint (lifting). In particular, many estimation tasks involving phases, rotations, orthonormal bases or permutations fit in this framework, and so do certain relaxations of combinatorial problems such as Max-Cut. The proposed algorithm exploits the facts that (1) such formulations admit low-rank solutions, and (2) their rank-restricted versions are smooth optimization problems on a Riemannian manifold. Combining insights from both the Riemannian and the convex geometries of the problem, we characterize when second-order critical points of the smooth problem reveal KKT points of the semidefinite problem. We compare against state-of-the-art, mature software and find that, on certain interesting problem instances, what we call the staircase method is orders of magnitude faster, is more accurate and scales better. Code is available.

Submitted 6 January, 2016; v1 submitted 1 June, 2015; originally announced June 2015.

Comments: 37 pages, 3 figures
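Point (2) of the abstract can be illustrated on Max-Cut, where the diagonal constraint $\mathrm{diag}(X) = 1$ becomes, after the low-rank factorization $X = YY^\top$ with $Y \in \mathbb{R}^{n \times p}$, the requirement that every row of $Y$ has unit norm, i.e. $Y$ lies on a product of spheres. The sketch below is not the staircase method: it fixes $p = 2$ and runs plain Riemannian gradient descent with a constant step, whereas the staircase method works through increasing ranks. Function names, the step size, and the triangle graph are illustrative choices.

```python
import math
import random

def normalize(v):
    r = math.sqrt(sum(t * t for t in v))
    return [t / r for t in v]

def burer_monteiro_maxcut(W, p=2, step=0.1, iters=2000, seed=0):
    """Riemannian gradient descent for min sum_ij W_ij <y_i, y_j> over unit-norm rows y_i in R^p."""
    n = len(W)
    rng = random.Random(seed)
    Y = [normalize([rng.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(n)]
    for _ in range(iters):
        new_Y = []
        for i in range(n):
            # Euclidean gradient with respect to y_i (W symmetric): 2 * sum_j W_ij y_j.
            g = [2 * sum(W[i][j] * Y[j][k] for j in range(n)) for k in range(p)]
            # Project onto the tangent space of the sphere at y_i.
            gy = sum(g[k] * Y[i][k] for k in range(p))
            rg = [g[k] - gy * Y[i][k] for k in range(p)]
            # Gradient step, then retraction back onto the sphere.
            new_Y.append(normalize([Y[i][k] - step * rg[k] for k in range(p)]))
        Y = new_Y
    return Y

def sdp_cut_value(W, Y):
    """Max-Cut SDP objective (1/4) * sum_ij W_ij (1 - X_ij) evaluated at X = Y Y^T."""
    n = len(W)
    return sum(
        W[i][j] * (1 - sum(a * b for a, b in zip(Y[i], Y[j])))
        for i in range(n) for j in range(n)
    ) / 4

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
Y = burer_monteiro_maxcut(triangle)
print(sdp_cut_value(triangle, Y))  # approaches 2.25
```

For the triangle, the optimal $X$ places three unit vectors at 120 degrees from one another, so the SDP value 2.25 exceeds the best integral cut of 2, and rank $p = 2$ is enough to represent that optimum.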

arXiv:1411.3272 [pdf, other]

doi 10.1007/s10107-016-1059-6

Tightness of the maximum likelihood semidefinite relaxation for angular synchronization

Authors: Afonso S. Bandeira, Nicolas Boumal, Amit Singer

Abstract: Maximum likelihood estimation problems are, in general, intractable optimization problems. As a result, it is common to approximate the maximum likelihood estimator (MLE) using convex relaxations. In some cases, the relaxation is tight: it recovers the true MLE. Most tightness proofs only apply to situations where the MLE exactly recovers a planted solution (known to the analyst). It is then sufficient to establish that the optimality conditions hold at the planted signal. In this paper, we study an estimation problem (angular synchronization) for which the MLE is not a simple function of the planted solution, yet for which the convex relaxation is tight. To establish tightness in this context, the proof is less direct because the point at which to verify optimality conditions is not known explicitly. Angular synchronization consists in estimating a collection of $n$ phases, given noisy measurements of the pairwise relative phases. The MLE for angular synchronization is the solution of a (hard) non-bipartite Grothendieck problem over the complex numbers. We consider a stochastic model for the data: a planted signal (that is, a ground truth set of phases) is corrupted with non-adversarial random noise. Even though the MLE does not coincide with the planted signal, we show that the classical semidefinite relaxation for it is tight, with high probability. This holds even for high levels of noise.

Submitted 19 May, 2016; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: 2 figures

Journal ref: Mathematical Programming, Series A, 2016

arXiv:1410.0719 [pdf, other]

Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

Authors: L. Jacques, C. De Vleeschouwer, Y. Boursier, P. Sudhakar, C. De Mol, A. Pizurica, S. Anthoine, P. Vandergheynst, P. Frossard, C. Bilen, S. Kitic, N. Bertin, R. Gribonval, N. Boumal, B. Mishra, P. -A. Absil, R. Sepulchre, S. Bundervoet, C. Schretter, A. Dooms, P. Schelkens, O. Chabiron, F. Malgouyres, J. -Y. Tourneret, N. Dobigeon, et al. (42 additional authors not shown)

Abstract: The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday, August 27 to Friday, August 29, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.

Submitted 9 October, 2014; v1 submitted 2 October, 2014; originally announced October 2014.

Comments: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist14

arXiv:1308.5200 [pdf, ps, other]

Manopt, a Matlab toolbox for optimization on manifolds

Authors: Nicolas Boumal, Bamdev Mishra, P. -A. Absil, Rodolphe Sepulchre

Abstract: Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. We aim particularly at reaching practitioners outside our field.

Submitted 23 August, 2013; originally announced August 2013.

Journal ref: The Journal of Machine Learning Research, 15(1), 1455-1459 (2014)

arXiv:1307.6398 [pdf, other]

doi 10.1016/j.sysconle.2014.10.006

Concentration of the Kirchhoff index for Erdos-Renyi graphs

Authors: Nicolas Boumal, Xiuyuan Cheng

Abstract: Given an undirected graph, the resistance distance between two nodes is the resistance one would measure between these two nodes in an electrical network if edges were resistors. Summing these distances over all pairs of nodes yields the so-called Kirchhoff index of the graph, which measures its overall connectivity. In this work, we consider Erdos-Renyi random graphs. Since the graphs are random, their Kirchhoff indices are random variables. We give formulas for the expected value of the Kirchhoff index and show it concentrates around its expectation. We achieve this by studying the trace of the pseudoinverse of the Laplacian of Erdos-Renyi graphs. For synchronization (a class of estimation problems on graphs), our results imply that acquiring pairwise measurements uniformly at random is a good strategy, even if only a vanishing proportion of the measurements can be acquired.

Submitted 29 May, 2014; v1 submitted 24 July, 2013; originally announced July 2013.
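For a connected graph, the sum of resistance distances over all pairs equals $n \operatorname{tr}(L^\dagger)$, which is why the trace of the Laplacian pseudoinverse enters the analysis above. The sketch below checks this standard identity in exact rational arithmetic on small deterministic graphs, computing $L^\dagger$ as $(L + J/n)^{-1} - J/n$ with $J$ the all-ones matrix (valid for connected graphs). It does not reproduce the paper's Erdos-Renyi formulas, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def laplacian(n, edges):
    """Exact graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[Fraction(0) for _ in range(n)] for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def inverse(M):
    """Gauss-Jordan inversion in exact rational arithmetic (M must be invertible)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

def kirchhoff_index(n, edges):
    """Sum of resistance distances over all pairs, computed as n * trace(L^+).

    Assumes a connected graph, so that L^+ = (L + J/n)^{-1} - J/n and trace(J/n) = 1.
    """
    L = laplacian(n, edges)
    M = [[L[i][j] + Fraction(1, n) for j in range(n)] for i in range(n)]
    M_inv = inverse(M)
    return n * (sum(M_inv[i][i] for i in range(n)) - 1)

print(kirchhoff_index(4, list(combinations(range(4), 2))))  # complete graph K4: prints 3
print(kirchhoff_index(4, [(0, 1), (1, 2), (2, 3)]))         # path P4: prints 10
```

On the path $P_4$, resistances equal hop distances, giving $1+2+3+1+2+1 = 10$, while the complete graph $K_n$ gives $n - 1$.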

arXiv:1211.1621 [pdf, other]

doi 10.1093/imaiai/iat006

Cramér-Rao bounds for synchronization of rotations

Authors: Nicolas Boumal, Amit Singer, P. -A. Absil, Vincent D. Blondel

Abstract: Synchronization of rotations is the problem of estimating a set of rotations $R_i \in \mathrm{SO}(n)$, $i = 1, \ldots, N$, based on noisy measurements of relative rotations $R_i R_j^\top$. This fundamental problem has found many recent applications, most importantly in structural biology. We provide a framework to study synchronization as estimation on Riemannian manifolds for arbitrary $n$ under a large family of noise models. The noise models we address encompass zero-mean isotropic noise, and we develop tools for Gaussian-like as well as heavy-tailed types of noise in particular. As a main contribution, we derive the Cramér-Rao bounds of synchronization, that is, lower bounds on the variance of unbiased estimators. We find that these bounds are structured by the pseudoinverse of the measurement graph Laplacian, where edge weights are proportional to measurement quality. We leverage this to provide interpretation in terms of random walks and visualization tools for these bounds in both the anchored and anchor-free scenarios. Similar bounds previously established were limited to rotations in the plane and Gaussian-like noise.

Submitted 4 July, 2013; v1 submitted 7 November, 2012; originally announced November 2012.

MSC Class: 62F99; 94C15; 22C05; 05C12

Journal ref: Information and Inference, 3(1), 1-39 (2014)

arXiv:1101.3615 [pdf, other]

Matrix probing: a randomized preconditioner for the wave-equation Hessian

Authors: Laurent Demanet, Pierre-David Létourneau, Nicolas Boumal, Henri Calandra, Jiawei Chiu, Stanley Snelson

Abstract: This paper considers the problem of approximating the inverse of the wave-equation Hessian, also called the normal operator, in seismology and other types of wave-based imaging. An expansion scheme for the pseudodifferential symbol of the inverse Hessian is set up. The coefficients in this expansion are found via least-squares fitting from a certain number of applications of the normal operator on adequate randomized trial functions built in curvelet space. It is found that the number of parameters that can be fitted increases with the amount of information present in the trial functions, with high probability. Once an approximate inverse Hessian is available, application to an image of the model can be done with very low complexity. Numerical experiments show that randomized operator fitting offers a compelling preconditioner for the linearized seismic inversion problem.

Submitted 19 January, 2011; originally announced January 2011.

Comments: 21 pages, 6 figures
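The least-squares fitting step can be sketched generically as follows, with assumptions that are not from the paper: the operator $N$ is a small dense matrix, the expansion uses three hypothetical basis matrices $B_j$ in place of pseudodifferential symbols, and a single Gaussian trial vector $w$ replaces the curvelet-based randomized trial functions. Given $u = Nw$, coefficients $c$ are fitted so that $\sum_j c_j B_j u \approx w$, i.e. so that $\sum_j c_j B_j$ approximates $N^{-1}$.

```python
import random

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def solve(G, b):
    """Solve the square system G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * v for a, v in zip(A[r], A[col])]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (A[i][n] - sum(A[i][k] * c[k] for k in range(i + 1, n))) / A[i][i]
    return c

def probe_inverse(apply_N, basis, n, seed=0):
    """Fit c so that sum_j c_j B_j approximates N^{-1}, from one application of N.

    Solves min_c || sum_j c_j B_j (N w) - w || via the normal equations.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random trial vector
    u = apply_N(w)                                # single application of N
    cols = [matvec(B, u) for B in basis]          # B_j u
    G = [[sum(a * b for a, b in zip(s, t)) for t in cols] for s in cols]
    rhs = [sum(a * b for a, b in zip(s, w)) for s in cols]
    return solve(G, rhs)

# Hypothetical toy setup: N is defined as the inverse of M = 2 Id - 3 D + 0.5 R.
n = 6
Id = [[float(i == j) for j in range(n)] for i in range(n)]
D = [[-2.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
R = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
M = [[2 * Id[i][j] - 3 * D[i][j] + 0.5 * R[i][j] for j in range(n)] for i in range(n)]
coeffs = probe_inverse(lambda w: solve(M, w), [Id, D, R], n)
print(coeffs)  # close to [2.0, -3.0, 0.5]
```

Because $N^{-1}$ lies exactly in the span of the basis here, one application of $N$ recovers the coefficients; in realistic settings the fit is only approximate, and, as the abstract notes, the number of parameters that can be fitted depends on the information present in the trial functions.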

Showing 1–43 of 43 results for author: Boumal, N