
Computing second-order points under equality constraints: revisiting Fletcher's augmented Lagrangian

Authors: Florentin Goyens, Armin Eftekhari, Nicolas Boumal

Abstract: We address the problem of minimizing a smooth function under smooth equality constraints. Under regularity assumptions on these constraints, we propose a notion of approximate first- and second-order critical point which relies on the geometric formalism of Riemannian optimization. Using a smooth exact penalty function known as Fletcher's augmented Lagrangian, we propose an algorithm to minimize the penalized cost function which reaches $\varepsilon$-approximate second-order critical points of the original optimization problem in at most $\mathcal{O}(\varepsilon^{-3})$ iterations. This improves on current best theoretical bounds. Along the way, we show new properties of Fletcher's augmented Lagrangian, which may be of independent interest.

Submitted 16 January, 2024; v1 submitted 4 April, 2022; originally announced April 2022.
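To make the penalty concrete, here is a minimal sketch on a hypothetical toy problem (not taken from the paper): minimize $x + y$ on the unit circle. It assumes the common form of Fletcher's augmented Lagrangian, $g(x) = f(x) - \langle h(x), \lambda(x)\rangle + \frac{\beta}{2}\|h(x)\|^2$ with least-squares multipliers $\lambda(x)$; all function names are illustrative. For this toy, a short calculation shows that the constrained minimizer $(-1/\sqrt{2}, -1/\sqrt{2})$ is a strict local minimizer of $g$ only when $\beta > \sqrt{2}/4$, and a saddle point below that threshold.

```python
import math

# Hypothetical toy problem (illustration only, not from the paper):
#   minimize f(p) = p[0] + p[1]   subject to   h(p) = p[0]**2 + p[1]**2 - 1 = 0
def f(p):
    return p[0] + p[1]

def grad_f(p):
    return (1.0, 1.0)

def h(p):
    return p[0] ** 2 + p[1] ** 2 - 1.0

def grad_h(p):
    return (2.0 * p[0], 2.0 * p[1])

def least_squares_multiplier(p):
    """lambda(p) = argmin_l ||grad f(p) - l * grad h(p)||^2.

    With a single constraint this is <grad f, grad h> / ||grad h||^2,
    which is only defined where grad h(p) != 0 (i.e. away from the origin).
    """
    gf, gh = grad_f(p), grad_h(p)
    return (gf[0] * gh[0] + gf[1] * gh[1]) / (gh[0] ** 2 + gh[1] ** 2)

def fletcher_penalty(p, beta):
    """Fletcher's augmented Lagrangian: f(p) - lambda(p) * h(p) + (beta / 2) * h(p)**2."""
    return f(p) - least_squares_multiplier(p) * h(p) + 0.5 * beta * h(p) ** 2

x_star = (-1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0))  # constrained minimizer

def is_local_min(beta, delta=1e-3):
    """Crude check: is the penalty at x_star lower than at 8 points on a small ring?"""
    g0 = fletcher_penalty(x_star, beta)
    for k in range(8):
        angle = 2.0 * math.pi * k / 8
        q = (x_star[0] + delta * math.cos(angle), x_star[1] + delta * math.sin(angle))
        if fletcher_penalty(q, beta) <= g0:
            return False
    return True

print(is_local_min(beta=2.0))  # penalty parameter above the threshold
print(is_local_min(beta=0.1))  # below the threshold: x_star is a saddle of the penalty
```

The ring test is only a sanity check, not a second-order certificate; it is enough here because the toy's Hessian at `x_star` is diagonal in polar coordinates.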

arXiv:2202.06555

High-Dimensional Dynamic Stochastic Model Representation

Authors: Aryan Eftekhari, Simon Scheidegger

Abstract: We propose a scalable method for computing global solutions of nonlinear, high-dimensional dynamic stochastic economic models. First, within a time iteration framework, we approximate economic policy functions using an adaptive, high-dimensional model representation scheme, combined with adaptive sparse grids to address the ubiquitous challenge of the curse of dimensionality. Moreover, the adaptivity within the individual component functions increases sparsity since grid points are added only where they are most needed, that is, in regions with steep gradients or at nondifferentiabilities. Second, we introduce a performant vectorization scheme for the interpolation compute kernel. Third, the algorithm is hybrid parallelized, leveraging both distributed- and shared-memory architectures. We observe significant speedups over the state-of-the-art techniques, and almost ideal strong scaling up to at least $1,000$ compute nodes of a Cray XC$50$ system at the Swiss National Supercomputing Center. Finally, to demonstrate our method's broad applicability, we compute global solutions to two variants of a high-dimensional international real business cycle model with up to $300$ continuous state variables. In addition, we highlight a complementary advantage of the framework, which allows for a priori analysis of the model complexity.

Submitted 14 February, 2022; originally announced February 2022.
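The component-function idea behind high-dimensional model representation can be illustrated with a first-order anchored (cut-)HDMR, in which each component function varies a single coordinate while the others stay at an anchor point. This is a minimal sketch under that assumption; it omits the adaptivity, sparse grids, time iteration and parallelism described in the abstract, and `cut_hdmr_first_order` is a hypothetical name.

```python
import math
from typing import Callable, Sequence

def cut_hdmr_first_order(f: Callable[[Sequence[float]], float],
                         anchor: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Build f(x) ~ f(a) + sum_i [f(a with x_i in slot i) - f(a)].

    Each term depends on one coordinate only, so in practice each could be
    interpolated on its own low-dimensional (e.g. sparse) grid.
    """
    f0 = f(anchor)

    def component(i: int, xi: float) -> float:
        y = list(anchor)
        y[i] = xi  # move only coordinate i away from the anchor
        return f(y) - f0

    def surrogate(x: Sequence[float]) -> float:
        return f0 + sum(component(i, xi) for i, xi in enumerate(x))

    return surrogate

# An additive function is reproduced exactly by the first-order expansion...
def additive(x: Sequence[float]) -> float:
    return x[0] ** 2 + 3.0 * x[1] + math.sin(x[2])

# ...whereas a pure interaction term x0 * x1 is not.
def interaction(x: Sequence[float]) -> float:
    return x[0] * x[1]

approx_add = cut_hdmr_first_order(additive, [0.5, 0.5, 0.5])
approx_int = cut_hdmr_first_order(interaction, [0.5, 0.5])
```

Capturing interactions requires higher-order (pairwise and beyond) component functions; keeping their number small is what makes such representations attractive in high dimensions.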

arXiv:2201.09096

The Forward-Backward Envelope for Sampling with the Overdamped Langevin Algorithm

Authors: Armin Eftekhari, Luis Vargas, Konstantinos Zygalakis

Abstract: In this paper, we analyse a proximal method based on the idea of forward-backward splitting for sampling from distributions with densities that are not necessarily smooth. In particular, we study the non-asymptotic properties of the Euler-Maruyama discretization of the Langevin equation, where the forward-backward envelope is used to deal with the non-smooth part of the dynamics. An advantage of this envelope, when compared to the widely used Moreau-Yosida envelope and the MYULA algorithm, is that it maintains the MAP estimator of the original non-smooth distribution. We also present a number of numerical experiments that support our theoretical findings.

Submitted 22 January, 2022; originally announced January 2022.
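For intuition about the envelope itself (not the sampler), the sketch below evaluates the forward-backward envelope of a hypothetical one-dimensional composite function $F = f + g$ with $f(x) = (x-2)^2/2$ and $g(x) = |x|$, using the standard formula $\varphi_\gamma(x) = f(x) - \frac{\gamma}{2}|f'(x)|^2 + g^\gamma(x - \gamma f'(x))$, where $g^\gamma$ is the Moreau envelope of $g$. For $0 < \gamma < 1/L$ (here $L = 1$), the minimizers of $\varphi_\gamma$ and $F$ coincide, which is the sense in which the MAP estimator is maintained. The problem, step size and grid search are assumptions chosen for illustration.

```python
def f(x):
    return 0.5 * (x - 2.0) ** 2          # smooth part, gradient Lipschitz constant L = 1

def df(x):
    return x - 2.0

def g(x):
    return abs(x)                        # nonsmooth part

def moreau_env_abs(u, gamma):
    """Moreau envelope of |.| with parameter gamma (the Huber function)."""
    return u * u / (2.0 * gamma) if abs(u) <= gamma else abs(u) - gamma / 2.0

def fbe(x, gamma):
    """Forward-backward envelope: f(x) - gamma/2 * f'(x)^2 + g^gamma(x - gamma * f'(x))."""
    grad = df(x)
    return f(x) - 0.5 * gamma * grad ** 2 + moreau_env_abs(x - gamma * grad, gamma)

def composite(x):
    return f(x) + g(x)

gamma = 0.5                                   # assumed step, satisfies gamma < 1 / L
grid = [i / 100.0 for i in range(-300, 501)]  # x in [-3, 5]
x_fbe = min(grid, key=lambda x: fbe(x, gamma))
x_F = min(grid, key=composite)
```

Unlike $F$, the envelope is continuously differentiable when $f$ is smooth enough, which is what makes it usable inside a gradient-based Langevin discretization.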

arXiv:2112.13269

doi 10.1109/TSP.2021.3139213

Over-Parametrized Matrix Factorization in the Presence of Spurious Stationary Points

Authors: Armin Eftekhari

Abstract: Motivated by the emerging role of interpolating machines in signal processing and machine learning, this work considers the computational aspects of over-parametrized matrix factorization. In this context, the optimization landscape may contain spurious stationary points (SSPs), which are proved to be full-rank matrices. The presence of these SSPs means that it is impossible to hope for any global guarantees in over-parametrized matrix factorization. For example, when initialized at an SSP, the gradient flow will be trapped there forever. Nevertheless, despite these SSPs, we establish in this work that the gradient flow of the corresponding merit function converges to a global minimizer, provided that its initialization is rank-deficient and sufficiently close to the feasible set of the optimization problem. We numerically observe that a heuristic discretization of the proposed gradient flow, inspired by primal-dual algorithms, is successful when initialized randomly. Our result is in sharp contrast with the local refinement methods which require an initialization close to the optimal set of the optimization problem. More specifically, we successfully avoid the traps set by the SSPs because the gradient flow remains rank-deficient at all times, and not because there are no SSPs nearby. The latter is the case for the local refinement methods. Moreover, the widely-used restricted isometry property plays no role in our main result.

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2111.01875

Subquadratic Overparameterization for Shallow Neural Networks

Authors: Chaehwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

Abstract: Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training. The existing theory establishes such global convergence using various initialization strategies, training modifications, and width scalings. In particular, the state-of-the-art results require the width to scale quadratically with the number of training data under standard initialization strategies used in practice for best generalization performance. In contrast, the most recent results obtain linear scaling either by requiring initializations that lead to "lazy training" or by training only a single layer. In this work, we provide an analytical framework that allows us to adopt standard initialization strategies, possibly avoid lazy training, and train all layers simultaneously in basic shallow neural networks while attaining a desirable subquadratic scaling on the network width. We achieve these desiderata via the Polyak-Lojasiewicz condition, smoothness, and standard assumptions on data, and use tools from random matrix theory.

Submitted 2 November, 2021; originally announced November 2021.

Comments: To appear at the conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2109.06095

Nonlinear matrix recovery using optimization on the Grassmann manifold

Authors: Florentin Goyens, Coralia Cartis, Armin Eftekhari

Abstract: We investigate the problem of recovering a partially observed high-rank matrix whose columns obey a nonlinear structure, such as a union of subspaces or an algebraic variety, or are grouped in clusters. The recovery problem is formulated as the rank minimization of a nonlinear feature map applied to the original matrix, which is then further approximated by a constrained non-convex optimization problem involving the Grassmann manifold. We propose two sets of algorithms, one arising from Riemannian optimization and the other as an alternating minimization scheme, both of which include first- and second-order variants. Both sets of algorithms have theoretical guarantees. In particular, for the alternating minimization, we establish global convergence and worst-case complexity bounds. Additionally, using the Kurdyka-Lojasiewicz property, we show that the alternating minimization converges to a unique limit point. We provide extensive numerical results for the recovery of unions of subspaces and clustering under entry sampling and dense Gaussian sampling. Our methods are competitive with existing approaches and, in particular, high accuracy is achieved in the recovery using Riemannian second-order methods.

Submitted 8 December, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Fixed some typos in version 2

arXiv:2105.12022

Principal Component Hierarchy for Sparse Quadratic Programs

Authors: Robbie Vreugdenhil, Viet Anh Nguyen, Armin Eftekhari, Peyman Mohajerin Esfahani

Abstract: We propose a novel approximation hierarchy for cardinality-constrained, convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix. Each level of approximation admits a min-max characterization whose objective function can be optimized over the binary variables analytically, while preserving convexity in the continuous variables. Exploiting this property, we propose two scalable optimization algorithms, dubbed the "best response" and the "dual program", that can efficiently screen the potential indices of the nonzero elements of the original program. In experiments with both synthetic and real datasets, we show that the proposed methods are competitive with the existing screening methods in the current sparse regression literature, and that they are particularly fast on instances with a large number of measurements.

Submitted 25 May, 2021; originally announced May 2021.

Journal ref: ICML 2021

arXiv:2101.02776

The Nonconvex Geometry of Linear Inverse Problems

Authors: Armin Eftekhari, Peyman Mohajerin Esfahani

Abstract: The gauge function, closely related to the atomic norm, measures the complexity of a statistical model, and has found broad applications in machine learning and statistical signal processing. In a high-dimensional learning problem, the gauge function attempts to safeguard against overfitting by promoting a sparse (concise) representation within the learning alphabet. In this work, within the context of linear inverse problems, we pinpoint the source of its success, but also argue that the applicability of the gauge function is inherently limited by its convexity, and showcase several learning problems where the classical gauge function theory fails. We then introduce a new notion of statistical complexity, gauge$_p$ function, which overcomes the limitations of the gauge function. The gauge$_p$ function is a simple generalization of the gauge function that can tightly control the sparsity of a statistical model within the learning alphabet and, perhaps surprisingly, draws further inspiration from the Burer-Monteiro factorization in computational mathematics. We also propose a new learning machine, with the building block of gauge$_p$ function, and arm this machine with a number of statistical guarantees. The potential of the proposed gauge$_p$ function theory is then studied for two stylized applications. Finally, we discuss the computational aspects and, in particular, suggest a tractable numerical algorithm for implementing the new learning machine.

Submitted 9 March, 2022; v1 submitted 7 January, 2021; originally announced January 2021.
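For reference, here is the standard definition of the classical gauge function of a compact atom set $\mathcal{A}$ whose convex hull contains the origin in its interior, together with its best-known special case; this is textbook material assumed for illustration, and the gauge$_p$ generalization is the paper's own and is not reproduced here.

```latex
\gamma_{\mathcal{A}}(x)
  \;=\; \inf\bigl\{\, t > 0 \;:\; x \in t\,\operatorname{conv}(\mathcal{A}) \,\bigr\}
  \;=\; \inf\Bigl\{\, \textstyle\sum_{a \in \mathcal{A}} c_a \;:\;
        x = \textstyle\sum_{a \in \mathcal{A}} c_a\, a,\; c_a \ge 0 \Bigr\},
\qquad
\mathcal{A} = \{\pm e_1, \dots, \pm e_n\}
  \;\Longrightarrow\; \gamma_{\mathcal{A}}(x) = \|x\|_1 .
```

The second expression shows why the gauge promotes concise representations: it charges the total weight of the atoms used to build $x$.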

arXiv:2008.12091

Limitations of Implicit Bias in Matrix Sensing: Initialization Rank Matters

Authors: Armin Eftekhari, Konstantinos Zygalakis

Abstract: In matrix sensing, we first numerically identify the sensitivity to the initialization rank as a new limitation of the implicit bias of gradient flow. We then partially quantify this phenomenon mathematically by establishing that the gradient flow of the empirical risk is implicitly biased towards low-rank outcomes and successfully learns the planted low-rank matrix, provided that the initialization is low-rank and within a specific "capture neighborhood". This capture neighborhood is far larger than the corresponding neighborhood in local refinement results; the former contains all models with zero training error whereas the latter is a small neighborhood of a model with zero test error. These new insights enable us to design an alternative algorithm for matrix sensing that complements the high-rank and near-zero initialization scheme which is predominant in the existing literature.

Submitted 6 June, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

arXiv:2007.01147

Double-Loop Unadjusted Langevin Algorithm

Authors: Paul Rolland, Armin Eftekhari, Ali Kavis, Volkan Cevher

Abstract: A well-known first-order method for sampling from log-concave probability distributions is the Unadjusted Langevin Algorithm (ULA). This work proposes a new annealing step-size schedule for ULA, which allows us to prove new convergence guarantees for sampling from a smooth log-concave distribution that are not covered by existing state-of-the-art convergence guarantees. To establish this result, we derive a new theoretical bound that relates the Wasserstein distance to the total variation distance between any two log-concave distributions, complementing the reach of Talagrand's T2 inequality. Moreover, applying this new step-size schedule to an existing constrained sampling algorithm, we show state-of-the-art convergence rates for sampling from a constrained log-concave distribution, as well as improved dimension dependence.

Submitted 2 July, 2020; originally announced July 2020.
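As a point of reference for the step-size discussion, here is a minimal sketch of plain ULA, $x_{k+1} = x_k - \gamma_k \nabla U(x_k) + \sqrt{2\gamma_k}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, 1)$, for a one-dimensional target density proportional to $e^{-U}$. The `step_size` callable is where an annealing schedule would plug in; the constant schedule, Gaussian target and burn-in length in the usage example are assumptions, not the paper's double-loop schedule.

```python
import math
import random
from typing import Callable, List

def ula(grad_U: Callable[[float], float],
        x0: float,
        step_size: Callable[[int], float],
        n_steps: int,
        rng: random.Random) -> List[float]:
    """Unadjusted Langevin Algorithm for a 1-D target density proportional to exp(-U)."""
    x, samples = x0, []
    for k in range(n_steps):
        gamma = step_size(k)                 # schedule is pluggable (constant, annealed, ...)
        noise = rng.gauss(0.0, 1.0)
        x = x - gamma * grad_U(x) + math.sqrt(2.0 * gamma) * noise
        samples.append(x)
    return samples

# Usage: standard Gaussian target, U(x) = x^2 / 2, constant step 0.1.
rng = random.Random(0)
chain = ula(lambda x: x, x0=3.0, step_size=lambda k: 0.1, n_steps=20000, rng=rng)
kept = chain[1000:]                          # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

Without a Metropolis correction, a constant step leaves a bias: for this Gaussian target the chain's stationary variance is $1/(1 - \gamma/2)$ rather than 1, which is one motivation for decreasing step sizes.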

arXiv:2002.09852

Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Authors: Armin Eftekhari

Abstract: Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. This paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, this work develops machinery to provably identify the stable set of gradient flow, which then enables us to improve over the state of the art in the literature of linear networks (Bah et al., 2019; Arora et al., 2018a). Crucially, our results appear to be the first to break away from the lazy training regime which has dominated the literature of neural networks. This work requires the network to have a layer with one neuron, which subsumes the networks with a scalar output, but extending the results of this theoretical work to all linear networks remains a challenging open problem.

Submitted 28 June, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

arXiv:1910.03948

Nearly Minimal Over-Parametrization of Shallow Neural Networks

Authors: Armin Eftekhari, ChaeHwan Song, Volkan Cevher

Abstract: A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario, the existing theory requires quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear overparametrization is sufficient to fit the training data, using a simple variant of the (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.

Submitted 29 October, 2019; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: This paper is submitted without consent of the co-authors

arXiv:1907.03343

Fast and Provable ADMM for Learning with Generative Priors

Authors: Fabian Latorre Gómez, Armin Eftekhari, Volkan Cevher

Abstract: In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures, under mild assumptions. Unlike gradient descent (GD), our algorithm can efficiently handle non-smooth objectives and exploit efficient partial minimization procedures, and is thus faster in many practical scenarios.

Submitted 7 July, 2019; originally announced July 2019.

arXiv:1906.11357

An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints

Authors: Mehmet Fatih Sahin, Armin Eftekhari, Ahmet Alacaoglu, Fabian Latorre, Volkan Cevher

Abstract: We propose a practical inexact augmented Lagrangian method (iALM) for nonconvex problems with nonlinear constraints. We characterize the total computational complexity of our method subject to a verifiable geometric condition, which is closely related to the Polyak-Lojasiewicz and Mangasarian-Fromowitz conditions. In particular, when a first-order solver is used for the inner iterates, we prove that iALM finds a first-order stationary point with $\tilde{\mathcal{O}}(1/\varepsilon^4)$ calls to the first-order oracle. If, in addition, the problem is smooth and a second-order solver is used for the inner iterates, iALM finds a second-order stationary point with $\tilde{\mathcal{O}}(1/\varepsilon^5)$ calls to the second-order oracle, which matches the known theoretical complexity result in the literature. We also provide strong numerical evidence on large-scale machine learning problems, including the Burer-Monteiro factorization of semidefinite programs, and a novel nonconvex relaxation of the standard basis pursuit template. For these examples, we also show how to verify our geometric condition.

Submitted 20 April, 2022; v1 submitted 26 June, 2019; originally announced June 2019.

Journal ref: In proceedings of NeurIPS 2019, pages 13943-13955, volume 32: http://papers.nips.cc/paper/9545-an-inexact-augmented-lagrangian-framework-for-nonconvex-optimization-with-nonlinear-constraints
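The outer/inner structure of an augmented Lagrangian method can be sketched on a hypothetical toy problem: minimize $x + y$ subject to $x^2 + y^2 - 1 = 0$. The sketch uses $\mathcal{L}_\beta(x, y) = f(x) + \langle A(x), y\rangle + \frac{\beta}{2}\|A(x)\|^2$, a fixed budget of gradient steps as the "inexact" inner solver, and a dual step $y \leftarrow y + \beta A(x)$. The fixed $\beta$, the step size and the iteration counts are illustrative assumptions rather than the schedules analysed in the paper.

```python
from typing import Tuple

Vec = Tuple[float, float]

# Hypothetical toy problem: minimize f(p) = p0 + p1 subject to A(p) = p0^2 + p1^2 - 1 = 0.
def constraint(p: Vec) -> float:
    return p[0] ** 2 + p[1] ** 2 - 1.0

def grad_lagrangian(p: Vec, lam: float, beta: float) -> Vec:
    """Gradient in p of L_beta(p, lam) = f(p) + lam * A(p) + (beta / 2) * A(p)^2."""
    scale = lam + beta * constraint(p)   # coefficient multiplying grad A(p) = 2p
    return (1.0 + 2.0 * scale * p[0], 1.0 + 2.0 * scale * p[1])

def inexact_alm(p: Vec, beta: float = 10.0, outer: int = 20,
                inner: int = 2000, step: float = 0.01) -> Tuple[Vec, float]:
    lam = 0.0
    for _ in range(outer):
        # Inexact primal update: a fixed budget of gradient steps on L_beta(., lam),
        # warm-started from the previous primal iterate.
        for _ in range(inner):
            g = grad_lagrangian(p, lam, beta)
            p = (p[0] - step * g[0], p[1] - step * g[1])
        # Dual ascent step on the multiplier.
        lam += beta * constraint(p)
    return p, lam

p_final, lam_final = inexact_alm((1.0, 0.0))
```

At the solution $(-1/\sqrt{2}, -1/\sqrt{2})$ the multiplier satisfies $\nabla f + y\,\nabla A = 0$, giving $y = 1/\sqrt{2}$; the step size is kept below $2/\lambda_{\max}$ of the inner Hessian, which is dominated by the penalty term $4\beta$ near the circle.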

arXiv:1902.00386

Scalable Learning-Based Sampling Optimization for Compressive Dynamic MRI

Authors: Thomas Sanchez, Baran Gözcü, Ruud B. van Heeswijk, Armin Eftekhari, Efe Ilıcak, Tolga Çukur, Volkan Cevher

Abstract: Compressed sensing applied to magnetic resonance imaging (MRI) makes it possible to reduce the scanning time by enabling images to be reconstructed from highly undersampled data. In this paper, we tackle the problem of designing a sampling mask for an arbitrary reconstruction method and a limited acquisition budget. Namely, we look for an optimal probability distribution from which a mask with a fixed cardinality is drawn. We demonstrate that this problem admits a compactly supported solution, which leads to a deterministic optimal sampling mask. We then propose a stochastic greedy algorithm that (i) provides an approximate solution to this problem, and (ii) resolves the scaling issues of [1,2]. We validate its performance on in vivo dynamic MRI with retrospective undersampling, showing that our method preserves the performance of [1,2] while reducing the computational burden by a factor close to 200.

Submitted 16 March, 2020; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: 13 pages, 16 figures, ICASSP 2020 - Session on "Learning and Optimization in Non-Convex Environments". Code available at https://github.com/t-sanchez/stochasticGreedyMRI.git

arXiv:1806.01304

doi 10.1109/TPAMI.2019.2919597

MOSES: A Streaming Algorithm for Linear Dimensionality Reduction

Authors: Armin Eftekhari, Raphael A. Hauser, Andreas Grammenos

Abstract: This paper introduces Memory-limited Online Subspace Estimation Scheme (MOSES) for both estimating the principal components of streaming data and reducing its dimension. More specifically, in various applications such as sensor networks, the data vectors are presented sequentially to a user who has limited storage and processing time available. Applied to such problems, MOSES can provide a running estimate of leading principal components of the data that has arrived so far and also reduce its dimension. MOSES generalises the popular incremental Singular Value Decomposition (iSVD) to handle thin blocks of data, rather than just vectors. This minor generalisation in part allows us to complement MOSES with a comprehensive statistical analysis, thus providing the first theoretically-sound variant of iSVD, which has been lacking despite the empirical success of this method. This generalisation also enables us to concretely interpret MOSES as an approximate solver for the underlying non-convex optimisation program. We find that MOSES consistently surpasses the state of the art in our numerical experiments with both synthetic and real-world datasets, while being computationally inexpensive.

Submitted 23 February, 2020; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1805.09513

Stable Super-Resolution of Images: A Theoretical Study

Authors: Armin Eftekhari, Tamir Bendory, Gongguo Tang

Abstract: We study the ubiquitous super-resolution problem, in which one aims at localizing positive point sources in an image, blurred by the point spread function of the imaging device. To recover the point sources, we propose to solve a convex feasibility program, which simply finds a nonnegative Borel measure that agrees with the observations collected by the imaging device. In the absence of imaging noise, we show that solving this convex program uniquely retrieves the point sources, provided that the imaging device collects enough observations. This result holds true if the point spread function of the imaging device can be decomposed into horizontal and vertical components, and if the translations of these components form a Chebyshev system, i.e., a system of continuous functions that loosely behave like algebraic polynomials. Building upon recent results for one-dimensional signals [1], we prove that this super-resolution algorithm is stable, in the generalized Wasserstein metric, to model mismatch (i.e., when the image is not sparse) and to additive imaging noise. In particular, the recovery error depends on the noise level and how well the image can be approximated with well-separated point sources. As an example, we verify these claims for the important case of a Gaussian point spread function. The proofs rely on the construction of novel interpolating polynomials---which are the main technical contribution of this paper---and partially resolve the question raised in [2] about the extension of the standard machinery to higher dimensions.

Submitted 5 September, 2020; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1805.07459

PCA by Optimisation of Symmetric Functions has no Spurious Local Optima

Authors: Raphael A. Hauser, Armin Eftekhari

Abstract: Principal Component Analysis (PCA) finds the best linear representation of data, and is an indispensable tool in many learning and inference tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. This paper introduces many other ways of performing PCA, with various geometric interpretations, and proves that the corresponding family of non-convex programs have no spurious local optima, while possessing only strict saddle points. These programs therefore loosely behave like convex problems and can be efficiently solved to global optimality, for example, with certain variants of the stochastic gradient descent. Beyond providing new geometric interpretations and enhancing our theoretical understanding of PCA, our findings might pave the way for entirely new approaches to structured dimensionality reduction, such as sparse PCA and nonnegative matrix factorisation. More specifically, we study an unconstrained formulation of PCA using determinant optimisation that might provide an elegant alternative to the deflating scheme commonly used in sparse PCA.

Submitted 21 December, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

arXiv:1805.07199 [pdf, other]

Explicit Stabilised Gradient Descent for Faster Strongly Convex Optimisation

Authors: Armin Eftekhari, Bart Vandereycken, Gilles Vilmart, Konstantinos C. Zygalakis

Abstract: This paper introduces the Runge-Kutta Chebyshev descent method (RKCD) for strongly convex optimisation problems. This new algorithm is based on explicit stabilised integrators for stiff differential equations, a powerful class of numerical schemes that avoid the severe step size restriction faced by standard explicit integrators. For optimising quadratic and strongly convex functions, this paper proves that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens. It is established that this optimal rate is obtained also for a partitioned variant of RKCD applied to perturbations of quadratic functions. In addition, numerical experiments on general strongly convex problems show that RKCD outperforms Nesterov's accelerated gradient descent.
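A minimal sketch of the idea, assuming the standard damped first-order Runge-Kutta-Chebyshev recurrence applied to the gradient flow x' = -∇f(x). The damping eta = 0.05, the stage count s = 5, and the step-size rule are illustrative assumptions rather than the paper's tuned choices, and the paper's RKCD may differ in detail.

```python
import numpy as np

def chebyshev_values(s, x):
    """Return [T_0(x), ..., T_s(x)] and [T_0'(x), ..., T_s'(x)] via the three-term recurrence."""
    T, dT = [1.0, x], [0.0, 1.0]
    for j in range(2, s + 1):
        T.append(2.0 * x * T[j - 1] - T[j - 2])
        dT.append(2.0 * T[j - 1] + 2.0 * x * dT[j - 1] - dT[j - 2])
    return T, dT

def rkcd_minimize(grad, x0, L, s=5, eta=0.05, num_steps=500):
    """Damped first-order Runge-Kutta-Chebyshev steps on the gradient flow x' = -grad(x).

    L is an upper bound on the largest Hessian eigenvalue; each step uses s gradient calls.
    """
    omega0 = 1.0 + eta / s ** 2
    T, dT = chebyshev_values(s, omega0)
    omega1 = T[s] / dT[s]
    # Largest h such that omega0 - omega1 * h * lam stays in [-1, omega0] for lam in (0, L].
    h = (1.0 + omega0) / (omega1 * L)
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        k_older = x
        k_old = x - h * (omega1 / omega0) * grad(x)
        for j in range(2, s + 1):
            k_new = (2.0 * omega0 * T[j - 1] / T[j] * k_old
                     - T[j - 2] / T[j] * k_older
                     - 2.0 * omega1 * T[j - 1] / T[j] * h * grad(k_old))
            k_older, k_old = k_old, k_new
        x = k_old
    return x

# Strongly convex quadratic f(x) = 0.5 x^T A x - b^T x with condition number 100.
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_min = rkcd_minimize(lambda x: A @ x - b, np.zeros(3), L=100.0)
print(np.allclose(x_min, np.linalg.solve(A, b)))  # True
```

With s = 1 the step size reduces to roughly 2/L, as for explicit Euler (plain gradient descent); with s stages the stable step grows roughly like 2s²/L, which is the sense in which these schemes avoid the step size restriction of standard explicit integrators.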

Submitted 27 June, 2020; v1 submitted 18 May, 2018; originally announced May 2018.

MSC Class: 65K10; 65N12

arXiv:1804.10243 [pdf, other]

Sparse Inverse Problems Over Measures: Equivalence of the Conditional Gradient and Exchange Methods

Authors: Armin Eftekhari, Andrew Thompson

Abstract: We study an optimization program over nonnegative Borel measures that encourages sparsity in its solution. Efficient solvers for this program are in increasing demand, as it arises when learning from data generated by a `continuum-of-subspaces' model, a recent trend with applications in signal processing, machine learning, and high-dimensional statistics. We prove that the conditional gradient method (CGM) applied to this infinite-dimensional program, as proposed recently in the literature, is equivalent to the exchange method (EM) applied to its Lagrangian dual, which is a semi-infinite program. In doing so, we formally connect such infinite-dimensional programs to the well-established field of semi-infinite programming. On the one hand, the equivalence established in this paper allows us to provide a rate of convergence for EM which is more general than those existing in the literature. On the other hand, this connection and the resulting geometric insights might in the future lead to the design of improved variants of CGM for infinite-dimensional programs, which has been an active research topic. CGM is also known as the Frank-Wolfe algorithm.
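To make the conditional gradient method (Frank-Wolfe) concrete, here is a minimal sketch on a discretized version of such a program. The measure is restricted to a fixed grid, the loss is least squares, and a total-mass budget tau is imposed; the grid, the Gaussian atoms, and tau are assumptions for illustration. The paper works with the infinite-dimensional program, and the CGM variant it analyzes may differ.

```python
import numpy as np

def conditional_gradient(Phi, y, tau, num_iters=500):
    """Frank-Wolfe for min 0.5*||Phi w - y||^2 over {w >= 0, sum(w) <= tau}.

    Column j of Phi is the atom phi(t_j) at grid point t_j, so w is a discretized
    nonnegative measure with total mass at most tau.
    """
    w = np.zeros(Phi.shape[1])
    for k in range(num_iters):
        gradient = Phi.T @ (Phi @ w - y)
        j = int(np.argmin(gradient))
        vertex = np.zeros_like(w)
        if gradient[j] < 0:            # otherwise the zero measure minimizes the linear model
            vertex[j] = tau            # a single Dirac mass of weight tau at t_j
        gamma = 2.0 / (k + 2.0)        # standard open-loop step size
        w = (1.0 - gamma) * w + gamma * vertex
    return w

t_grid = np.linspace(0.0, 1.0, 101)
s_samples = np.linspace(0.0, 1.0, 20)
Phi = np.exp(-((s_samples[:, None] - t_grid[None, :]) ** 2) / (2 * 0.05 ** 2))
w_true = np.zeros(101)
w_true[30], w_true[70] = 1.0, 0.5
y = Phi @ w_true
w_hat = conditional_gradient(Phi, y, tau=1.5)
objective = 0.5 * np.sum((Phi @ w_hat - y) ** 2)
print(objective)
```

Each iteration moves toward a single Dirac mass, so the iterate after k steps is supported on at most k grid points.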

Submitted 5 March, 2019; v1 submitted 26 April, 2018; originally announced April 2018.

arXiv:1804.01490 [pdf, ps, other]

Sparse non-negative super-resolution -- simplified and stabilised

Authors: Armin Eftekhari, Jared Tanner, Andrew Thompson, Bogdan Toader, Hemant Tyagi

Abstract: The convolution of a discrete measure, $x=\sum_{i=1}^k a_i\delta_{t_i}$, with a local window function, $\varphi(s-t)$, is a common model for a measurement device whose resolution is substantially lower than that of the objects being observed. Super-resolution concerns localising the point sources $\{a_i,t_i\}_{i=1}^k$ with an accuracy beyond the essential support of $\varphi(s-t)$, typically from $m$ samples $y(s_j)=\sum_{i=1}^k a_i\varphi(s_j-t_i)+\eta_j$, where $\eta_j$ indicates an inexactness in the sample value. We consider the setting of $x$ being non-negative and seek to characterise all non-negative measures approximately consistent with the samples. We first show that $x$ is the unique non-negative measure consistent with the samples provided the samples are exact, i.e. $\eta_j=0$, $m\ge 2k+1$ samples are available, and $\varphi(s-t)$ generates a Chebyshev system. This is independent of how close the sample locations are and {\em does not rely on any regulariser beyond non-negativity}; as such, it extends and clarifies the work by Schiebinger et al. and De Castro et al., who achieve the same results but require a total variation regulariser, which we show is unnecessary. Moreover, we characterise non-negative solutions $\hat{x}$ consistent with the samples within the bound $\sum_{j=1}^m\eta_j^2\le\delta^2$. Any such non-negative measure is within ${\mathcal O}(\delta^{1/7})$ of the discrete measure $x$ generating the samples in the generalised Wasserstein distance, converging to one another as $\delta$ approaches zero.
We also show how to make these general results, for windows that form a Chebyshev system, precise for the case of $\varphi(s-t)$ being a Gaussian window. The main innovation of these results is that non-negativity alone is sufficient to localise point sources beyond the essential sensor resolution.
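The sampling model and the consistency condition above can be written down directly. In this sketch the Gaussian width sigma = 0.1, the source positions, the sample locations, and the noise vector are all hypothetical. It checks whether a given candidate belongs to the set of non-negative measures consistent with the samples; it does not compute such a measure.

```python
import numpy as np

def gaussian_window(u, sigma=0.1):
    """Hypothetical Gaussian window phi(u) with width sigma."""
    return np.exp(-u ** 2 / (2.0 * sigma ** 2))

def samples(amplitudes, locations, s, sigma=0.1):
    """Exact samples y(s_j) = sum_i a_i * phi(s_j - t_i)."""
    return gaussian_window(s[:, None] - locations[None, :], sigma) @ amplitudes

def is_consistent(amplitudes, locations, s, y, delta, sigma=0.1):
    """True if the measure is non-negative and sum_j (y_j - model_j)^2 <= delta^2."""
    if np.any(amplitudes < 0):
        return False
    residual = y - samples(amplitudes, locations, s, sigma)
    return float(residual @ residual) <= delta ** 2

a, t = np.array([1.0, 0.7]), np.array([0.3, 0.55])       # k = 2 sources
s = np.linspace(0.0, 1.0, 2 * len(a) + 1)                # m = 2k + 1 samples
eta = 0.005 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])      # sum of squares 1.25e-4
y = samples(a, t, s) + eta
print(is_consistent(a, t, s, y, delta=0.02))                      # True
print(is_consistent(np.array([1.0, -0.1]), t, s, y, delta=0.02))  # False: negative mass
print(is_consistent(a, np.array([0.6, 0.9]), s, y, delta=0.02))   # False: wrong locations
```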

Submitted 26 November, 2019; v1 submitted 4 April, 2018; originally announced April 2018.

Comments: 59 pages, 7 figures

arXiv:1803.04049 [pdf, other]

PCA by Determinant Optimization has no Spurious Local Optima

Authors: Raphael A. Hauser, Armin Eftekhari, Heinrich F. Matzinger

Abstract: Principal component analysis (PCA), which finds the best linear representation for data, is an indispensable tool in many learning tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. There are yet other ways of interpreting PCA that are rarely exploited in practice, largely because it is not known how to reliably solve the corresponding non-convex optimisation programs. In this paper, we consider one such interpretation of principal components as the directions that preserve most of the "volume" of the dataset. Our main contribution is a theorem that shows that the corresponding non-convex program has no spurious local optima. We apply a number of solvers for empirical confirmation.
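A minimal sketch of the "volume" view. It assumes (our choice, not taken from the abstract) that the volume preserved by an r-dimensional subspace with orthonormal basis U is measured by log det(U^T C U), where C is the data covariance. It uses plain gradient ascent with a QR retraction, which is not necessarily one of the solvers used in the paper; the covariance is hypothetical.

```python
import numpy as np

def volume_pca(C, r, step=0.5, num_iters=500, seed=0):
    """Gradient ascent on log det(U^T C U) over n-by-r matrices with orthonormal columns.

    C must be symmetric positive definite. Each step moves along the Euclidean gradient
    2 C U (U^T C U)^{-1} and re-orthonormalizes with a QR factorization (a retraction).
    """
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((C.shape[0], r)))
    for _ in range(num_iters):
        G = 2.0 * C @ U @ np.linalg.inv(U.T @ C @ U)
        U, _ = np.linalg.qr(U + step * G)
    return U

rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
C = Q @ np.diag([10.0, 5.0, 2.0, 1.0, 0.5]) @ Q.T   # hypothetical covariance
U = volume_pca(C, r=2)
V = Q[:, :2]                                       # top-2 eigenvectors of C
print(np.linalg.norm(U @ U.T - V @ V.T))           # ~0: same subspace
```

In this example a random start reaches the top-2 principal subspace, consistent with the absence of spurious local optima; the projector comparison makes the check independent of the basis chosen within the subspace.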

Submitted 11 March, 2018; originally announced March 2018.

arXiv:1612.06339 [pdf, other]

Randomized Learning of the Second-Moment Matrix of a Smooth Function

Authors: Armin Eftekhari, Michael B. Wakin, Ping Li, Paul G. Constantine

Abstract: Consider an open set $\mathbb{D}\subseteq\mathbb{R}^n$, equipped with a probability measure $\mu$. An important characteristic of a smooth function $f:\mathbb{D}\rightarrow\mathbb{R}$ is its \emph{second-moment matrix} $\Sigma_\mu:=\int \nabla f(x) \nabla f(x)^* \mu(dx) \in\mathbb{R}^{n\times n}$, where $\nabla f(x)\in\mathbb{R}^n$ is the gradient of $f(\cdot)$ at $x\in\mathbb{D}$ and $*$ stands for transpose. For instance, the span of the leading $r$ eigenvectors of $\Sigma_\mu$ forms an \emph{active subspace} of $f(\cdot)$, which contains the directions along which $f(\cdot)$ changes the most and is of particular interest in \emph{ridge approximation}. In this work, we propose a simple algorithm for estimating $\Sigma_\mu$ from random point evaluations of $f(\cdot)$ \emph{without} imposing any structural assumptions on $\Sigma_\mu$. Theoretical guarantees for this algorithm are established with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
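For intuition about the second-moment matrix and the active subspace, the following sketch estimates it by plain Monte Carlo, assuming gradients of f are available. That assumption differs from the setting above, where the algorithm uses only random point evaluations of f; the test function f(x) = sin(w^T x), the dimension, and the choice of a standard Gaussian for the measure are hypothetical.

```python
import numpy as np

def second_moment_matrix(grad, points):
    """Monte Carlo estimate of Sigma_mu = E[grad f(x) grad f(x)^T] from sample points x ~ mu."""
    G = np.array([grad(x) for x in points])    # one gradient per row
    return G.T @ G / len(points)

w = np.array([1.0, -2.0, 0.5, 0.0])
grad_f = lambda x: np.cos(w @ x) * w            # gradient of f(x) = sin(w^T x)
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 4))              # mu = standard Gaussian (an assumption)
Sigma = second_moment_matrix(grad_f, X)
eigvals, eigvecs = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
active_direction = eigvecs[:, -1]               # spans the r = 1 active subspace
print(abs(active_direction @ w) / np.linalg.norm(w))  # ~1.0
```

Since every gradient of this ridge function is a multiple of w, the estimate is rank one and its leading eigenvector recovers the direction along which f changes.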

Submitted 8 September, 2019; v1 submitted 19 December, 2016; originally announced December 2016.

arXiv:1612.01720 [pdf, other]

Weighted Matrix Completion and Recovery with Prior Subspace Information

Authors: Armin Eftekhari, Dehui Yang, Michael B. Wakin

Abstract: An incoherent low-rank matrix can be efficiently reconstructed after observing a few of its entries at random, and then solving a convex program that minimizes the nuclear norm. In many applications, in addition to these entries, potentially valuable prior knowledge about the column and row spaces of the matrix is also available to the practitioner. In this paper, we incorporate this prior knowledge in matrix completion---by minimizing a weighted nuclear norm---and precisely quantify any improvements. In particular, we find in theory that reliable prior knowledge reduces the sample complexity of matrix completion by a logarithmic factor, and the improvement observed in numerical simulations is considerably larger. We also present similar results for the closely related problem of matrix recovery from generic linear measurements.

Submitted 12 March, 2018; v1 submitted 6 December, 2016; originally announced December 2016.

arXiv:1612.00904 [pdf, other]

Streaming Principal Component Analysis From Incomplete Data

Authors: Armin Eftekhari, Gregory Ongie, Laura Balzano, Michael B. Wakin

Abstract: Linear subspace models are pervasive in computational sciences and are particularly used for large datasets, which are often incomplete due to privacy issues or sampling constraints. Therefore, a critical problem is developing an algorithm that detects low-dimensional linear structure in incomplete data efficiently, in terms of both computational complexity and storage. In this paper, we propose a streaming subspace estimation algorithm called Subspace Navigation via Interpolation from Partial Entries (SNIPE) that efficiently processes blocks of incomplete data to estimate the underlying subspace model. In every iteration, SNIPE finds the subspace that best fits the new data block but remains close to the previous estimate. We show that SNIPE is a streaming solver for the underlying nonconvex matrix completion problem, that it converges globally to a stationary point of this program regardless of initialization, and that the convergence is locally linear with high probability. We also find that SNIPE shows state-of-the-art performance in our numerical simulations.

Submitted 21 May, 2018; v1 submitted 2 December, 2016; originally announced December 2016.

arXiv:1611.07216 [pdf, other]

doi 10.1109/LSP.2017.2684784

What to Expect When You Are Expecting on the Grassmannian

Authors: Armin Eftekhari, Laura Balzano, Michael B. Wakin

Abstract: Consider an incoming sequence of vectors, all belonging to an unknown subspace $\operatorname{S}$, and each with many missing entries. In order to estimate $\operatorname{S}$, it is common to partition the data into blocks and iteratively update the estimate of $\operatorname{S}$ with each new incoming measurement block. In this paper, we investigate a rather basic question: Is it possible to identify $\operatorname{S}$ by averaging the column span of the partially observed incoming measurement blocks on the Grassmannian? We show that in general the span of the incoming blocks is in fact a biased estimator of $\operatorname{S}$ when data suffers from erasures, and we find an upper bound for this bias. We reach this conclusion by examining the defining optimization program for the Fréchet expectation on the Grassmannian, and with the aid of a sharp perturbation bound and standard large deviation results.

Submitted 5 March, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

arXiv:1609.06347 [pdf, other]

doi 10.1103/PhysRevE.97.022222

Stabilizing Embedology: Geometry-Preserving Delay-Coordinate Maps

Authors: Armin Eftekhari, Han Lun Yap, Michael B. Wakin, Christopher J. Rozell

Abstract: Delay-coordinate mapping is an effective and widely used technique for reconstructing and analyzing the dynamics of a nonlinear system based on time-series outputs. The efficacy of delay-coordinate mapping has long been supported by Takens' embedding theorem, which guarantees that delay-coordinate maps use the time-series output to provide a reconstruction of the hidden state space that is a one-to-one embedding of the system's attractor. While this topological guarantee ensures that distinct points in the reconstruction correspond to distinct points in the original state space, it does not characterize the quality of this embedding or illuminate how the specific parameters affect the reconstruction. In this paper, we extend Takens' result by establishing conditions under which delay-coordinate mapping is guaranteed to provide a stable embedding of a system's attractor. Beyond only preserving the attractor topology, a stable embedding preserves the attractor geometry by ensuring that distances between points in the state space are approximately preserved. In particular, we find that delay-coordinate mapping stably embeds an attractor of a dynamical system if the stable rank of the system is large enough to be proportional to the dimension of the attractor. The stable rank reflects the relation between the sampling interval and the number of delays in delay-coordinate mapping. Our theoretical findings give guidance for choosing system parameters, echoing the trade-off between irrelevancy and redundancy that has been heuristically investigated in the literature.
Our initial result is stated for attractors that are smooth submanifolds of Euclidean space, with extensions provided for the case of strange attractors.
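The delay-coordinate map itself is simple to write down; what the paper studies is when it is a stable embedding. The sketch below uses forward delays (a convention we chose; backward delays are equivalent up to a shift), and the sine series, sampling interval, number of delays, and lag are hypothetical.

```python
import numpy as np

def delay_coordinate_map(series, num_delays, lag):
    """Rows are delay vectors [x(t), x(t + lag), ..., x(t + (num_delays - 1) * lag)].

    lag is measured in samples, so one delay vector spans
    (num_delays - 1) * lag * sampling_interval units of time.
    """
    series = np.asarray(series, dtype=float)
    num_vectors = len(series) - (num_delays - 1) * lag
    if num_vectors <= 0:
        raise ValueError("time series too short for the requested delays")
    return np.stack([series[i * lag : i * lag + num_vectors] for i in range(num_delays)], axis=1)

sampling_interval = 0.05
t = np.arange(200) * sampling_interval
x = np.sin(t)                                   # scalar time-series output
embedded = delay_coordinate_map(x, num_delays=3, lag=4)
print(embedded.shape)                           # (192, 3)
```

The two knobs here, the sampling interval and the number of delays, are the parameters the abstract relates through the stable rank.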

Submitted 10 August, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

Journal ref: Phys. Rev. E 97, 022222 (2018)

arXiv:1609.01795 [pdf, other]

MC^2: A Two-Phase Algorithm for Leveraged Matrix Completion

Authors: Armin Eftekhari, Michael B. Wakin, Rachel A. Ward

Abstract: Leverage scores, loosely speaking, reflect the importance of the rows and columns of a matrix. Ideally, given the leverage scores of a rank-$r$ matrix $M\in\mathbb{R}^{n\times n}$, that matrix can be reliably completed from just $O(rn\log^{2}n)$ samples if the samples are chosen randomly from a nonuniform distribution induced by the leverage scores. In practice, however, the leverage scores are often unknown a priori. As such, the sample complexity in uniform matrix completion---using uniform random sampling---increases to $O(\eta(M)\cdot rn\log^{2}n)$, where $\eta(M)$ is the largest leverage score of $M$. In this paper, we propose a two-phase algorithm called MC$^2$ for matrix completion: in the first phase, the leverage scores are estimated based on uniform random samples, and then in the second phase the matrix is resampled nonuniformly based on the estimated leverage scores and then completed. For well-conditioned matrices, the total sample complexity of MC$^2$ is no worse than uniform matrix completion, and for certain classes of well-conditioned matrices---namely, reasonably coherent matrices whose leverage scores exhibit mild decay---MC$^2$ requires substantially fewer samples. Numerical simulations suggest that the algorithm outperforms uniform matrix completion in a broad class of matrices, and in particular, is much less sensitive to the condition number than our theory currently requires.
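The leverage scores that drive the nonuniform resampling can be computed exactly when the full matrix is known, as in this sketch; MC$^2$ instead estimates them from uniform samples in its first phase. The rule p_ij ∝ row_i + col_j is a common choice in the leveraged-sampling literature and is assumed here rather than taken from the abstract; the matrix size and the coherent row are hypothetical.

```python
import numpy as np

def leverage_scores(M, r):
    """Row/column leverage scores of M from its top-r singular vectors.

    Row score i is ||U[i, :r]||^2 and column score j is ||V[j, :r]||^2; each set sums to r.
    Papers often rescale by n / r so that the largest score measures coherence.
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return np.sum(U[:, :r] ** 2, axis=1), np.sum(Vt[:r, :] ** 2, axis=0)

def sampling_distribution(row, col):
    """Hypothetical nonuniform rule: P[i, j] proportional to row[i] + col[j]."""
    P = row[:, None] + col[None, :]
    return P / P.sum()

rng = np.random.default_rng(0)
n, r = 50, 3
A, B = rng.standard_normal((n, r)), rng.standard_normal((n, r))
A[0] *= 100.0                                   # make row 0 highly coherent
M = A @ B.T                                     # rank-r matrix
row, col = leverage_scores(M, r)
P = sampling_distribution(row, col)
picked = rng.choice(n * n, size=300, replace=False, p=P.ravel())
rows_sampled, cols_sampled = np.unravel_index(picked, (n, n))
print(row.argmax(), n / r * row.max())          # row 0 has the largest score
```

Entries in the coherent row are sampled far more often than under uniform sampling, which is where the savings over uniform matrix completion come from.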

Submitted 17 November, 2017; v1 submitted 6 September, 2016; originally announced September 2016.

arXiv:1606.01929 [pdf, other]

doi 10.1016/j.cma.2017.07.038

A near-stationary subspace for ridge approximation

Authors: Paul G. Constantine, Armin Eftekhari, Jeffrey Hokanson, Rachel Ward

Abstract: Response surfaces are common surrogates for expensive computer simulations in engineering analysis. However, the cost of fitting an accurate response surface increases exponentially as the number of model inputs increases, which leaves response surface construction intractable for high-dimensional, nonlinear models. We describe ridge approximation for fitting response surfaces in several variables. A ridge function is constant along several directions in its domain, so fitting occurs on the coordinates of a low-dimensional subspace of the input space. We review essential theory for ridge approximation---e.g., the best mean-squared approximation and an optimal low-dimensional subspace---and we prove that the gradient-based active subspace is near-stationary for the least-squares problem that defines an optimal subspace. Motivated by the theory, we propose a computational heuristic that uses an estimated active subspace as an initial guess for a ridge approximation fitting problem. We show a simple example where the heuristic fails, which reveals a type of function for which the proposed approach is inappropriate. We then propose a simple alternating heuristic for fitting a ridge function, and we demonstrate the effectiveness of the active subspace initial guess applied to an airfoil model of drag as a function of its 18 shape parameters.

Submitted 7 June, 2017; v1 submitted 6 June, 2016; originally announced June 2016.

arXiv:1512.06906 [pdf, other]

What Happens to a Manifold Under a Bi-Lipschitz Map?

Authors: Armin Eftekhari, Michael B. Wakin

Abstract: We study geometric and topological properties of the image of a smooth submanifold of $\mathbb{R}^{n}$ under a bi-Lipschitz map to $\mathbb{R}^{m}$. In particular, we characterize how the dimension, diameter, volume, and reach of the embedded manifold relate to the original. Our main result establishes a lower bound on the reach of the embedded manifold in the case where $m \le n$ and the bi-Lipschitz map is linear. We discuss implications of this work in signal processing and machine learning, where bi-Lipschitz maps on low-dimensional manifolds have been constructed using randomized linear operators.

Submitted 21 November, 2016; v1 submitted 21 December, 2015; originally announced December 2015.

arXiv:1511.03385 [pdf, other]

Greed is Super: A Fast Algorithm for Super-Resolution

Authors: Armin Eftekhari, Michael B. Wakin

Abstract: We present a fast two-phase algorithm for super-resolution with strong theoretical guarantees. Given the low-frequency part of the spectrum of a sequence of impulses, Phase I consists of a greedy algorithm that roughly estimates the impulse positions. These estimates are then refined by local optimization in Phase II. In contrast to the convex relaxation proposed by Candès et al., our approach has a low computational complexity but requires the impulses to be separated by an additional logarithmic factor to succeed. The backbone of our work is the fundamental work of Slepian et al. involving discrete prolate spheroidal wave functions and their unique properties.
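The greedy first phase can be illustrated with a minimal sketch. This is a generic orthogonal-matching-pursuit-style selection on a fixed grid, not the paper's algorithm, which builds on discrete prolate spheroidal wave functions; it assumes the impulses lie exactly on the grid and omits the Phase II local refinement. The cutoff fc = 20, the grid size, and the impulse positions are hypothetical.

```python
import numpy as np

def low_pass_atoms(grid, fc):
    """Column for grid point t holds exp(-2j*pi*k*t) for k = -fc, ..., fc."""
    k = np.arange(-fc, fc + 1)[:, None]
    return np.exp(-2j * np.pi * k * grid[None, :])

def greedy_positions(y, atoms, num_spikes):
    """OMP-style greedy estimate of impulse positions on a grid."""
    support, residual = [], y.copy()
    for _ in range(num_spikes):
        correlation = np.abs(atoms.conj().T @ residual)
        correlation[support] = 0.0             # never pick the same grid point twice
        support.append(int(np.argmax(correlation)))
        coef, *_ = np.linalg.lstsq(atoms[:, support], y, rcond=None)
        residual = y - atoms[:, support] @ coef
    return support, coef

grid = np.arange(1000) / 1000.0
atoms = low_pass_atoms(grid, fc=20)
y = atoms[:, [100, 600]] @ np.array([1.0, 0.8])   # impulses at t = 0.1 and t = 0.6
support, coef = greedy_positions(y, atoms, num_spikes=2)
print(support)                                    # [100, 600]
```

Each pass picks the grid point whose low-pass spectrum best matches the residual and then refits all amplitudes by least squares.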

Submitted 10 November, 2015; originally announced November 2015.

arXiv:1506.04190 [pdf, other]

Computing Active Subspaces Efficiently with Gradient Sketching

Authors: Paul G. Constantine, Armin Eftekhari, Michael B. Wakin

Abstract: Active subspaces are an emerging set of tools for identifying and exploiting the most important directions in the space of a computer simulation's input parameters; these directions depend on the simulation's quantity of interest, which we treat as a function from inputs to outputs. To identify a function's active subspace, one must compute the eigenpairs of a matrix derived from the function's gradient, which presents challenges when the gradient is not available as a subroutine. We numerically study two methods for estimating the necessary eigenpairs using only linear measurements of the function's gradient. In practice, these measurements can be estimated by finite differences using only two function evaluations, regardless of the dimension of the function's input space.
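As an illustration of estimating active-subspace eigenpairs from linear measurements of the gradient, each obtained with two function evaluations, here is one simple estimator built from Gaussian sketch directions. Whether it coincides with either of the two methods studied in the paper is not claimed; the test function, the step h, and the sample size are hypothetical.

```python
import numpy as np

def sketched_second_moment(f, X, h=1e-6, seed=0):
    """Estimate C = E[grad f(x) grad f(x)^T] using two evaluations of f per sample point.

    For a ~ N(0, I), d = (f(x + h a) - f(x)) / h approximates the linear measurement a^T grad f(x).
    Gaussian fourth moments give E[d^2 a a^T] = tr(C) I + 2 C and E[d^2] = tr(C).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(X.shape)                  # one random direction per sample point
    d = (f(X + h * A) - f(X)) / h                     # finite-difference directional derivatives
    M = (A * (d ** 2)[:, None]).T @ A / len(X)
    return 0.5 * (M - np.mean(d ** 2) * np.eye(X.shape[1]))

w = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
f = lambda X: np.sin(X @ w)                           # vectorized over the rows of X
X = np.random.default_rng(1).standard_normal((100_000, 5))
C_hat = sketched_second_moment(f, X)
leading = np.linalg.eigh(C_hat)[1][:, -1]
print(abs(leading @ w) / np.linalg.norm(w))           # close to 1
```

The number of function evaluations per sample stays at two regardless of the input dimension; the price is statistical, since the Gaussian sketch inflates the variance and many samples are needed.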

Submitted 11 October, 2015; v1 submitted 12 June, 2015; originally announced June 2015.

arXiv:1406.3831 [pdf, other]

A First Analysis of the Stability of Takens' Embedding

Authors: Han Lun Yap, Armin Eftekhari, Michael B. Wakin, Christopher J. Rozell

Abstract: Takens' Embedding Theorem asserts that when the states of a hidden dynamical system are confined to a low-dimensional attractor, complete information about the states can be preserved in the observed time-series output through the delay coordinate map. However, the conditions for the theorem to hold ignore the effects of noise, and time-series analysis in practice requires a careful empirical determination of the sampling time and number of delays, resulting in a number of delay coordinates larger than the minimum prescribed by Takens' theorem. In this paper, we use tools and ideas in Compressed Sensing to provide a first theoretical justification for the choice of the number of delays in noisy conditions. In particular, we show that under certain conditions on the dynamical system, measurement function, number of delays and sampling time, the delay-coordinate map can be a stable embedding of the dynamical system's attractor.

Submitted 15 June, 2014; originally announced June 2014.

arXiv:1306.4748 [pdf, other]

New Analysis of Manifold Embeddings and Signal Recovery from Compressive Measurements

Authors: Armin Eftekhari, Michael B. Wakin

Abstract: Compressive Sensing (CS) exploits the surprising fact that the information contained in a sparse signal can be preserved in a small number of compressive, often random linear measurements of that signal. Strong theoretical guarantees have been established concerning the embedding of a sparse signal family under a random measurement operator and on the accuracy to which sparse signals can be recovered from noisy compressive measurements. In this paper, we address similar questions in the context of a different modeling framework. Instead of sparse models, we focus on the broad class of manifold models, which can arise in both parametric and non-parametric signal families. Using tools from the theory of empirical processes, we improve upon previous results concerning the embedding of low-dimensional manifolds under random measurement operators. We also establish both deterministic and probabilistic instance-optimal bounds in $\ell_2$ for manifold-based signal recovery and parameter estimation from noisy compressive measurements. In line with analogous results for sparsity-based CS, we conclude that much stronger bounds are possible in the probabilistic setting. Our work supports the growing evidence that manifold-based models can be used with high accuracy in compressive signal processing.
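A quick numerical illustration of the embedding question: measure how much a random Gaussian operator distorts pairwise distances between points on a simple closed curve. The curve, the dimensions n = 200 and m = 100, and the 1/sqrt(m) scaling are hypothetical choices; the paper's bounds are stated in terms of manifold properties such as dimension, volume and reach, which this toy example does not compute.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, num_points = 200, 100, 100

# A closed curve (a 1-D manifold) inside a random 4-D subspace of R^n.
basis, _ = np.linalg.qr(rng.standard_normal((n, 4)))
theta = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
curve = np.column_stack([np.cos(theta), np.sin(theta),
                         np.cos(2 * theta), np.sin(2 * theta)]) / np.sqrt(2.0)
points = curve @ basis.T                                   # shape (num_points, n)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)             # random linear measurement operator
measured = points @ Phi.T                                  # shape (num_points, m)

def pairwise_distances(Z):
    """Euclidean distances between all pairs of rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

off_diagonal = ~np.eye(num_points, dtype=bool)
ratios = pairwise_distances(measured)[off_diagonal] / pairwise_distances(points)[off_diagonal]
print(ratios.min(), ratios.max())                          # both close to 1
```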

Submitted 1 May, 2014; v1 submitted 19 June, 2013; originally announced June 2013.

Comments: arXiv admin note: substantial text overlap with arXiv:1002.1247

arXiv:1210.3395 [pdf, other]

The Restricted Isometry Property for Random Block Diagonal Matrices

Authors: Armin Eftekhari, Han Lun Yap, Christopher J. Rozell, Michael B. Wakin

Abstract: In Compressive Sensing, the Restricted Isometry Property (RIP) ensures that robust recovery of sparse vectors is possible from noisy, undersampled measurements via computationally tractable algorithms. It is by now well-known that Gaussian (or, more generally, sub-Gaussian) random matrices satisfy the RIP under certain conditions on the number of measurements. Their use can be limited in practice, however, due to storage limitations, computational considerations, or the mismatch of such matrices with certain measurement architectures. These issues have recently motivated considerable effort towards studying the RIP for structured random matrices. In this paper, we study the RIP for block diagonal measurement matrices where each block on the main diagonal is itself a sub-Gaussian random matrix. Our main result states that such matrices can indeed satisfy the RIP but that the requisite number of measurements depends on certain properties of the basis in which the signals are sparse. In the best case, these matrices perform nearly as well as dense Gaussian random matrices, despite having many fewer nonzero entries.
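The measurement matrices in question are easy to construct. The sketch below builds one and reports the average of ||Φx||²/||x||² over random sparse vectors. The block sizes and sparsity level are hypothetical, the signals are taken sparse in the canonical basis, and an empirical average is only a sanity check, not a certificate of the RIP, which is a uniform statement over all sparse vectors.

```python
import numpy as np

def block_diagonal_gaussian(num_blocks, m_block, n_block, rng):
    """Block diagonal matrix with i.i.d. N(0, 1/m_block) entries in each diagonal block."""
    Phi = np.zeros((num_blocks * m_block, num_blocks * n_block))
    for j in range(num_blocks):
        rows = slice(j * m_block, (j + 1) * m_block)
        cols = slice(j * n_block, (j + 1) * n_block)
        Phi[rows, cols] = rng.standard_normal((m_block, n_block)) / np.sqrt(m_block)
    return Phi

rng = np.random.default_rng(0)
Phi = block_diagonal_gaussian(num_blocks=8, m_block=16, n_block=32, rng=rng)

ratios = []
for _ in range(1000):
    x = np.zeros(Phi.shape[1])
    support = rng.choice(Phi.shape[1], size=5, replace=False)   # 5-sparse in the canonical basis
    x[support] = rng.standard_normal(5)
    ratios.append(np.sum((Phi @ x) ** 2) / np.sum(x ** 2))
print(np.count_nonzero(Phi), np.mean(ratios))                   # 4096 nonzeros out of 32768
```

Only one eighth of the entries are nonzero, which is the storage and computational advantage the abstract refers to.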

Submitted 13 February, 2014; v1 submitted 11 October, 2012; originally announced October 2012.

MSC Class: 94A20; 60B20; 46B09;

arXiv:1101.2713 [pdf, other]

doi 10.1109/TIT.2013.2243495

Matched Filtering from Limited Frequency Samples

Authors: Armin Eftekhari, Justin Romberg, Michael B. Wakin

Abstract: In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this "compressive matched filter" as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the observation bandwidth and the possible range of delays.
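A minimal sketch of the correlation-based estimator, under an assumed model y_k = a S(f_k) exp(-2πi f_k τ) + noise at randomly chosen integer frequencies f_k. The band, the template spectrum S, the noise level, and the delay grid are hypothetical; the paper's analysis concerns how many such samples are needed for the peak of the filter output to land near the true delay.

```python
import numpy as np

rng = np.random.default_rng(0)
candidate_freqs = np.arange(-100, 101)                      # observation band (hypothetical)
freqs = rng.choice(candidate_freqs, size=60, replace=False) # random frequency-domain samples
template = np.exp(-(freqs / 150.0) ** 2)                    # hypothetical template spectrum S(f)

true_delay, true_amp, noise_std = 0.237, 1.5, 0.05
noise = noise_std * (rng.standard_normal(60) + 1j * rng.standard_normal(60))
y = true_amp * template * np.exp(-2j * np.pi * freqs * true_delay) + noise

# Compressive matched filter: correlate the samples with the template shifted by each candidate delay.
delays = np.arange(1000) / 1000.0
filter_output = np.abs(np.exp(2j * np.pi * np.outer(delays, freqs)) @ (y * np.conj(template)))
delay_hat = delays[np.argmax(filter_output)]
amp_hat = filter_output.max() / np.sum(template ** 2)
print(delay_hat, amp_hat)
```

At the true delay all phases align, so the noiseless filter output peaks there at a times the template energy; the amplitude estimate divides that energy back out.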

Submitted 12 July, 2012; v1 submitted 13 January, 2011; originally announced January 2011.

Comments: Submitted to the IEEE Transactions on Information Theory on January 13, 2011
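The correlation strategy described in the matched-filtering abstract can be sketched in a discrete, periodic setting. This is an illustrative sketch under assumptions not taken from the paper: the signal is a length-N sequence, the delay is a circular integer shift searched on the integer grid, the amplitude is positive, the noise is complex Gaussian, and all function names are hypothetical. If Y[k] = a S[k] exp(-2 pi i k tau / N) on a random frequency set Omega, the filter output Re sum_{k in Omega} Y[k] conj(S[k]) exp(2 pi i k tau' / N) equals a sum_{k in Omega} |S[k]|^2 cos(2 pi k (tau' - tau) / N) in the noiseless case, which peaks at tau' = tau.

```python
import cmath
import math
import random


def dft_at(x, k):
    """k-th coefficient of the length-N DFT of x (computed directly, O(N))."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def simulate_samples(template, delay, amplitude, freqs, noise_std, rng):
    """Noisy DFT samples of y[t] = amplitude * template[(t - delay) mod N] at freqs."""
    n = len(template)
    template_hat = {k: dft_at(template, k) for k in freqs}
    samples = {}
    for k in freqs:
        clean = amplitude * template_hat[k] * cmath.exp(-2j * math.pi * k * delay / n)
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        samples[k] = clean + noise
    return samples, template_hat


def compressive_matched_filter(samples, template_hat, n):
    """Correlate the sampled spectrum with the template at every integer delay.

    Returns (delay estimate, amplitude estimate). The amplitude is assumed
    positive, so the delay estimate is the location of the largest output.
    """
    energy = sum(abs(template_hat[k]) ** 2 for k in samples)
    best_delay, best_value = 0, -math.inf
    for tau in range(n):
        value = sum(
            (samples[k] * template_hat[k].conjugate()
             * cmath.exp(2j * math.pi * k * tau / n)).real
            for k in samples
        )
        if value > best_value:
            best_delay, best_value = tau, value
    return best_delay, best_value / energy


if __name__ == "__main__":
    rng = random.Random(1)
    n = 64
    template = [rng.gauss(0.0, 1.0) for _ in range(n)]
    freqs = rng.sample(range(1, n), 16)  # 16 of the 64 frequencies, chosen at random
    samples, template_hat = simulate_samples(template, 17, 2.5, freqs, 0.5, rng)
    print(compressive_matched_filter(samples, template_hat, n))
```

Dividing the peak value by the sampled template energy turns the correlation into an amplitude estimate; in the noiseless case it returns the true amplitude exactly, up to rounding.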

arXiv:physics/0408072 [pdf]

Inspecting plastic deformation of Pd by means of fractal geometry

Authors: Ali Eftekhari

Abstract: The influence of phase-transformation-induced plastic deformation in the Pd|H system on the electrode surface was investigated. Since the Pd surface is subject to severe plastic deformation during this process, the structure and roughness of the electrode surface change significantly. Quantitative analysis of the electrode surfaces, for a comparative study of such changes, is a valuable tool for inspecting the induced plastic deformation. The fractal dimension can be used as a quantitative measure for this purpose. Since inappropriate methods may lead to significant errors, an appropriate approach was proposed for determining fractal dimensions in such systems. It was demonstrated that the surface roughness generated is mainly due to the induced plastic deformation, not to other side processes. The methodology is of general interest for the investigation of plastic deformations and also of other solid-state structural changes.

Submitted 15 August, 2004; originally announced August 2004.

Comments: 19 pages, 1 figure, 4 tables
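The Pd abstract stresses that an inappropriate estimator can produce large errors in the fractal dimension, but it does not state which method was adopted. As a generic illustration only (the standard box-counting estimate, not necessarily the author's approach), the sketch below validates an estimator on a set of known dimension: the middle-thirds Cantor set, for which N(eps) = 2^j boxes of side eps = 3^-j are occupied and the dimension is log 2 / log 3 ≈ 0.631. Points are stored as integers so that box indices are exact; the function names are hypothetical.

```python
import math


def cantor_points(level):
    """Integer left endpoints, in units of 3**-level, of the Cantor intervals at `level`."""
    points = [0]
    for _ in range(level):
        # Each interval [p, p + 1] keeps its left third [3p, 3p + 1]
        # and its right third [3p + 2, 3p + 3] at the next finer level.
        points = [3 * p for p in points] + [3 * p + 2 for p in points]
    return points


def box_count(points, level, j):
    """Number of boxes of side 3**-j that contain at least one point."""
    width = 3 ** (level - j)  # box side expressed in units of 3**-level
    return len({p // width for p in points})


def box_counting_dimension(points, level, scales):
    """Least-squares slope of log N(eps) against log(1/eps), with eps = 3**-j."""
    xs = [j * math.log(3) for j in scales]
    ys = [math.log(box_count(points, level, j)) for j in scales]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    pts = cantor_points(10)
    print(box_counting_dimension(pts, 10, range(1, 9)))  # close to log 2 / log 3
```

For measured surface profiles the occupied-box counts are no longer exact powers, so the fitted slope depends on the chosen range of scales; checking the estimator on a known set first is one way to expose such method-dependent errors.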

arXiv:cs/0408041 [pdf]

Fractal geometry of literature: first attempt to Shakespeare's works

Authors: Ali Eftekhari

Abstract: It was demonstrated that there is a geometrical order in the structure of literature. Fractal geometry, as a modern mathematical approach and a new geometrical viewpoint on natural objects including both processes and structures, was employed for the analysis of literature. As a first study, the works of William Shakespeare were chosen as the most important items in Western literature. By counting the number of letters used in a manuscript, it is possible to study the whole manuscript statistically. A novel method based on the basic assumption of fractal geometry was proposed for calculating the fractal dimensions of the literature. The results were compared with Zipf's law, which was successfully applied to letters instead of words. Two new concepts, namely Zipf's dimension and Zipf's order, were also introduced. It was found that changes in both the fractal dimension and Zipf's dimension are similar and depend on the manuscript length. Interestingly, directly plotting the obtained data in semi-logarithmic and logarithmic forms also led to a power law.

Submitted 17 August, 2004; originally announced August 2004.

Comments: 26 pages, 7 figures, 3 tables
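The Shakespeare abstract applies Zipf's law to letters rather than words. Its own quantities (Zipf's dimension and Zipf's order) are not defined in the abstract, so the sketch below shows only the generic step such an analysis builds on, and should not be read as the author's method: count letters, rank the counts, and fit a straight line to log frequency against log rank by least squares; the negated slope is the usual Zipf exponent. Function names are hypothetical.

```python
import math
from collections import Counter


def letter_rank_frequency(text):
    """Letter counts (case-insensitive) sorted from most to least frequent."""
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    return sorted(counts.values(), reverse=True)


def zipf_exponent(frequencies):
    """Fit log f(r) = c - alpha * log r by least squares and return alpha.

    Requires at least two ranks; otherwise the slope is undefined.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two distinct letters")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den


if __name__ == "__main__":
    sample = "Shall I compare thee to a summer's day? Thou art more lovely and more temperate."
    freqs = letter_rank_frequency(sample)
    print(freqs[:5], round(zipf_exponent(freqs), 3))
```

Letter alphabets have only a few dozen ranks, so the fitted exponent is sensitive to the tail of rare letters; a longer manuscript gives a more stable fit, which is in line with the abstract's observation that the measured dimensions depend on manuscript length.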

arXiv:physics/0302027 [pdf]

Time asymmetry in ion diffusion under magnetic field

Authors: A. Eftekhari

Abstract: The general equation for the flux of an electrolyte in solution in the presence of an external magnetic field was derived mathematically in accordance with the Onsager formalism of irreversible thermodynamics. The time-reversal symmetry of this equation was examined theoretically, and time-reversal symmetry breaking was demonstrated for the case under investigation. That is, it was shown theoretically that the diffusion of an electrolyte in solution under an applied magnetic field is asymmetric with respect to time. The importance of the study lies in showing time asymmetry in a physical process based on a mathematical derivation from available equations.

Submitted 10 February, 2003; originally announced February 2003.

Showing 1–39 of 39 results for author: Eftekhari, A