Search | arXiv e-print repository

doi 10.13140/RG.2.2.28438.01604

A Levenberg-Marquardt Method for Nonsmooth Regularized Least Squares

Authors: Aleksandr Y. Aravkin, Robert Baraldi, Dominique Orban

Abstract: We develop a Levenberg-Marquardt method for minimizing the sum of a smooth nonlinear least-squares term $f(x) = \tfrac{1}{2} \|F(x)\|_2^2$ and a nonsmooth term $h$. Both $f$ and $h$ may be nonconvex. Steps are computed by minimizing the sum of a regularized linear least-squares model and a model of $h$ using a first-order method such as the proximal gradient method. We establish global convergence to a first-order stationary point of both a trust-region and a regularization variant of the Levenberg-Marquardt method under the assumptions that $F$ and its Jacobian are Lipschitz continuous and $h$ is proper and lower semi-continuous. In the worst case, both methods perform $O(ε^{-2})$ iterations to bring a measure of stationarity below $ε\in (0, 1)$. We report numerical results on three examples: a group-lasso basis-pursuit denoise example, a nonlinear support vector machine, and parameter estimation in neuron firing. For those examples to be implementable, we describe in detail how to evaluate proximal operators for separable $h$ and for the group lasso with trust-region constraint. In all cases, the Levenberg-Marquardt methods perform fewer outer iterations than a proximal-gradient method with adaptive step length and a quasi-Newton trust-region method, neither of which exploit the least-squares structure of the problem. Our results also highlight the need for more sophisticated subproblem solvers than simple first-order methods.

Submitted 5 January, 2023; originally announced January 2023.

Report number: G-2022-58

MSC Class: 49J52; 65K10; 90C53; 90C56

arXiv:2105.00244 [pdf, ps, other]

doi 10.1109/LSP.2021.3120327

l1-Norm Minimization with Regula Falsi Type Root Finding Methods

Authors: Metin Vural, Aleksandr Y. Aravkin, Sławomir Stańczak

Abstract: Sparse level-set formulations allow practitioners to find the minimum 1-norm solution subject to likelihood constraints. Prior art requires this constraint to be convex. In this letter, we develop an efficient approach for nonconvex likelihoods, using Regula Falsi root-finding techniques to solve the level-set formulation. Regula Falsi methods are simple, derivative-free, and efficient, and the approach provably extends level-set methods to the broader class of nonconvex inverse problems. Practical performance is illustrated using l1-regularized Student's t inversion, which is a nonconvex approach used to develop outlier-robust formulations.

Submitted 1 May, 2021; originally announced May 2021.

Comments: l1-norm minimization, nonconvex models, Regula-Falsi, root-finding

MSC Class: 65K05; 49M37; 62-08; 65H04
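
Note: The Regula Falsi iteration named in the abstract above can be illustrated with a minimal Python sketch. This is the textbook false-position method applied to a toy scalar function, not the authors' implementation; the function name `regula_falsi`, its parameters, and the test function are assumptions chosen for illustration. In a level-set setting, `f` would be replaced by whatever scalar function the formulation needs a root of.

```python
def regula_falsi(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of f in [a, b] by false position (derivative-free).

    Assumes f is continuous and f(a), f(b) have opposite signs.
    """
    fa, fb = f(a), f(b)
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    c = a
    for _ in range(max_iter):
        # Root of the secant line through (a, fa) and (b, fb).
        c = b - fb * (b - a) / (fb - fa)
        fc = f(c)
        if abs(fc) < tol:
            return c
        # Keep the sub-interval that still brackets the root.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return c


# Toy usage: the positive root of x^2 - 2, which is close to sqrt(2).
root = regula_falsi(lambda x: x * x - 2.0, 0.0, 2.0)
print(root)
```

Only function values are used, which is what makes the method derivative-free; the bracket is preserved at every step, so the iterate never leaves the initial interval.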

arXiv:2103.15993 [pdf, other]

doi 10.13140/RG.2.2.18509.15845

A Proximal Quasi-Newton Trust-Region Method for Nonsmooth Regularized Optimization

Authors: Aleksandr Y. Aravkin, Robert Baraldi, Dominique Orban

Abstract: We develop a trust-region method for minimizing the sum of a smooth term $f$ and a nonsmooth term $h$, both of which can be nonconvex. Each iteration of our method minimizes a possibly nonconvex model of $f + h$ in a trust region. The model coincides with $f + h$ in value and subdifferential at the center. We establish global convergence to a first-order stationary point when $f$ satisfies a smoothness condition that holds, in particular, when it has Lipschitz-continuous gradient, and $h$ is proper and lower semi-continuous. The model of $h$ is required to be proper, lower semi-continuous and prox-bounded. Under these weak assumptions, we establish a worst-case $O(1/ε^2)$ iteration complexity bound that matches the best known complexity bound of standard trust-region methods for smooth optimization. We detail a special instance, named TR-PG, in which we use a limited-memory quasi-Newton model of $f$ and compute a step with the proximal gradient method, resulting in a practical proximal quasi-Newton method. We establish similar convergence properties and complexity bound for a quadratic regularization variant, named R2, and provide an interpretation as a proximal gradient method with adaptive step size for nonconvex problems. R2 may also be used to compute steps inside the trust-region method, resulting in an implementation named TR-R2. We describe our Julia implementations and report numerical results on inverse problems from sparse optimization and signal processing. Both TR-PG and TR-R2 exhibit promising performance and compare favorably with two linesearch proximal quasi-Newton methods based on convex models.

Submitted 2 August, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: 29 pages, 3 figures, 3 tables

Report number: G-2021-12

MSC Class: 49J52; 65K10; 90C53; 90C56
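
Note: The proximal gradient method mentioned in the abstract above can be sketched in a few lines of Python for the special case $h(x) = λ\|x\|_1$, whose proximal operator is soft-thresholding. This is a generic fixed-step illustration, not the TR-PG or R2 methods of the paper (which use trust regions and adaptive step sizes, and are implemented in Julia); the names `soft_threshold` and `proximal_gradient`, the step size, and the toy problem are assumptions made for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied componentwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def proximal_gradient(grad_f, x0, step, lam, iters=100):
    """Minimize f(x) + lam * ||x||_1 with a fixed step size.

    grad_f returns the gradient of the smooth term f as a list.
    Assumes step <= 1/L, where L is a Lipschitz constant of grad_f.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        # Gradient step on f, then proximal step on lam * ||.||_1.
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                           step * lam)
    return x


# Toy problem: f(x) = 0.5 * ||x - c||^2, so grad f(x) = x - c and L = 1.
c = [3.0, -0.5, 1.0]
x = proximal_gradient(lambda x: [xi - ci for xi, ci in zip(x, c)],
                      x0=[0.0, 0.0, 0.0], step=0.5, lam=1.0, iters=200)
print(x)
```

For this toy problem the minimizer is known in closed form: it is the soft-thresholding of `c` at level `lam`, so small entries of `c` are set to zero and large ones are shrunk toward zero.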

arXiv:2008.10740 [pdf, other]

Data-Driven Aerospace Engineering: Reframing the Industry with Machine Learning

Authors: Steven L. Brunton, J. Nathan Kutz, Krithika Manohar, Aleksandr Y. Aravkin, Kristi Morgansen, Jennifer Klemisch, Nicholas Goebel, James Buttrick, Jeffrey Poskin, Agnes Blom-Schieber, Thomas Hogan, Darren McDonald

Abstract: Data science, and machine learning in particular, is rapidly transforming the scientific and industrial landscapes. The aerospace industry is poised to capitalize on big data and machine learning, which excels at solving the types of multi-objective, constrained optimization problems that arise in aircraft design and manufacturing. Indeed, emerging methods in machine learning may be thought of as data-driven optimization techniques that are ideal for high-dimensional, non-convex, and constrained, multi-objective optimization problems, and that improve with increasing volumes of data. In this review, we will explore the opportunities and challenges of integrating data-driven science and engineering into the aerospace industry. Importantly, we will focus on the critical need for interpretable, generalizable, explainable, and certifiable machine learning techniques for safety-critical applications. This review will include a retrospective, an assessment of the current state-of-the-art, and a roadmap looking forward. Recent algorithmic and technological trends will be explored in the context of critical challenges in aerospace design, manufacturing, verification, validation, and services. In addition, we will explore this landscape through several case studies in the aerospace industry. This document is the result of close collaboration between UW and Boeing to summarize past efforts and outline future opportunities.

Submitted 24 August, 2020; originally announced August 2020.

Comments: 35 pages, 16 figures

arXiv:1911.05182 [pdf, other]

A Proof of Principle: Multi-Modality Radiotherapy Optimization

Authors: Roman Levin, Aleksandr Y. Aravkin, Minsun Kim

Abstract: Radiotherapy is used to treat cancer patients by damaging DNA of tumor cells using ionizing radiation. Photons are the most widely used radiation type for therapy, having been put into use soon after the first discovery of X-rays in 1895. However, there are emerging interests and developments of other radiation modalities such as protons and carbon ions, owing to their unique biological and physical characteristics that distinguish these modalities from photons. Current attempts to determine an optimal radiation modality or an optimal combination of multiple modalities are empirical and in the early stage of development. In this paper, we propose a mathematical framework to optimize full radiation dose distributions and fractionation schedules of multiple radiation modalities, aiming to maximize the damage to the tumor while limiting the damage to the normal tissue to the corresponding tolerance level. This formulation gives rise to a non-convex, mixed integer program and we propose a bilevel optimization algorithm to efficiently solve it. The upper level problem is to optimize the fractionation schedule using the dose distribution optimized in the lower level. We demonstrate the feasibility of our novel framework and algorithms in a simple 2-dimensional phantom with two different radiation modalities, where clinical intuition can be easily drawn. The results of our numerical simulations agree with the clinical intuition, validating our approach and showing the promise of the framework for further clinical investigation.

Submitted 18 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: 23 pages, 4 figures

arXiv:1911.00565 [pdf, other]

doi 10.1007/s00162-020-00529-9

Dimensionality Reduction and Reduced Order Modeling for Traveling Wave Physics

Authors: Ariana Mendible, Steven L. Brunton, Aleksandr Y. Aravkin, Wes Lowrie, J. Nathan Kutz

Abstract: We develop an unsupervised machine learning algorithm for the automated discovery and identification of traveling waves in spatio-temporal systems governed by partial differential equations (PDEs). Our method uses sparse regression and subspace clustering to robustly identify translational invariances that can be leveraged to build improved reduced order models (ROMs). Invariances, whether translational or rotational, are well known to compromise the ability of ROMs to produce accurate and/or low-rank representations of the spatio-temporal dynamics. However, by discovering translations in a principled way, data can be shifted into a coordinate system where quality, low-dimensional ROMs can be constructed. This approach can be used on either numerical or experimental data with or without knowledge of the governing equations. We demonstrate our method on a variety of PDEs of increasing difficulty, taken from the field of fluid dynamics, showing the efficacy and robustness of the proposed approach.

Submitted 18 May, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

Comments: 14 pages, 8 figures

arXiv:1910.13674 [pdf, other]

Efficient Robust Parameter Identification in Generalized Kalman Smoothing Models

Authors: Jonathan Jonker, Peng Zheng, Aleksandr Y. Aravkin

Abstract: Dynamic inference problems in autoregressive (AR/ARMA/ARIMA), exponential smoothing, and navigation are often formulated and solved using state-space models (SSM), which allow a range of statistical distributions to inform innovations and errors. In many applications the main goal is to identify not only the hidden state, but also additional unknown model parameters (e.g. AR coefficients or unknown dynamics). We show how to efficiently optimize over model parameters in SSM that use smooth process and measurement losses. Our approach is to project out state variables, obtaining a value function that only depends on the parameters of interest, and derive analytical formulas for first and second derivatives that can be used by many types of optimization methods. The approach can be used with smooth robust penalties such as Hybrid and the Student's t, in addition to classic least squares. We use the approach to estimate robust AR models and long-run unemployment rates with sudden changes.

Submitted 30 October, 2019; originally announced October 2019.

Comments: 7 pages, 3 figures

MSC Class: 90C30; 65K10; 65C60

arXiv:1910.07095 [pdf, other]

IRLS for Sparse Recovery Revisited: Examples of Failure and a Remedy

Authors: Aleksandr Y. Aravkin, James V. Burke, Daiwei He

Abstract: Compressed sensing is a central topic in signal processing with myriad applications, where the goal is to recover a signal from as few observations as possible. Iterative re-weighting is one of the fundamental tools to achieve this goal. This paper re-examines the iteratively reweighted least squares (IRLS) algorithm for sparse recovery proposed by Daubechies, Devore, Fornasier, and Güntürk in "Iteratively reweighted least squares minimization for sparse recovery", Communications on Pure and Applied Mathematics, 63 (2010) 1-38. Under the null space property of order $K$, the authors show that their algorithm converges to the unique $k$-sparse solution for $k$ strictly bounded above by a value strictly less than $K$, and this $k$-sparse solution coincides with the unique $\ell_1$ solution. On the other hand, it is known that, for $k$ less than or equal to $K$, the $k$-sparse and $\ell_1$ solutions are unique and coincide. The authors emphasize that their proof method does not apply for $k$ sufficiently close to $K$, and remark that they were unsuccessful in finding an example where the algorithm fails for these values of $k$. In this note we construct a family of examples where the Daubechies-Devore-Fornasier-Güntürk IRLS algorithm fails for $k=K$, and provide a modification to their algorithm that provably converges to the unique $k$-sparse solution for $k$ less than or equal to $K$ while preserving the local linear rate. The paper includes numerical studies of this family as well as the modified IRLS algorithm, testing their robustness under perturbations and to parameter selection.

Submitted 15 October, 2019; originally announced October 2019.

Comments: 10 pages, 5 figures

MSC Class: 80M50; 60G35; 65C60

arXiv:1909.10700 [pdf, other]

Trimmed Constrained Mixed Effects Models: Formulations and Algorithms

Authors: Peng Zheng, Ryan Barber, Reed J. D. Sorensen, Christopher J. L. Murray, Aleksandr Y. Aravkin

Abstract: Mixed effects (ME) models inform a vast array of problems in the physical and social sciences, and are pervasive in meta-analysis. We consider ME models where the random effects component is linear. We then develop an efficient approach for a broad problem class that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the associated marginal likelihood. The software accompanying this paper is disseminated as an open-source Python package called LimeTr. LimeTr is able to recover results more accurately in the presence of outliers compared to available packages for both standard longitudinal analysis and meta-analysis, and is also more computationally efficient than competing robust alternatives. Supplementary materials that reproduce the simulations, as well as run LimeTr and third party code are available online. We also present analyses of global health data, where we use advanced functionality of LimeTr, including constraints to impose monotonicity and concavity for dose-response relationships. Nonlinear observation models allow new analyses in place of classic approximations, such as log-linear models. Robust extensions in all analyses ensure that spurious data points do not drive our understanding of either mean relationships or between-study heterogeneity.

Submitted 27 October, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: 33 pages, 7 figures

MSC Class: 62J02; 62F30; 65K05; 49M37

arXiv:1807.05411 [pdf, other]

A Unified Framework for Sparse Relaxed Regularized Regression: SR3

Authors: Peng Zheng, Travis Askham, Steven L. Brunton, J. Nathan Kutz, Aleksandr Y. Aravkin

Abstract: Regularized regression problems are ubiquitous in statistical modeling, signal processing, and machine learning. Sparse regression in particular has been instrumental in scientific model discovery, including compressed sensing applications, variable selection, and high-dimensional analysis. We propose a broad framework for sparse relaxed regularized regression, called SR3. The key idea is to solve a relaxation of the regularized problem, which has three advantages over the state-of-the-art: (1) solutions of the relaxed problem are superior with respect to errors, false positives, and conditioning, (2) relaxation allows extremely fast algorithms for both convex and nonconvex formulations, and (3) the methods apply to composite regularizers such as total variation (TV) and its nonconvex variants. We demonstrate the advantages of SR3 (computational efficiency, higher accuracy, faster convergence rates, greater flexibility) across a range of regularized regression problems with synthetic and real data, including applications in compressed sensing, LASSO, matrix completion, TV regularization, and group sparsity. To promote reproducible research, we also provide a companion MATLAB package that implements these examples.

Submitted 8 November, 2018; v1 submitted 14 July, 2018; originally announced July 2018.

Comments: 19 pages, 14 figures

MSC Class: 62F35; 65K10; 49M15

arXiv:1807.03091 [pdf, other]

Computer Assisted Localization of a Heart Arrhythmia

Authors: Chris Vogl, Peng Zheng, Stephen P. Seslar, Aleksandr Y. Aravkin

Abstract: We consider the problem of locating a point-source heart arrhythmia using data from a standard diagnostic procedure, where a reference catheter is placed in the heart, and arrival times from a second diagnostic catheter are recorded as the diagnostic catheter moves around within the heart. We model this situation as a nonconvex feasibility problem, where given a set of arrival times, we look for a source location that is consistent with the available data. We develop a new optimization approach and fast algorithm to obtain online proposals for the next location to suggest to the operator as she collects data. We validate the procedure using a Monte Carlo simulation based on patients' electrophysiological data. The proposed procedure robustly and quickly locates the source of arrhythmias without any prior knowledge of heart anatomy.

Submitted 9 July, 2018; originally announced July 2018.

Comments: 4 pages, 5 figures

MSC Class: 92C50; 92-08; 65R32; 90C30

arXiv:1803.06460 [pdf, other]

Mean Reverting Portfolios via Penalized OU-Likelihood Estimation

Authors: Jize Zhang, Tim Leung, Aleksandr Y. Aravkin

Abstract: We study an optimization-based approach to construct a mean-reverting portfolio of assets. Our objectives are threefold: (1) design a portfolio that is well-represented by an Ornstein-Uhlenbeck process with parameters estimated by maximum likelihood, (2) select portfolios with desirable characteristics of high mean reversion and low variance, and (3) select a parsimonious portfolio, i.e. find a small subset of a larger universe of assets that can be used for long and short positions. We present the full problem formulation, a specialized algorithm that exploits partial minimization, and numerical examples using both simulated and empirical price data.

Submitted 17 March, 2018; originally announced March 2018.

Comments: 7 pages, 6 figures

MSC Class: 91G60; 90C30; 65K10

arXiv:1803.02525 [pdf, other]

Fast Robust Methods for Singular State-Space Models

Authors: Jonathan Jonker, Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto, Sarah Webster

Abstract: State-space models are used in a wide range of time series analysis formulations. Kalman filtering and smoothing are work-horse algorithms in these settings. While classic algorithms assume Gaussian errors to simplify estimation, recent advances use a broader range of optimization formulations to allow outlier-robust estimation, as well as constraints to capture prior information. Here we develop methods on state-space models where either innovations or error covariances may be singular. These models frequently arise in navigation (e.g. for 'colored noise' models or deterministic integrals) and are ubiquitous in auto-correlated time series models such as ARMA. We reformulate all state-space models (singular as well as nonsingular) as constrained convex optimization problems, and develop an efficient algorithm for this reformulation. The convergence rate is locally linear, with constants that do not depend on the conditioning of the problem. Numerical comparisons show that the new approach outperforms competing approaches for nonsingular models, including state of the art interior point (IP) methods. IP methods converge at superlinear rates; we expect them to dominate. However, the steep rate of the proposed approach (independent of problem conditioning) combined with cheap iterations wins against IP in a run-time comparison. We therefore suggest that the proposed approach be the default choice for estimating state space models outside of the Gaussian context, regardless of whether the error covariances are singular or not.

Submitted 28 June, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: 11 pages, 4 figures

MSC Class: 62F35; 65K10; 49M15

arXiv:1702.08649 [pdf, other]

Foundations of gauge and perspective duality

Authors: Alexandre Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Kellie MacPhee

Abstract: We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.

Submitted 18 June, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: 29 pages

arXiv:1609.06369 [pdf, ps, other]

Generalized Kalman Smoothing: Modeling and Algorithms

Authors: A. Y. Aravkin, J. V. Burke, L. Ljung, A. Lozano, G. Pillonetto

Abstract: State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.

Submitted 25 September, 2016; v1 submitted 20 September, 2016; originally announced September 2016.

Comments: 29 pages, 11 figures

MSC Class: 62F35; 65K10; 49M15

arXiv:1608.06159 [pdf, other]

Total-variation regularization strategies in full-waveform inversion

Authors: Ernie Esser, Lluis Guasch, Tristan van Leeuwen, Aleksandr Y. Aravkin, Felix J. Herrmann

Abstract: We propose an extended full-waveform inversion formulation that includes general convex constraints on the model. Though the full problem is highly nonconvex, the overarching optimization scheme arrives at geologically plausible results by solving a sequence of relaxed and warm-started constrained convex subproblems. The combination of box, total-variation, and successively relaxed asymmetric total-variation constraints allows us to steer free from parasitic local minima while keeping the estimated physical parameters laterally continuous and in a physically realistic range. For accurate starting models, numerical experiments carried out on the challenging 2004 BP velocity benchmark demonstrate that bound and total-variation constraints improve the inversion result significantly by removing inversion artifacts, related to source encoding, and by clearly improved delineation of top, bottom, and flanks of a high-velocity high-contrast salt inclusion. The experiments also show that for poor starting models these two constraints by themselves are insufficient to detect the bottom of high-velocity inclusions such as salt. Inclusion of the one-sided asymmetric total-variation constraint overcomes this issue by discouraging velocity lows from building up during the early stages of the inversion. To the authors' knowledge the presented algorithm is the first to successfully remove the imprint of local minima caused by poor starting models and band-width limited finite aperture data.

Submitted 22 August, 2016; originally announced August 2016.

Comments: 25 pages, 15 figures

MSC Class: 65K05; 65K10; 86-08

arXiv:1607.02624 [pdf, other]

Beating level-set methods for 3D seismic data interpolation: a primal-dual alternating approach

Authors: Rajiv Kumar, Oscar López, Damek Davis, Aleksandr Y. Aravkin, Felix J. Herrmann

Abstract: Acquisition cost is a crucial bottleneck for seismic workflows, and low-rank formulations for data interpolation allow practitioners to `fill in' data volumes from critically subsampled data acquired in the field. Tremendous size of seismic data volumes required for seismic processing remains a major challenge for these techniques. We propose a new approach to solve residual constrained formulations for interpolation. We represent the data volume using matrix factors, and build a block-coordinate algorithm with constrained convex subproblems that are solved with a primal-dual splitting scheme. The new approach is competitive with state of the art level-set algorithms that interchange the role of objectives with constraints. We use the new algorithm to successfully interpolate a large scale 5D seismic data volume, generated from the geologically complex synthetic 3D Compass velocity model, where 80% of the data has been removed.

Submitted 9 July, 2016; originally announced July 2016.

Comments: 16 pages, 7 figures

MSC Class: 62F35; 65K10

arXiv:1606.02395 [pdf, ps, other]

Efficient quadratic penalization through the partial minimization technique

Authors: Aleksandr Y. Aravkin, Dmitriy Drusvyatskiy, Tristan van Leeuwen

Abstract: Common computational problems, such as parameter estimation in dynamic models and PDE constrained optimization, require data fitting over a set of auxiliary parameters subject to physical constraints over an underlying state. Naive quadratically penalized formulations, commonly used in practice, suffer from inherent ill-conditioning. We show that surprisingly the partial minimization technique regularizes the problem, making it well-conditioned. This viewpoint sheds new light on variable projection techniques, as well as the penalty method for PDE constrained optimization, and motivates robust extensions. In addition, we outline an inexact analysis, showing that the partial minimization subproblem can be solved very loosely in each iteration. We illustrate the theory and algorithms on boundary control, optimal transport, and parameter estimation for robust dynamic inference.

Submitted 17 September, 2017; v1 submitted 8 June, 2016; originally announced June 2016.

Comments: 8 pages, 9 figures

MSC Class: 65K05; 65K10; 86-08

arXiv:1604.06194 [pdf, ps, other]

Dynamic matrix factorization with social influence

Authors: Aleksandr Y. Aravkin, Kush R. Varshney, Liu Yang

Abstract: Matrix factorization is a key component of collaborative filtering-based recommendation systems because it allows us to complete sparse user-by-item ratings matrices under a low-rank assumption that encodes the belief that similar users give similar ratings and that similar items garner similar ratings. This paradigm has had immeasurable practical success, but it is not the complete story for understanding and inferring the preferences of people. First, peoples' preferences and their observable manifestations as ratings evolve over time along general patterns of trajectories. Second, an individual person's preferences evolve over time through influence of their social connections. In this paper, we develop a unified process model for both types of dynamics within a state space approach, together with an efficient optimization scheme for estimation within that model. The model combines elements from recent developments in dynamic matrix factorization, opinion dynamics and social learning, and trust-based recommendation. The estimation builds upon recent advances in numerical nonlinear optimization. Empirical results on a large-scale data set from the Epinions website demonstrate consistent reduction in root mean squared error by consideration of the two types of dynamics.

Submitted 21 April, 2016; originally announced April 2016.

Comments: 6 pages, 5 figures

MSC Class: 90C06; 81P50; 65K10; 62F35; 47N30

arXiv:1603.00284 [pdf, other]

Dual Smoothing and Level Set Techniques for Variational Matrix Decomposition

Authors: Aleksandr Y. Aravkin, Stephen Becker

Abstract: We focus on the robust principal component analysis (RPCA) problem, and review a range of old and new convex formulations for the problem and its variants. We then review dual smoothing and level set techniques in convex optimization, present several novel theoretical results, and apply the techniques on the RPCA problem. In the final sections, we show a range of numerical experiments for simulated and real-world problems.

Submitted 1 March, 2016; originally announced March 2016.

Comments: 38 pages, 10 figures. arXiv admin note: text overlap with arXiv:1406.1089

MSC Class: 90C06; 81P50; 65K10; 62F35; 47N30

arXiv:1602.01506 [pdf, other]

Level-set methods for convex optimization

Authors: Aleksandr Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Scott Roy

Abstract: Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.

Submitted 3 February, 2016; originally announced February 2016.

Comments: 38 pages

arXiv:1403.6706 [pdf, other]

Beyond L2-Loss Functions for Learning Sparse Models

Authors: Karthikeyan Natesan Ramamurthy, Aleksandr Y. Aravkin, Jayaraman J. Thiagarajan

Abstract: Incorporating sparsity priors in learning tasks can give rise to simple, and interpretable models for complex high dimensional data. Sparse models have found widespread use in structure discovery, recovering data from corruptions, and a variety of large scale unsupervised and supervised learning problems. Assuming the availability of sufficient data, these methods infer dictionaries for sparse representations by optimizing for high-fidelity reconstruction. In most scenarios, the reconstruction quality is measured using the squared Euclidean distance, and efficient algorithms have been developed for both batch and online learning cases. However, new application domains motivate looking beyond conventional loss functions. For example, robust loss functions such as $\ell_1$ and Huber are useful in learning outlier-resilient models, and the quantile loss is beneficial in discovering structures that are the representative of a particular quantile. These new applications motivate our work in generalizing sparse learning to a broad class of convex loss functions. In particular, we consider the class of piecewise linear quadratic (PLQ) cost functions that includes Huber, as well as $\ell_1$, quantile, Vapnik, hinge loss, and smoothed variants of these penalties. We propose an algorithm to learn dictionaries and obtain sparse codes when the data reconstruction fidelity is measured using any smooth PLQ cost function. We provide convergence guarantees for the proposed algorithm, and demonstrate the convergence behavior using empirical experiments. Furthermore, we present three case studies that require the use of PLQ cost functions: (i) robust image modeling, (ii) tag refinement for image annotation and retrieval and (iii) computing empirical confidence limits for subspace clustering.

Submitted 26 March, 2014; originally announced March 2014.

Comments: 10 pages, 6 figures

ACM Class: I.2.6; G.1.6

arXiv:1402.4624 [pdf, ps, other]

Sparse Quantile Huber Regression for Efficient and Robust Estimation

Authors: Aleksandr Y. Aravkin, Anju Kambadur, Aurelie C. Lozano, Ronny Luss

Abstract: We consider new formulations and methods for sparse quantile regression in the high-dimensional setting. Quantile regression plays an important role in many applications, including outlier-robust exploratory analysis in gene selection. In addition, the sparsity consideration in quantile regression enables the exploration of the entire conditional distribution of the response variable given the predictors and therefore yields a more comprehensive view of the important predictors. We propose a generalized OMP algorithm for variable selection, taking the misfit loss to be either the traditional quantile loss or a smooth version we call quantile Huber, and compare the resulting greedy approaches with convex sparsity-regularized formulations. We apply a recently proposed interior point methodology to efficiently solve all convex formulations as well as convex subproblems in the generalized OMP setting, provide theoretical guarantees of consistent estimation, and demonstrate the performance of our approach using empirical studies of simulated and genomic datasets.

Submitted 19 February, 2014; originally announced February 2014.

Comments: 9 pages

MSC Class: 62F35; 65K10

arXiv:1309.7857 [pdf, other]

Generalized system identification with stable spline kernels

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: Regularized least-squares approaches have been successfully applied to linear system identification. Recent approaches use quadratic penalty terms on the unknown impulse response defined by stable spline kernels, which control model space complexity by leveraging regularity and bounded-input bounded-output stability. This paper extends linear system identification to a wide class of nonsmooth stable spline estimators, where regularization functionals and data misfits can be selected from a rich set of piecewise linear-quadratic (PLQ) penalties. This class includes the 1-norm, Huber, and Vapnik, in addition to the least-squares penalty. By representing penalties through their conjugates, the modeler can specify any piecewise linear-quadratic penalty for misfit and regularizer, as well as inequality constraints on the response. The interior-point solver we implement (IPsolve) is locally quadratically convergent, with $O(\min(m,n)^2(m+n))$ arithmetic operations per iteration, where $n$ is the number of unknown impulse response coefficients and $m$ is the number of observed output measurements. IPsolve is competitive with available alternatives for system identification. This is shown by a comparison with TFOCS, libSVM, and the FISTA algorithm. The code is open source (https://github.com/saravkin/IPsolve). The impact of the approach for system identification is illustrated with numerical experiments featuring robust formulations for contaminated data, relaxation systems, nonnegativity and unimodality constraints on the impulse response, and sparsity promoting regularization. Incorporating constraints yields particularly significant improvements.

Submitted 25 July, 2018; v1 submitted 30 September, 2013; originally announced September 2013.

Comments: 23 pages, 6 figures

MSC Class: 62F35; 65K10

arXiv:1309.1508 [pdf, other]

Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling

Authors: Tara N. Sainath, Lior Horesh, Brian Kingsbury, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

Abstract: Hessian-free training has become a popular parallel second-order optimization technique for Deep Neural Network training. This study aims at speeding up Hessian-free training, both by means of decreasing the amount of data used for training, as well as through reduction of the number of Krylov subspace solver iterations used for implicit estimation of the Hessian. In this paper, we develop an L-BFGS based preconditioning scheme that avoids the need to access the Hessian explicitly. Since L-BFGS cannot be regarded as a fixed-point iteration, we further propose the employment of flexible Krylov subspace solvers that retain the desired theoretical convergence guarantees of their conventional counterparts. Second, we propose a new sampling algorithm, which geometrically increases the amount of data utilized for gradient and Krylov subspace iteration calculations. On a 50-hr English Broadcast News task, we find that these methodologies provide roughly a 1.5x speed-up, whereas, on a 300-hr Switchboard task, these techniques provide over a 2.3x speedup, with no loss in WER. These results suggest that even further speed-up is expected, as problems scale and complexity grows.

Submitted 10 December, 2013; v1 submitted 5 September, 2013; originally announced September 2013.

Comments: this paper is not supposed to be posted publically before the conference in December due to company policy. another co-author was not informed of this and posted without the permission of the first author. pls remove

MSC Class: 65K05; 90C15; 90C90

arXiv:1309.1501 [pdf, ps, other]

Improvements to deep convolutional neural networks for LVCSR

Authors: Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

Abstract: Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNN), as they are able to better reduce spectral variation in the input signal. This has also been confirmed experimentally, with CNNs showing improvements in word error rate (WER) between 4-12% relative compared to DNNs across a variety of LVCSR tasks. In this paper, we describe different methods to further improve CNN performance. First, we conduct a deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features. Second, we apply various pooling strategies that have shown improvements in computer vision to an LVCSR speech task. Third, we introduce a method to effectively incorporate speaker adaptation, namely fMLLR, into log-mel features. Fourth, we introduce an effective strategy to use dropout during Hessian-free sequence training. We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5% relative improvement over our previous best CNN baseline.

Submitted 10 December, 2013; v1 submitted 5 September, 2013; originally announced September 2013.

Comments: 6 pages, 1 figure

MSC Class: 65K05; 90C15; 90C90

arXiv:1309.1369 [pdf, other]

Semistochastic Quadratic Bound Methods

Authors: Aleksandr Y. Aravkin, Anna Choromanska, Tony Jebara, Dimitri Kanevsky

Abstract: Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent Gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood inference based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.

Submitted 17 February, 2014; v1 submitted 5 September, 2013; originally announced September 2013.

Comments: 11 pages, 1 figure

MSC Class: 90C55; 90C15; 62H30

arXiv:1306.1052 [pdf, other]

Fast Dual Variational Inference for Non-Conjugate LGMs

Authors: Mohammad Emtiyaz Khan, Aleksandr Y. Aravkin, Michael P. Friedlander, Matthias Seeger

Abstract: Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification, and latent Gaussian Markov random fields.

Submitted 5 June, 2013; originally announced June 2013.

Comments: 9 pages, 3 figures

MSC Class: 62F15; 65K10; 49M29; 90C06

arXiv:1303.5588 [pdf, other]

Robust and Trend Following Student's t Kalman Smoothers

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We present a Kalman smoothing framework based on modeling errors using the heavy tailed Student's t distribution, along with algorithms, convergence theory, open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers. A second special application we consider in detail allows tracking sudden changes in the state. It is developed by modeling process noise using the Student's t distribution, and the resulting smoother can track sudden changes in the state. These features can be used separately or in tandem, and we present a general smoother algorithm and open source implementation, together with convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.

Submitted 22 March, 2013; originally announced March 2013.

Comments: 23 pages, 7 figures

MSC Class: 62F35; 65K10

arXiv:1303.5237 [pdf, ps, other]

Kalman smoothing and block tridiagonal systems: new connections and numerical stability results

Authors: Aleksandr Y. Aravkin, Bradley M. Bell, James V. Burke, Gianluigi Pillonetto

Abstract: The Rauch-Tung-Striebel (RTS) and the Mayne-Fraser (MF) algorithms are two of the most popular smoothing schemes to reconstruct the state of a dynamic linear system from measurements collected on a fixed interval. Another (less popular) approach is the Mayne (M) algorithm introduced in his original paper under the name of Algorithm A. In this paper, we analyze these three smoothers from an optimization and algebraic perspective, revealing new insights on their numerical stability properties. In doing this, we re-interpret classic recursions as matrix decomposition methods for block tridiagonal matrices. First, we show that the classic RTS smoother is an implementation of the forward block tridiagonal (FBT) algorithm (also known as Thomas algorithm) for particular block tridiagonal systems. We study the numerical stability properties of this scheme, connecting the condition number of the full system to properties of the individual blocks encountered during standard recursion. Second, we study the M smoother, and prove it is equivalent to a backward block tridiagonal (BBT) algorithm with a stronger stability guarantee than RTS. Third, we illustrate how the MF smoother solves a block tridiagonal system, and prove that it has the same numerical stability properties of RTS (but not those of M). Finally, we present a new hybrid RTS/M (FBT/BBT) smoothing scheme, which is faster than MF, and has the same numerical stability guarantees of RTS and MF.

Submitted 24 July, 2013; v1 submitted 21 March, 2013; originally announced March 2013.

Comments: 11 pages, no figures

MSC Class: 65F05; 65F50; 49M15

arXiv:1303.2827 [pdf, other]

Linear system identification using stable spline kernels and PLQ penalties

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: The classical approach to linear system identification is given by parametric Prediction Error Methods (PEM). In this context, model complexity is often unknown so that a model order selection step is needed to suitably trade-off bias and variance. Recently, a different approach to linear system identification has been introduced, where model order determination is avoided by using a regularized least squares framework. In particular, the penalty term on the impulse response is defined by so called stable spline kernels. They embed information on regularity and BIBO stability, and depend on a small number of parameters which can be estimated from data. In this paper, we provide new nonsmooth formulations of the stable spline estimator. In particular, we consider linear system identification problems in a very broad context, where regularization functionals and data misfits can come from a rich set of piecewise linear quadratic functions. Moreover, our analysis includes polyhedral inequality constraints on the unknown impulse response. For any formulation in this class, we show that interior point methods can be used to solve the system identification problem, with complexity $O(n^3)+O(mn^2)$ in each iteration, where $n$ and $m$ are the number of impulse response coefficients and measurements, respectively. The usefulness of the framework is illustrated via a numerical experiment where output measurements are contaminated by outliers.

Submitted 12 March, 2013; originally announced March 2013.

Comments: 8 pages, 2 figures

MSC Class: 47N30; 65K10

arXiv:1303.1993 [pdf, other]

Optimization viewpoint on Kalman smoothing, with applications to robust and sparse estimation

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: In this paper, we present the optimization formulation of the Kalman filtering and smoothing problems, and use this perspective to develop a variety of extensions and applications. We first formulate classic Kalman smoothing as a least squares problem, highlight special structure, and show that the classic filtering and smoothing algorithms are equivalent to a particular algorithm for solving this problem. Once this equivalence is established, we present extensions of Kalman smoothing to systems with nonlinear process and measurement models, systems with linear and nonlinear inequality constraints, systems with outliers in the measurements or sudden changes in the state, and systems where the sparsity of the state sequence must be accounted for. All extensions preserve the computational efficiency of the classic algorithms, and most of the extensions are illustrated with numerical examples, which are part of an open source Kalman smoothing Matlab/Octave package.

Submitted 11 March, 2013; v1 submitted 8 March, 2013; originally announced March 2013.

Comments: 46 pages, 11 figures

MSC Class: 62F35; 65K10

arXiv:1302.6434 [pdf, other]

Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso

Authors: Aleksandr Y. Aravkin, James V. Burke, Alessandro Chiuso, Gianluigi Pillonetto

Abstract: The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.

Submitted 26 February, 2013; v1 submitted 26 February, 2013; originally announced February 2013.

Comments: 50 pages, 12 figures

MSC Class: 62F35; 65K10; 47N30

arXiv:1301.5288 [pdf, other]

The connection between Bayesian estimation of a Gaussian random field and RKHS

Authors: Aleksandr Y. Aravkin, Bradley M. Bell, James V. Burke, Gianluigi Pillonetto

Abstract: Reconstruction of a function from noisy data is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution describes the observed data and has a small RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this paper, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (including where the data were collected), the MAP estimate for the signal samples is given by the RKHS estimate evaluated at these locations.

Submitted 17 July, 2013; v1 submitted 22 January, 2013; originally announced January 2013.

Comments: 8 pages, 2 figures

MSC Class: 47N30; 65K10

arXiv:1301.4566 [pdf, other]

Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.

Submitted 2 May, 2013; v1 submitted 19 January, 2013; originally announced January 2013.

Comments: 41 pages, 4 figures

MSC Class: 62F35; 65K10

arXiv:1212.0912 [pdf, other]

Sparse seismic imaging using variable projection

Authors: Aleksandr Y. Aravkin, Tristan van Leeuwen, Ning Tu

Abstract: We consider an important class of signal processing problems where the signal of interest is known to be sparse, and can be recovered from data given auxiliary information about how the data was generated. For example, a sparse Green's function may be recovered from seismic experimental data using sparsity optimization when the source signature is known. Unfortunately, in practice this information is often missing, and must be recovered from data along with the signal using deconvolution techniques. In this paper, we present a novel methodology to simultaneously solve for the sparse signal and auxiliary parameters using a recently proposed variable projection technique. Our main contribution is to combine variable projection with sparsity promoting optimization, obtaining an efficient algorithm for large-scale sparse deconvolution problems. We demonstrate the algorithm on a seismic imaging example.

Submitted 4 December, 2012; originally announced December 2012.

Comments: 5 pages, 4 figures

MSC Class: 65K05; 65K10; 86-08

arXiv:1211.4601 [pdf, other]

Smoothing Dynamic Systems with State-Dependent Covariance Matrices

Authors: Aleksandr Y. Aravkin, James V. Burke

Abstract: Kalman filtering and smoothing algorithms are used in many areas, including tracking and navigation, medical applications, and financial trend filtering. One of the basic assumptions required to apply the Kalman smoothing framework is that error covariance matrices are known and given. In this paper, we study a general class of inference problems where covariance matrices can depend functionally on unknown parameters. In the Kalman framework, this allows modeling situations where covariance matrices may depend functionally on the state sequence being estimated. We present an extended formulation and generalized Gauss-Newton (GGN) algorithm for inference in this context. When applied to dynamic systems inference, we show the algorithm can be implemented to preserve the computational efficiency of the classic Kalman smoother. The new approach is illustrated with a synthetic numerical example.

Submitted 20 March, 2014; v1 submitted 19 November, 2012; originally announced November 2012.

Comments: 8 pages, 1 figure

MSC Class: 62F35; 65K10

arXiv:1211.3724 [pdf, other]

doi 10.1137/120899157

Variational properties of value functions

Authors: Aleksandr Y. Aravkin, James V. Burke, Michael P. Friedlander

Abstract: Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.

Submitted 23 May, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

Comments: 30 pages

Journal ref: SIAM Journal on Optimization, 23(3):1689-1717, 2013

arXiv:1206.6532 [pdf, other]

doi 10.1088/0266-5611/28/11/115016

Estimating Nuisance Parameters in Inverse Problems

Authors: Aleksandr Y. Aravkin, Tristan van Leeuwen

Abstract: Many inverse problems include nuisance parameters which, while not of direct interest, are required to recover primary parameters. Structure present in these problems allows efficient optimization strategies - a well known example is variable projection, where nonlinear least squares problems which are linear in some parameters can be very efficiently optimized. In this paper, we extend the idea of projecting out a subset over the variables to a broad class of maximum likelihood (ML) and maximum a posteriori likelihood (MAP) problems with nuisance parameters, such as variance or degrees of freedom. As a result, we are able to incorporate nuisance parameter estimation into large-scale constrained and unconstrained inverse problem formulations. We apply the approach to a variety of problems, including estimation of unknown variance parameters in the Gaussian model, degree of freedom (d.o.f.) parameter estimation in the context of robust inverse problems, automatic calibration, and optimal experimental design. Using numerical examples, we demonstrate improvement in recovery of primary parameters for several large-scale inverse problems. The proposed approach is compatible with a wide variety of algorithms and formulations, and its implementation requires only minor modifications to existing algorithms.

Submitted 27 June, 2012; originally announced June 2012.

Comments: 16 pages, 5 figures

MSC Class: 65K05; 65K10; 86-08

arXiv:1111.2730 [pdf, other]

A statistical and computational theory for robust and sparse Kalman smoothing

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: Kalman smoothers reconstruct the state of a dynamical system starting from noisy output samples. While the classical estimator relies on quadratic penalization of process deviations and measurement errors, extensions that exploit Piecewise Linear Quadratic (PLQ) penalties have been recently proposed in the literature. These new formulations include smoothers robust with respect to outliers in the data, and smoothers that keep better track of fast system dynamics, e.g. jumps in the state values. In addition to L2, well known examples of PLQ penalties include the L1, Huber and Vapnik losses. In this paper, we use a dual representation for PLQ penalties to build a statistical modeling framework and a computational theory for Kalman smoothing. We develop a statistical framework by establishing conditions required to interpret PLQ penalties as negative logs of true probability densities. Then, we present a computational framework, based on interior-point methods, that solves the Kalman smoothing problem with PLQ penalties and maintains the linear complexity in the size of the time series, just as in the L2 case. The framework presented extends the computational efficiency of the Mayne-Fraser and Rauch-Tung-Striebel algorithms to a much broader non-smooth setting, and includes many known robust and sparse smoothers as special cases.

Submitted 11 November, 2011; originally announced November 2011.

Comments: 8 pages

MSC Class: 62F35; 65K10

arXiv:1111.1400 [pdf, other]

Student's T Robust Bundle Adjustment Algorithm

Authors: Aleksandr Y. Aravkin, Michael Styer, Zachary Moratto, Ara Nefian, Michael Broxton

Abstract: Bundle adjustment (BA) is the problem of refining a visual reconstruction to produce better structure and viewing parameter estimates. This problem is often formulated as a nonlinear least squares problem, where data arises from interest point matching. Mismatched interest points cause serious problems in this approach, as a single mismatch will affect the entire reconstruction. In this paper, we propose a novel robust Student's t BA algorithm (RST-BA). We model reprojection errors using the heavy tailed Student's t-distribution, and use an implicit trust region method to compute the maximum a posteriori (MAP) estimate of the camera and viewing parameters in this model. The resulting algorithm exploits the sparse structure essential for reconstructing multi-image scenarios, has the same time complexity as standard L2 bundle adjustment (L2-BA), and can be implemented with minimal changes to the standard least squares framework. We show that the RST-BA is more accurate than either L2-BA or L2-BA with a sigma-edit rule for outlier removal for a range of simulated error generation scenarios. The new method has also been used to reconstruct lunar topography using data from the NASA Apollo 15 orbiter, and we present visual and quantitative comparisons of RST-BA and L2-BA methods for this application. In particular, using the RST-BA algorithm we were able to reconstruct a DEM from unprocessed data with many outliers and no ground control points, which was not possible with the L2-BA method.

Submitted 6 November, 2011; originally announced November 2011.

Comments: 8 pages. Originally written in November 2009. Describes implementation of Robust Bundle Adjustment in NASA's VisionWorkbench package, available at https://github.com/visionworkbench/visionworkbench

MSC Class: 62F35; 65K10

arXiv:1001.3907 [pdf, other]

Robust and Trend-following Kalman Smoothers using Student's t

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We propose two nonlinear Kalman smoothers that rely on Student's t distributions. The T-Robust smoother finds the maximum a posteriori likelihood (MAP) solution for Gaussian process noise and Student's t observation noise, and is extremely robust against outliers, outperforming the recently proposed l1-Laplace smoother in extreme situations (e.g. 50% or more outliers). The second estimator, which we call the T-Trend smoother, is able to follow sudden changes in the process model, and is derived as a MAP solver for a model with Student's t-process noise and Gaussian observation noise. We design specialized methods to solve both problems which exploit the special structure of the Student's t-distribution, and provide a convergence theory. Both smoothers can be implemented with only minor modifications to an existing L2 smoother implementation. Numerical results for linear and nonlinear models illustrating both robust and fast tracking applications are presented.

Submitted 11 November, 2011; v1 submitted 21 January, 2010; originally announced January 2010.

Comments: 7 pages, 4 figures

MSC Class: 62F35; 65K10

Showing 1–42 of 42 results for author: Aravkin, A Y