-
An SDE Perspective on Stochastic Inertial Gradient Dynamics with Time-Dependent Viscosity and Geometric Damping
Authors:
Rodrigo Maulen-Soto,
Jalal Fadili,
Hedy Attouch,
Peter Ochs
Abstract:
Our approach is part of the close link between continuous dissipative dynamical systems and optimization algorithms. We aim to solve convex minimization problems by means of stochastic inertial differential equations which are driven by the gradient of the objective function. This will provide a general mathematical framework for analyzing fast optimization algorithms with stochastic gradient input. Our study is a natural extension of our previous work devoted to the first-order in time stochastic steepest descent. Our goal is to develop these results further by considering second-order stochastic differential equations in time, incorporating a viscous time-dependent damping and a Hessian-driven damping. To develop this program, we rely on stochastic Lyapunov analysis. Assuming a square-integrability condition on the diffusion term times a function dependent on the viscous damping, and that the Hessian-driven damping is a positive constant, our first main result shows that almost surely, there is convergence of the values, and states fast convergence of the values in expectation. Besides, in the case where the Hessian-driven damping is zero, we conclude with the fast convergence of the values in expectation and in the almost sure sense; we also prove almost sure weak convergence of the trajectory. We provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex and strongly convex case.
Submitted 5 July, 2024;
originally announced July 2024.
-
Stable Phase Retrieval with Mirror Descent
Authors:
Jean-Jacques Godeme,
Jalal Fadili,
Claude Amra,
Myriam Zerrad
Abstract:
In this paper, we aim to reconstruct an n-dimensional real vector from m phaseless measurements corrupted by additive noise. We extend the noiseless framework developed in [15], based on mirror descent (or Bregman gradient descent), to deal with noisy measurements and prove that the procedure is stable to (small enough) additive noise. In the deterministic case, we show that mirror descent converges to a critical point of the phase retrieval problem, and if the algorithm is well initialized and the noise is small enough, the critical point is near the true vector up to a global sign change. When the measurements are i.i.d. Gaussian and the signal-to-noise ratio is large enough, we provide global convergence guarantees that ensure that with high probability, mirror descent converges to a global minimizer near the true vector (up to a global sign change), as soon as the number of measurements m is large enough. The sample complexity bound can be improved if a spectral method is used to provide a good initial guess. We complement our theoretical study with several numerical results showing that mirror descent is both a computationally and statistically efficient scheme to solve the phase retrieval problem.
Submitted 20 June, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
A Quasi-Newton Primal-Dual Algorithm with Line Search
Authors:
Shida Wang,
Jalal Fadili,
Peter Ochs
Abstract:
Quasi-Newton methods refer to a class of algorithms at the interface between first and second order methods. They aim to progress as substantially as second order methods per iteration, while maintaining the computational complexity of first order methods. The approximation of second order information by first order derivatives can be expressed as adopting a variable metric, which for (limited memory) quasi-Newton methods is of type ``identity $\pm$ low rank''. This paper continues the effort to make these powerful methods available for non-smooth systems occurring, for example, in large scale Machine Learning applications by exploiting this special structure. We develop a line search variant of a recently introduced quasi-Newton primal-dual algorithm, which adds significant flexibility, admits larger steps per iteration, and circumvents the complicated precalculation of a certain operator norm. We prove convergence, including convergence rates, for our proposed method and outperform related algorithms in a large scale image deblurring application.
Submitted 10 May, 2024;
originally announced May 2024.
-
Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation
Authors:
Michael Sucker,
Jalal Fadili,
Peter Ochs
Abstract:
We use the PAC-Bayesian theory for the setting of learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and explicit trade-off between convergence guarantees and convergence speed, which contrasts with the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones derived from a (deterministic) worst-case analysis. The results rely on PAC-Bayesian bounds for general, possibly unbounded loss-functions based on exponential families. Then, we reformulate the learning procedure into a one-dimensional minimization problem and study the possibility to find a global minimum. Furthermore, we provide a concrete algorithmic realization of the framework and new methodologies for learning-to-optimize, and we conduct four practically relevant experiments to support our theory. With this, we showcase that the provided learning framework yields optimization algorithms that provably outperform the state-of-the-art by orders of magnitude.
Submitted 4 April, 2024;
originally announced April 2024.
-
Stochastic Inertial Dynamics Via Time Scaling and Averaging
Authors:
Rodrigo Maulen-Soto,
Jalal Fadili,
Hedy Attouch,
Peter Ochs
Abstract:
Our work is part of the close link between continuous-time dissipative dynamical systems and optimization algorithms, and more precisely here, in the stochastic setting. We aim to study stochastic convex minimization problems through the lens of stochastic inertial differential inclusions that are driven by the subgradient of a convex objective function. This will provide a general mathematical framework for analyzing the convergence properties of stochastic second-order inertial continuous-time dynamics involving vanishing viscous damping and measurable stochastic subgradient selections. Our chief goal in this paper is to develop a systematic and unified way that transfers the properties recently studied for first-order stochastic differential equations to second-order ones involving even subgradients in lieu of gradients. This program will rely on two tenets: time scaling and averaging, following an approach recently developed in the literature by one of the co-authors in the deterministic case.
Under a mild integrability assumption involving the diffusion term and the viscous damping, our first main result shows that almost surely, there is weak convergence of the trajectory towards a minimizer of the objective function and fast convergence of the values and gradients. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex, strongly convex, and (local) Polyak-Lojasiewicz case. Finally, using Tikhonov regularization with a properly tuned vanishing parameter, we can obtain almost sure strong convergence of the trajectory towards the minimum norm solution.
Submitted 25 March, 2024;
originally announced March 2024.
-
Tikhonov Regularization for Stochastic Non-Smooth Convex Optimization in Hilbert Spaces
Authors:
Rodrigo Maulen-Soto,
Jalal Fadili,
Hedy Attouch
Abstract:
To solve non-smooth convex optimization problems with a noisy gradient input, we analyze the global behavior of subgradient-like flows under stochastic errors. The objective function is composite, being equal to the sum of two convex functions, one being differentiable and the other potentially non-smooth. We then use stochastic differential inclusions where the drift term is minus the subgradient of the objective function, and the diffusion term is either bounded or square-integrable. In this context, under Lipschitz continuity of the differentiable term and a growth condition of the non-smooth term, our first main result shows almost sure weak convergence of the trajectory process towards a minimizer of the objective function. Then, using Tikhonov regularization with a properly tuned vanishing parameter, we can obtain almost sure strong convergence of the trajectory towards the minimum norm solution. We find an explicit tuning of this parameter when our objective function satisfies a local error-bound inequality. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex and strongly convex case.
Submitted 27 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems trained with Gradient Descent
Authors:
Nathan Buskulic,
Jalal Fadili,
Yvain Quéau
Abstract:
Advanced machine learning methods, and more prominently neural networks, have become standard to solve inverse problems over the last years. However, the theoretical recovery guarantees of such methods are still scarce and difficult to achieve. Only recently did unsupervised methods such as Deep Image Prior (DIP) get equipped with convergence and recovery guarantees for generic loss functions when trained through gradient flow with an appropriate initialization. In this paper, we extend these results by proving that these guarantees hold true when using gradient descent with an appropriately chosen step-size/learning rate. We also show that the discretization only affects the overparametrization bound for a two-layer DIP network by a constant and thus that the different guarantees found for the gradient flow will hold for gradient descent.
Submitted 8 March, 2024;
originally announced March 2024.
-
The stochastic Ravine accelerated gradient method with general extrapolation coefficients
Authors:
Hedy Attouch,
Jalal Fadili,
Vyacheslav Kungurtsev
Abstract:
In a real Hilbert space domain setting, we study the convergence properties of the stochastic Ravine accelerated gradient method for convex differentiable optimization. We consider the general form of this algorithm where the extrapolation coefficients can vary with each iteration, and where the evaluation of the gradient is subject to random errors. This general treatment models a breadth of practical algorithms and numerical implementations. We show that, under a proper tuning of the extrapolation parameters, and when the error variance associated with the gradient evaluations or the step-size sequences vanish sufficiently fast, the Ravine method provides fast convergence of the values both in expectation and almost surely. We also improve the convergence rates from O(.) to o(.). Moreover, we show an almost sure summability property of the gradients, which implies the fast convergence of the gradients towards zero. This property reflects the fact that the high-resolution ODE of the Ravine method includes a Hessian-driven damping term. When the space is also separable, our analysis also allows us to establish almost sure weak convergence of the sequence of iterates provided by the algorithm. We finally specialize the analysis to consider different parameter choices, including vanishing and constant (heavy ball method with friction) damping parameter, and present a comprehensive landscape of the tradeoffs in speed and accuracy associated with these parameter choices and statistical properties on the sequence of errors in the gradient computations. We provide a thorough discussion of the similarities and differences with the Nesterov accelerated gradient which satisfies similar asymptotic convergence rates.
Submitted 21 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Solution uniqueness of convex optimization problems via the radial cone
Authors:
Jalal Fadili,
Tran T. A. Nghia,
Duy Nhat Phan
Abstract:
In this paper, we mainly study solution uniqueness of some convex optimization problems. Our characterizations of solution uniqueness are in terms of the radial cone. This approach allows us to know when a unique solution is a strong solution or even a tilt-stable one without checking second-order information. Consequently, we apply our theory to low-rank optimization problems. The radial cone is fully calculated in this case and numerical experiments show that our characterizations are sharp.
Submitted 18 January, 2024;
originally announced January 2024.
-
Accelerated Gradient Dynamics on Riemannian Manifolds: Faster Rate and Trajectory Convergence
Authors:
Tejas Natu,
Camille Castera,
Jalal Fadili,
Peter Ochs
Abstract:
In order to minimize a differentiable geodesically convex function, we study a second-order dynamical system on Riemannian manifolds with an asymptotically vanishing damping term of the form $α/t$. For positive values of $α$, convergence rates for the objective values and convergence of the trajectory are derived. We emphasize the crucial role of the curvature of the manifold for the distinction of the modes of convergence. There is a clear correspondence to the results that are known in the Euclidean case. When $α$ is larger than a certain constant that depends on the curvature of the manifold, we improve the convergence rate of objective values compared to the previously known rate and prove the convergence of the trajectory of the dynamical system to an element of the set of minimizers. For $α$ smaller than this curvature-dependent constant, the best known sub-optimal rates for the objective values and the trajectory are transferred to the Riemannian setting. We present computational experiments that corroborate our theoretical results.
Submitted 11 December, 2023;
originally announced December 2023.
-
Discrete-to-Continuum Rates of Convergence for $p$-Laplacian Regularization
Authors:
Adrien Weihs,
Jalal Fadili,
Matthew Thorpe
Abstract:
Higher-order regularization problem formulations are popular frameworks used in machine learning, inverse problems and image/signal processing. In this paper, we consider the computational problem of finding the minimizer of the Sobolev $\mathrm{W}^{1,p}$ semi-norm with a data-fidelity term. We propose a discretization procedure and prove convergence rates between our numerical solution and the target function. Our approach consists of discretizing an appropriate gradient flow problem in space and time. The space discretization is a nonlocal approximation of the $p$-Laplacian operator and our rates directly depend on the localization parameter $ε_n$ and the time mesh-size $τ_n$. We precisely characterize the asymptotic behaviour of $ε_n$ and $τ_n$ in order to ensure convergence to the considered minimizer. Finally, we apply our results to the setting of random graph models.
Submitted 19 October, 2023;
originally announced October 2023.
-
SimPINNs: Simulation-Driven Physics-Informed Neural Networks for Enhanced Performance in Nonlinear Inverse Problems
Authors:
Sidney Besnard,
Frédéric Jurie,
Jalal M. Fadili
Abstract:
This paper introduces a novel approach to solve inverse problems by leveraging deep learning techniques. The objective is to infer unknown parameters that govern a physical system based on observed data. We focus on scenarios where the underlying forward model demonstrates pronounced nonlinear behaviour, and where the dimensionality of the unknown parameter space is substantially smaller than that of the observations. Our proposed method builds upon physics-informed neural networks (PINNs) trained with a hybrid loss function that combines observed data with simulated data generated by a known (approximate) physical model. Experimental results on an orbit restitution problem demonstrate that our approach surpasses the performance of standard PINNs, providing improved accuracy and robustness.
Submitted 27 September, 2023;
originally announced September 2023.
-
Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems
Authors:
Nathan Buskulic,
Jalal Fadili,
Yvain Quéau
Abstract:
Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layer Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
Submitted 15 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Geometric characterizations for strong minima with applications to nuclear norm minimization problems
Authors:
Jalal Fadili,
Tran T. A. Nghia,
Duy Nhat Phan
Abstract:
In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.
Submitted 17 August, 2023;
originally announced August 2023.
-
Convergence Guarantees of Overparametrized Wide Deep Inverse Prior
Authors:
Nathan Buskulic,
Yvain Quéau,
Jalal Fadili
Abstract:
Neural networks have become a prominent approach to solve inverse problems in recent years. Amongst the different existing methods, the Deep Image/Inverse Priors (DIPs) technique is an unsupervised approach that optimizes a highly overparametrized neural network to transform a random input into an object whose image under the forward model matches the observation. However, the level of overparametrization necessary for such methods remains an open problem. In this work, we aim to investigate this question for a two-layer neural network with a smooth activation function. We provide overparametrization bounds under which such a network trained via continuous-time gradient descent will converge exponentially fast with high probability, which allows us to derive recovery prediction bounds. This work is thus a first step towards a theoretical understanding of overparametrized DIP networks, and more broadly it contributes to the theoretical understanding of neural networks in inverse problem settings.
Submitted 20 March, 2023;
originally announced March 2023.
-
Continuous Newton-like Methods featuring Inertia and Variable Mass
Authors:
Camille Castera,
Hedy Attouch,
Jalal Fadili,
Peter Ochs
Abstract:
We introduce a new dynamical system, at the interface between second-order dynamics with inertia and Newton's method. This system extends the class of inertial Newton-like dynamics by featuring a time-dependent parameter in front of the acceleration, called variable mass. For strongly convex optimization, we provide guarantees on how the Newtonian and inertial behaviors of the system can be non-asymptotically controlled by means of this variable mass. A connection with the Levenberg--Marquardt (or regularized Newton's) method is also made. We then show the effect of the variable mass on the asymptotic rate of convergence of the dynamics, and in particular, how it can turn the latter into an accelerated Newton method. We provide numerical experiments supporting our findings. This work represents a significant step towards designing new algorithms that benefit from the best of both first- and second-order optimization methods.
Submitted 12 February, 2024; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Provable Phase Retrieval with Mirror Descent
Authors:
Jean-Jacques Godeme,
Jalal Fadili,
Xavier Buet,
Myriam Zerrad,
Michel Lequime,
Claude Amra
Abstract:
In this paper, we consider the problem of phase retrieval, which consists of recovering an $n$-dimensional real vector from the magnitude of its $m$ linear measurements. We propose a mirror descent (or Bregman gradient descent) algorithm based on a wisely chosen Bregman divergence, which allows us to remove the classical global Lipschitz continuity requirement on the gradient of the non-convex phase retrieval objective to be minimized. We apply mirror descent to two types of random measurements: i.i.d. standard Gaussian measurements and those obtained by multiple structured illuminations through Coded Diffraction Patterns (CDP). For the Gaussian case, we show that when the number of measurements $m$ is large enough, then with high probability, for almost all initializers, the algorithm recovers the original vector up to a global sign change. For both types of measurements, mirror descent exhibits a local linear convergence behaviour with a dimension-independent convergence rate. Our theoretical results are finally illustrated with various numerical experiments, including an application to the reconstruction of images in precision optics.
Submitted 8 March, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Inertial Quasi-Newton Methods for Monotone Inclusion: Efficient Resolvent Calculus and Primal-Dual Methods
Authors:
Shida Wang,
Jalal Fadili,
Peter Ochs
Abstract:
We introduce an inertial quasi-Newton Forward-Backward Splitting Algorithm to solve a class of monotone inclusion problems. While the inertial step is computationally cheap, in general, the bottleneck is the evaluation of the resolvent operator. A change of the metric makes its computation hard even for (otherwise in the standard metric) simple operators. In order to fully exploit the advantage of adapting the metric, we develop a new efficient resolvent calculus for a low-rank perturbed standard metric, which accounts exactly for quasi-Newton metrics. Moreover, we prove the convergence of our algorithms, including linear convergence rates in case one of the two considered operators is strongly monotone. Beyond the general monotone inclusion setup, we instantiate a novel inertial quasi-Newton Primal-Dual Hybrid Gradient Method for solving saddle point problems. The favourable performance of our inertial quasi-Newton PDHG method is demonstrated on several numerical experiments in image processing.
Submitted 15 March, 2024; v1 submitted 28 September, 2022;
originally announced September 2022.
-
An SDE perspective on stochastic convex optimization
Authors:
Rodrigo Maulen-Soto,
Jalal Fadili,
Hedy Attouch
Abstract:
We analyze the global and local behavior of gradient-like flows under stochastic errors towards the aim of solving convex optimization problems with noisy gradient input. We first study the unconstrained differentiable convex case, using a stochastic differential equation where the drift term is minus the gradient of the objective function and the diffusion term is either bounded or square-integrable. In this context, under Lipschitz continuity of the gradient, our first main result shows almost sure convergence of the objective and the trajectory process towards a minimizer of the objective function. We also provide a comprehensive complexity analysis by establishing several new pointwise and ergodic convergence rates in expectation for the convex, strongly convex, and (local) Łojasiewicz case. The latter, which involves local analysis, is challenging and requires non-trivial arguments from measure theory. Then, we extend our study to the constrained case and more generally to certain nonsmooth situations. We show that several of our results have natural extensions obtained by replacing the gradient of the objective function by a cocoercive monotone operator. This makes it possible to obtain similar convergence results for optimization problems with an additively "smooth + non-smooth" convex structure. Finally, we consider another extension of our results to non-smooth optimization which is based on the Moreau envelope.
Submitted 6 July, 2022;
originally announced July 2022.
-
From the Ravine method to the Nesterov method and vice versa: a dynamical system perspective
Authors:
H. Attouch,
J. Fadili
Abstract:
We revisit the Ravine method of Gelfand and Tsetlin from a dynamical system perspective, study its convergence properties, and highlight its similarities and differences with the Nesterov accelerated gradient method. The two methods are closely related. They can be deduced from each other by reversing the order of the extrapolation and gradient operations in their definitions. They benefit from similar fast convergence of values and convergence of iterates for general convex objective functions. We will also establish the high resolution ODE of the Ravine and Nesterov methods, and reveal an additional geometric damping term driven by the Hessian for both methods. This will allow us to prove fast convergence towards zero of the gradients not only for the Ravine method but also for the Nesterov method for the first time. We also highlight connections to other algorithms stemming from more subtle discretization schemes, and finally describe a Ravine version of the proximal-gradient algorithms for general structured smooth + non-smooth convex optimization problems.
Submitted 1 February, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
A Stochastic Bregman Primal-Dual Splitting Algorithm for Composite Optimization
Authors:
Antonio Silveti-Falls,
Cesare Molinari,
Jalal Fadili
Abstract:
We study a stochastic first order primal-dual method for solving convex-concave saddle point problems over real reflexive Banach spaces using Bregman divergences and relative smoothness assumptions, in which we allow for stochastic error in the computation of gradient terms within the algorithm. We show ergodic convergence in expectation of the Lagrangian optimality gap with a rate of O(1/k) and that every almost sure weak cluster point of the ergodic sequence is a saddle point in expectation under mild assumptions. Under slightly stricter assumptions, we show almost sure weak convergence of the pointwise iterates to a saddle point. Under a relative strong convexity assumption on the objective functions and a total convexity assumption on the entropies of the Bregman divergences, we establish almost sure strong convergence of the pointwise iterates to a saddle point. Our framework is general and does not need strong convexity of the entropies inducing the Bregman divergences in the algorithm. Numerical applications are considered including entropically regularized Wasserstein barycenter problems and regularized inverse problems on the simplex.
Submitted 22 December, 2021;
originally announced December 2021.
-
Sharp, strong and unique minimizers for low complexity robust recovery
Authors:
Jalal Fadili,
Tran T. A. Nghia,
Trinh T. T. Tran
Abstract:
In this paper, we show the important roles of sharp minima and strong minima for robust recovery. We also obtain several characterizations of sharp minima for convex regularized optimization problems. Our characterizations are quantitative and verifiable especially for the case of decomposable norm regularized problems including sparsity, group-sparsity, and low-rank convex problems. For group-sparsity optimization problems, we show that a unique solution is a strong solution and obtain quantitative characterizations for solution uniqueness.
Submitted 9 November, 2021;
originally announced November 2021.
-
Convergence of iterates for first-order optimization algorithms with inertia and Hessian driven damping
Authors:
Hedy Attouch,
Zaki Chbani,
Jalal Fadili,
Hassan Riahi
Abstract:
In a Hilbert space setting, for convex optimization, we show the convergence of the iterates to optimal solutions for a class of accelerated first-order algorithms. They can be interpreted as discrete temporal versions of an inertial dynamic involving both viscous damping and Hessian-driven damping. The asymptotically vanishing viscous damping is linked to the accelerated gradient method of Nesterov while the Hessian driven damping makes it possible to significantly attenuate the oscillations. By treating the Hessian-driven damping as the time derivative of the gradient term, this gives, in discretized form, first-order algorithms. These results complement the previous work of the authors where it was shown the fast convergence of the values, and the fast convergence towards zero of the gradients.
Submitted 13 July, 2021;
originally announced July 2021.
-
On the effect of perturbations in first-order optimization methods with inertia and Hessian driven damping
Authors:
Hedy Attouch,
Jalal Fadili,
Vyacheslav Kungurtsev
Abstract:
Second-order continuous-time dissipative dynamical systems with viscous and Hessian driven damping have inspired effective first-order algorithms for solving convex optimization problems. While preserving the fast convergence properties of the Nesterov-type acceleration, the Hessian driven damping makes it possible to significantly attenuate the oscillations. To study the stability of these algorithms with respect to perturbations, we analyze the behaviour of the corresponding continuous systems when the gradient computation is subject to exogenous additive errors. We provide a quantitative analysis of the asymptotic behaviour of two types of systems, those with implicit and explicit Hessian driven damping. We consider convex, strongly convex, and non-smooth objective functions defined on a real Hilbert space and show that, depending on the formulation, different integrability conditions on the perturbations are sufficient to maintain the convergence rates of the systems. We highlight the differences between the implicit and explicit Hessian damping, and in particular point out that the assumptions on the objective and perturbations needed in the implicit case are more stringent than in the explicit case.
Submitted 17 March, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Limits and consistency of non-local and graph approximations to the Eikonal equation
Authors:
Jalal Fadili,
Nicolas Forcadel,
Thi Tuyen Nguyen,
Rita Zantout
Abstract:
In this paper, we study a non-local approximation of the time-dependent (local) Eikonal equation with Dirichlet-type boundary conditions, where the kernel in the non-local problem is properly scaled. Based on the theory of viscosity solutions, we prove existence and uniqueness of the viscosity solutions of both the local and non-local problems, as well as regularity properties of these solutions in time and space. We then derive error bounds between the solution to the non-local problem and that of the local one, both in continuous-time and Backward Euler time discretization. We then turn to studying continuum limits of non-local problems defined on random weighted graphs with $n$ vertices. In particular, we establish that if the kernel scale parameter decreases at an appropriate rate as $n$ grows, then almost surely, the solution of the problem on graphs converges uniformly to the viscosity solution of the local problem as the time step vanishes and the number of vertices $n$ grows large.
Submitted 21 November, 2022; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Fast convergence of dynamical ADMM via time scaling of damped inertial dynamics
Authors:
Hedy Attouch,
Zaki Chbani,
Jalal Fadili,
Hassan Riahi
Abstract:
In this paper, we propose in a Hilbertian setting a second-order time-continuous dynamic system with fast convergence guarantees to solve structured convex minimization problems with an affine constraint. The system is associated with the augmented Lagrangian formulation of the minimization problem. The corresponding dynamics brings into play three general time-varying parameters, each with specific properties, and which are respectively associated with viscous damping, extrapolation and temporal scaling. By appropriately adjusting these parameters, we develop a Lyapunov analysis which provides fast convergence properties of the values and of the feasibility gap. These results will naturally pave the way for developing corresponding accelerated ADMM algorithms, obtained by temporal discretization.
Submitted 23 March, 2021;
originally announced March 2021.
-
Global Convergence of Model Function Based Bregman Proximal Minimization Algorithms
Authors:
Mahesh Chandra Mukkamala,
Jalal Fadili,
Peter Ochs
Abstract:
Lipschitz continuity of the gradient mapping of a continuously differentiable function plays a crucial role in designing various optimization algorithms. However, many functions arising in practical applications such as low rank matrix factorization or deep neural network problems do not have a Lipschitz continuous gradient. This led to the development of a generalized notion known as the $L$-smad property, which is based on generalized proximity measures called Bregman distances. However, the $L$-smad property cannot handle nonsmooth functions, for example, simple nonsmooth functions like $|x^4-1|$ and also many practical composite problems are out of scope. We fix this issue by proposing the MAP property, which generalizes the $L$-smad property and is also valid for a large class of nonconvex nonsmooth composite problems. Based on the proposed MAP property, we propose a globally convergent algorithm called Model BPG, that unifies several existing algorithms. The convergence analysis is based on a new Lyapunov function. We also numerically illustrate the superior performance of Model BPG on standard phase retrieval problems, robust phase retrieval problems, and Poisson linear inverse problems, when compared to a state of the art optimization method that is valid for generic nonconvex nonsmooth optimization problems.
Submitted 24 December, 2020;
originally announced December 2020.
-
Continuum limit of $p$-Laplacian evolution problems on graphs: $L^q$ graphons and sparse graphs
Authors:
Imad El Bouchairi,
Jalal Fadili,
Abderrahim Elmoataz
Abstract:
In this paper we study continuum limits of the discretized $p$-Laplacian evolution problem on sparse graphs with homogeneous Neumann boundary conditions. This extends the results of [24] to a far more general class of kernels, possibly singular, and graph sequences whose limit are the so-called $L^q$-graphons. More precisely, we derive a bound on the distance between two continuous-in-time trajectories defined by two different evolution systems (i.e. with different kernels, second member and initial data). Similarly, we provide a bound in the case that one of the trajectories is discrete-in-time and the other is continuous. In turn, these results lead us to establish error estimates of the full discretization of the $p$-Laplacian problem on sparse random graphs. In particular, we provide rate of convergence of solutions for the discrete models to the solution of the continuous problem as the number of vertices grows.
Submitted 16 October, 2020;
originally announced October 2020.
-
Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step
Authors:
Antonio Silveti-Falls,
Cesare Molinari,
Jalal Fadili
Abstract:
In this paper we propose and analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper, which we denote ICGALP, that allows for errors in the computation of several important quantities. In particular this allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion that facilitates the practical application of the algorithm to computationally intensive settings, e.g. in high (or possibly infinite) dimensional Hilbert spaces commonly found in machine learning problems. The algorithm is able to solve composite minimization problems involving the sum of three convex proper lower-semicontinuous functions subject to an affine constraint of the form $Ax=b$ for some bounded linear operator $A$. Only one of the functions in the objective is assumed to be differentiable, the other two are assumed to have an accessible prox operator and a linear minimization oracle. As main results, we show convergence of the Lagrangian to an optimum and asymptotic feasibility of the affine constraint as well as weak convergence of the dual variable to a solution of the dual problem, all in an almost sure sense. Almost sure convergence rates, both pointwise and ergodic, are given for the Lagrangian values and the feasibility gap. Numerical experiments verifying the predicted rates of convergence are shown as well.
Submitted 11 May, 2020;
originally announced May 2020.
-
Wasserstein Control of Mirror Langevin Monte Carlo
Authors:
Kelvin Shuangjian Zhang,
Gabriel Peyré,
Jalal Fadili,
Marcelo Pereyra
Abstract:
Discretized Langevin diffusions are efficient Monte Carlo methods for sampling from high dimensional target densities that are log-Lipschitz-smooth and (strongly) log-concave. In particular, the Euclidean Langevin Monte Carlo sampling algorithm has received much attention lately, leading to a detailed understanding of its non-asymptotic convergence properties and of the role that smoothness and log-concavity play in the convergence rate. Distributions that do not possess these regularity properties can be addressed by considering a Riemannian Langevin diffusion with a metric capturing the local geometry of the log-density. However, the Monte Carlo algorithms derived from discretizations of such Riemannian Langevin diffusions are notoriously difficult to analyze. In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme. We establish for the first time a non-asymptotic upper-bound on the sampling error of the resulting Hessian Riemannian Langevin Monte Carlo algorithm. This bound is measured according to a Wasserstein distance induced by a Riemannian metric ground cost capturing the Hessian structure and closely related to a self-concordance-like condition. The upper-bound implies, for instance, that the iterates contract toward a Wasserstein ball around the target density whose radius is made explicit. Our theory recovers existing Euclidean results and can cope with a wide variety of Hessian metrics related to highly non-flat geometries.
Submitted 11 February, 2020;
originally announced February 2020.
-
Learning CHARME models with neural networks
Authors:
José G. Gómez García,
Jalal Fadili,
Christophe Chesneau
Abstract:
In this paper, we consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts), a class of generalized mixture of nonlinear nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on the autoregressive and volatility functions, we prove that this model is stationary, ergodic and $τ$-weakly dependent. These conditions are much weaker than those presented in the literature that treats this model. Moreover, this result forms the theoretical basis for deriving an asymptotic theory of the underlying (non)parametric estimation, which we present for this model. As an application, from the universal approximation property of neural networks (NN), we develop a learning theory for the NN-based autoregressive functions of the model, where the strong consistency and asymptotic normality of the considered estimator of the NN weights and biases are guaranteed under weak conditions.
Submitted 17 November, 2020; v1 submitted 8 February, 2020;
originally announced February 2020.
-
First-order optimization algorithms via inertial systems with Hessian driven damping
Authors:
Hedy Attouch,
Zaki Chbani,
Jalal Fadili,
Hassan Riahi
Abstract:
In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete time versions of inertial dynamics involving both viscous and Hessian-driven dampings. The geometrical damping driven by the Hessian intervenes in the dynamics in the form $\nabla^2 f (x(t)) \dot{x} (t)$. By treating this term as the time derivative of $ \nabla f (x (t)) $, this gives, in discretized form, first-order algorithms in time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and show a rapid convergence towards zero of the gradients. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.
Submitted 6 November, 2020; v1 submitted 24 July, 2019;
originally announced July 2019.
-
Optimal reduced model algorithms for data-based state estimation
Authors:
Albert Cohen,
Wolfgang Dahmen,
Ron DeVore,
Jalal Fadili,
Olga Mula,
James Nichols
Abstract:
Reduced model spaces, such as reduced basis and polynomial chaos, are linear spaces $V_n$ of finite dimension $n$ which are designed for the efficient approximation of families of parametrized PDEs in a Hilbert space $V$. The manifold $\mathcal{M}$ that gathers the solutions of the PDE for all admissible parameter values is globally approximated by the space $V_n$ with some controlled accuracy $ε_n$, which is typically much smaller than when using standard approximation spaces of the same dimension such as finite elements. Reduced model spaces have also been proposed in [13] as a vehicle to design a simple linear recovery algorithm of the state $u\in\mathcal{M}$ corresponding to a particular solution when the values of parameters are unknown but a set of data is given by $m$ linear measurements of the state. The measurements are of the form $\ell_j(u)$, $j=1,\dots,m$, where the $\ell_j$ are linear functionals on $V$. The analysis of this approach in [2] shows that the recovery error is bounded by $μ_nε_n$, where $μ_n=μ(V_n,W)$ is the inverse of an inf-sup constant that describes the angle between $V_n$ and the space $W$ spanned by the Riesz representers of $(\ell_1,\dots,\ell_m)$. A reduced model space which is efficient for approximation might thus be ineffective for recovery if $μ_n$ is large or infinite. In this paper, we discuss the existence and construction of an optimal reduced model space for this recovery method, and we extend our search to affine spaces. Our basic observation is that this problem is equivalent to the search of an optimal affine algorithm for the recovery of $\mathcal{M}$ in the worst case error sense. This allows us to perform our search by a convex optimization procedure. Numerical tests illustrate that the reduced model spaces constructed from our approach perform better than the classical reduced basis spaces.
Submitted 2 August, 2020; v1 submitted 19 March, 2019;
originally announced March 2019.
-
Generalized Conditional Gradient with Augmented Lagrangian for Composite Minimization
Authors:
Antonio Silveti-Falls,
Cesare Molinari,
Jalal Fadili
Abstract:
In this paper we propose a splitting scheme which hybridizes generalized conditional gradient with a proximal step which we call CGALP algorithm, for minimizing the sum of three proper convex and lower-semicontinuous functions in real Hilbert spaces. The minimization is subject to an affine constraint, that allows in particular to deal with composite problems (sum of more than three functions) in a separate way by the usual product space technique. While classical conditional gradient methods require Lipschitz-continuity of the gradient of the differentiable part of the objective, CGALP needs only differentiability (on an appropriate subset), hence circumventing the intricate question of Lipschitz continuity of gradients. For the two remaining functions in the objective, we do not require any additional regularity assumption. The second function, possibly nonsmooth, is assumed simple, i.e., the associated proximal mapping is easily computable. For the third function, again nonsmooth, we just assume that its domain is also bounded and that a linearly perturbed minimization oracle is accessible. In particular, this last function can be chosen to be the indicator of a nonempty bounded closed convex set, in order to deal with additional constraints. Finally, the affine constraint is addressed by the augmented Lagrangian approach. Our analysis is carried out for a wide choice of algorithm parameters satisfying so called "open loop" rules. As main results, under mild conditions, we show asymptotic feasibility with respect to the affine constraint, boundedness of the dual multipliers, and convergence of the Lagrangian values to the saddle-point optimal value. We also provide (subsequential) rates of convergence for both the feasibility gap and the Lagrangian values.
Submitted 7 October, 2019; v1 submitted 4 January, 2019;
originally announced January 2019.
-
Nonlocal $p$-Laplacian Variational problems on graphs
Authors:
Yosra Hafiene,
Jalal Fadili,
Abderrahim Elmoataz
Abstract:
In this paper, we study a nonlocal variational problem which consists of minimizing in $L^2$ the sum of a quadratic data fidelity and a regularization term corresponding to the $L^p$-norm of the nonlocal gradient. In particular, we study convergence of the numerical solution to a discrete version of this nonlocal variational problem to the unique solution of the continuum one. To do so, we derive an error bound and highlight the role of the initial data and the kernel governing the nonlocal interactions. When applied to variational problem on graphs, this error bound allows us to show the consistency of the discretized variational problem as the number of vertices goes to infinity. More precisely, for networks in convergent graph sequences (simple and weighted deterministic dense graphs as well as random inhomogeneous graphs), we prove convergence and provide rate of convergence of solutions for the discrete models to the solution of the continuum problem as the number of vertices grows.
Submitted 20 August, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Model Consistency for Learning with Mirror-Stratifiable Regularizers
Authors:
Jalal Fadili,
Guillaume Garrigos,
Jérome Malick,
Gabriel Peyré
Abstract:
Low-complexity non-smooth convex regularizers are routinely used to impose some structure (such as sparsity or low-rank) on the coefficients for linear predictors in supervised learning. Model consistency consists then in selecting the correct structure (for instance support or rank) by regularized empirical risk minimization.
It is known that model consistency holds under appropriate non-degeneracy conditions. However such conditions typically fail for highly correlated designs and it is observed that regularization methods tend to select larger models.
In this work, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers. This class of regularizers encompasses the most well-known in the literature, including the $\ell_1$ or trace norms. It brings into play a pair of primal-dual models, which in turn allows one to locate the structure of the solution using a specific dual certificate.
We also show how this analysis is applicable to optimal solutions of the learning problem, and also to the iterates computed by a certain class of stochastic proximal-gradient algorithms.
Submitted 16 January, 2019; v1 submitted 22 March, 2018;
originally announced March 2018.
-
On Quasi-Newton Forward--Backward Splitting: Proximal Calculus and Convergence
Authors:
Stephen Becker,
Jalal Fadili,
Peter Ochs
Abstract:
We introduce a framework for quasi-Newton forward--backward splitting algorithms (proximal quasi-Newton methods) with a metric induced by diagonal $\pm$ rank-$r$ symmetric positive definite matrices. This special type of metric allows for a highly efficient evaluation of the proximal mapping. The key to this efficiency is a general proximal calculus in the new metric. By using duality, formulas are derived that relate the proximal mapping in a rank-$r$ modified metric to the original metric. We also describe efficient implementations of the proximity calculation for a large class of functions; the implementations exploit the piece-wise linear nature of the dual problem. Then, we apply these results to acceleration of composite convex minimization problems, which leads to elegant quasi-Newton methods for which we prove convergence. The algorithm is tested on several numerical examples and compared to a comprehensive list of alternatives in the literature. Our quasi-Newton splitting algorithm with the prescribed metric compares favorably against state-of-the-art. The algorithm has extensive applications including signal processing, sparse recovery, machine learning and classification to name a few.
Submitted 22 November, 2018; v1 submitted 26 January, 2018;
originally announced January 2018.
-
Convergence rates of Forward--Douglas--Rachford splitting method
Authors:
Cesare Molinari,
Jingwei Liang,
Jalal Fadili
Abstract:
Over the past years, operator splitting methods have become ubiquitous for non-smooth optimization owing to their simplicity and efficiency. In this paper, we consider the Forward--Douglas--Rachford splitting method (FDR) [10,40], and study both global and local convergence rates of this method. For the global rate, we establish an $o(1/k)$ convergence rate in terms of a Bregman divergence suitably designed for the objective function. Moreover, when specializing to the case of the Forward--Backward splitting method, we show that the convergence rate of the objective function of the method is actually $o(1/k)$ for a large choice of the descent step-size. Then locally, based on the assumption that the non-smooth part of the optimization problem is partly smooth, we establish local linear convergence of the method. More precisely, we show that the sequence generated by the FDR method first (i) identifies a smooth manifold in a finite number of iterations, and then (ii) enters a local linear convergence regime, which is for instance characterized in terms of the structure of the underlying active smooth manifold. To exemplify the usefulness of the obtained result, we consider several concrete numerical experiments arising from applicative fields including, for instance, signal/image processing, inverse problems and machine learning.
Submitted 3 January, 2018;
originally announced January 2018.
-
Sensitivity Analysis for Mirror-Stratifiable Convex Functions
Authors:
Jalal Fadili,
Jérôme Malick,
Gabriel Peyré
Abstract:
This paper provides a set of sensitivity analysis and activity identification results for a class of convex functions with a strong geometric structure, that we coined "mirror-stratifiable". These functions are such that there is a bijection between a primal and a dual stratification of the space into partitioning sets, called strata. This pairing is crucial to track the strata that are identifiable by solutions of parametrized optimization problems or by iterates of optimization algorithms. This class of functions encompasses all regularizers routinely used in signal and image processing, machine learning, and statistics. We show that this "mirror-stratifiable" structure enjoys a nice sensitivity theory, allowing us to study stability of solutions of optimization problems to small perturbations, as well as activity identification of first-order proximal splitting-type algorithms. Existing results in the literature typically assume that, under a non-degeneracy condition, the active set associated to a minimizer is stable to small perturbations and is identified in finite time by optimization schemes. In contrast, our results do not require any non-degeneracy assumption: in consequence, the optimal active set is not necessarily stable anymore, but we are able to track precisely the set of identifiable strata. We show that these results have crucial implications when solving challenging ill-posed inverse problems via regularization, a typical scenario where the non-degeneracy condition is not fulfilled. Our theoretical results, illustrated by numerical simulations, allow us to characterize the instability behaviour of the regularized solutions, by locating the set of all low-dimensional strata that can be potentially identified by these solutions.
Submitted 5 June, 2018; v1 submitted 11 July, 2017;
originally announced July 2017.
-
Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms
Authors:
Peter Ochs,
Jalal Fadili,
Thomas Brox
Abstract:
We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complemented with an Armijo-like line search strategy, we obtain a flexible algorithm for which we prove (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error. Special instances of the algorithm with a Euclidean distance function are, for example, Gradient Descent, Forward--Backward Splitting, ProxDescent, without the common requirement of a "Lipschitz continuous gradient". In addition, we consider a broad class of Bregman distance functions (generated by Legendre functions) replacing the Euclidean distance. The algorithm has a wide range of applications including many linear and non-linear inverse problems in signal/image processing and machine learning.
Submitted 25 June, 2018; v1 submitted 7 July, 2017;
originally announced July 2017.
-
Local Linear Convergence Analysis of Primal-Dual Splitting Methods
Authors:
Jingwei Liang,
Jalal Fadili,
Gabriel Peyré
Abstract:
In this paper, we study the local linear convergence properties of a versatile class of Primal-Dual splitting methods for minimizing composite non-smooth convex optimization problems. Under the assumption that the non-smooth components of the problem are partly smooth relative to smooth manifolds, we present a unified local convergence analysis framework for these methods. More precisely, in our framework we first show that (i) the sequences generated by Primal-Dual splitting methods identify a pair of primal and dual smooth manifolds in a finite number of iterations, and then (ii) enter a local linear convergence regime, which is characterized based on the structure of the underlying active smooth manifolds. We also show how our results for Primal-Dual splitting can be specialized to cover existing ones on Forward-Backward splitting and Douglas-Rachford splitting/ADMM (alternating direction methods of multipliers). Moreover, based on the obtained local convergence analysis results, several practical acceleration techniques are discussed. To exemplify the usefulness of the obtained results, we consider several concrete numerical experiments arising from fields including signal/image processing, inverse problems and machine learning. The demonstration not only verifies the local linear convergence behaviour of Primal-Dual splitting methods, but also provides insights on how to accelerate them in practice.
Submitted 9 January, 2018; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Sharp Oracle Inequalities for Low-complexity Priors
Authors:
Tung Duy Luu,
Jalal Fadili,
Christophe Chesneau
Abstract:
In this paper, we consider a high-dimensional statistical estimation problem in which the number of parameters is comparable to or larger than the sample size. We present a unified analysis of the performance guarantees of exponential weighted aggregation and penalized estimators with a general class of data losses and priors which encourage objects which conform to some notion of simplicity/complexity. More precisely, we show that these two estimators satisfy sharp oracle inequalities for prediction ensuring their good theoretical performances. We also highlight the differences between them. When the noise is random, we provide oracle inequalities in probability using concentration inequalities. These results are then applied to several instances including the Lasso, the group Lasso, their analysis-type counterparts, the $\ell_\infty$ and the nuclear norm penalties. All our estimators can be efficiently implemented using proximal splitting algorithms.
Submitted 3 October, 2017; v1 submitted 10 February, 2017;
originally announced February 2017.
-
Nonlocal $p$-Laplacian evolution problems on graphs
Authors:
Hafiene Yosra,
Jalal Fadili,
Abderrahim Elmoataz
Abstract:
In this paper we study numerical approximations of the evolution problem for the nonlocal $p$-Laplacian with homogeneous Neumann boundary conditions. First, we derive a bound on the distance between two continuous-in-time trajectories defined by two different evolution systems (i.e. with different kernels and initial data). We then provide a similar bound for the case when one of the trajectories is discrete-in-time and the other is continuous. In turn, these results allow us to establish error estimates of the discretized $p$-Laplacian problem on graphs. More precisely, for networks on convergent graph sequences (simple and weighted graphs), we prove convergence and provide a rate of convergence of solutions of the discrete models to the solution of the continuous problem as the number of vertices grows. We finally touch on the limit as $p \to \infty$ in these approximations and get uniform convergence results.
Submitted 26 April, 2019; v1 submitted 21 December, 2016;
originally announced December 2016.
-
Sparse Support Recovery with Non-smooth Loss Functions
Authors:
Kévin Degraux,
Gabriel Peyré,
Jalal M. Fadili,
Laurent Jacques
Abstract:
In this paper, we study the support recovery guarantees of underdetermined sparse regression using the $\ell_1$-norm as a regularizer and a non-smooth loss function for data fidelity. More precisely, we focus in detail on the cases of $\ell_1$ and $\ell_\infty$ losses, and contrast them with the usual $\ell_2$ loss. While these losses are routinely used to account for either sparse ($\ell_1$ loss) or uniform ($\ell_\infty$ loss) noise models, a theoretical analysis of their performance is still lacking. In this article, we extend the existing theory from the smooth $\ell_2$ case to these non-smooth cases. We derive a sharp condition which ensures that the support of the vector to recover is stable to small additive noise in the observations, as long as the loss constraint size is tuned proportionally to the noise level. A distinctive feature of our theory is that it also explains what happens when the support is unstable. While the support is not stable anymore, we identify an "extended support" and show that this extended support is stable to small additive noise. To exemplify the usefulness of our theory, we give a detailed numerical analysis of the support stability/instability of compressed sensing recovery with these different losses. This highlights different parameter regimes, ranging from total support stability to progressively increasing support instability.
Submitted 3 November, 2016;
originally announced November 2016.
-
Proceedings of the third "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'16)
Authors:
V. Abrol,
O. Absil,
P. -A. Absil,
S. Anthoine,
P. Antoine,
T. Arildsen,
N. Bertin,
F. Bleichrodt,
J. Bobin,
A. Bol,
A. Bonnefoy,
F. Caltagirone,
V. Cambareri,
C. Chenot,
V. Crnojević,
M. Daňková,
K. Degraux,
J. Eisert,
J. M. Fadili,
M. Gabrié,
N. Gac,
D. Giacobello,
A. Gonzalez,
C. A. Gomez Gonzalez,
A. González, et al. (36 additional authors not shown)
Abstract:
The third edition of the "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) took place in Aalborg, the 4th largest city in Denmark situated beautifully in the northern part of the country, from the 24th to 26th of August 2016. The workshop venue was at the Aalborg University campus. One implicit objective of this biennial workshop is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For this third edition, iTWIST'16 gathered about 50 international participants and features 8 invited talks, 12 oral presentations, and 12 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing (e.g., optics, computer vision, genomics, biomedical, digital communication, channel estimation, astronomy); Application of sparse models in non-convex/non-linear inverse problems (e.g., phase retrieval, blind deconvolution, self calibration); Approximate probabilistic inference for sparse problems; Sparse machine learning and inference; "Blind" inverse problems and dictionary learning; Optimization for sparse modelling; Information theory, geometry and randomness; Sparsity? What's next? (Discrete-valued signals; Union of low-dimensional spaces, Cosparsity, mixed/group norm, model-based, low-complexity models, ...); Matrix/manifold sensing/processing (graph, low-rank approximation, ...); Complexity/accuracy tradeoffs in numerical methods/optimization; Electronic/optical compressive sensors (hardware).
Submitted 14 September, 2016;
originally announced September 2016.
-
A Multi-step Inertial Forward--Backward Splitting Method for Non-convex Optimization
Authors:
Jingwei Liang,
Jalal Fadili,
Gabriel Peyré
Abstract:
In this paper, we propose a multi-step inertial Forward--Backward splitting algorithm for minimizing the sum of two non-necessarily convex functions, one of which is proper lower semi-continuous while the other is differentiable with a Lipschitz continuous gradient. We first prove global convergence of the scheme with the help of the Kurdyka-Łojasiewicz property. Then, when the non-smooth part is also partly smooth relative to a smooth submanifold, we establish finite identification of the latter and provide sharp local linear convergence analysis. The proposed method is illustrated on a few problems arising from statistics and machine learning.
Submitted 27 October, 2016; v1 submitted 7 June, 2016;
originally announced June 2016.
-
Local Convergence Properties of Douglas--Rachford and ADMM
Authors:
Jingwei Liang,
Jalal Fadili,
Gabriel Peyré
Abstract:
The Douglas--Rachford (DR) and alternating direction method of multipliers (ADMM) are two proximal splitting algorithms designed to minimize the sum of two proper lower semi-continuous convex functions whose proximity operators are easy to compute. The goal of this work is to understand the local linear convergence behaviour of DR/ADMM when the involved functions are moreover partly smooth. More precisely, when the two functions are partly smooth relative to their respective smooth submanifolds, we show that DR/ADMM (i) identifies these manifolds in finite time; (ii) enters a local linear convergence regime. When both functions are locally polyhedral, we show that the optimal convergence radius is given in terms of the cosine of the Friedrichs angle between the tangent spaces of the identified submanifolds. Under polyhedrality of both functions, we also provide a sufficient condition for finite convergence of DR. The obtained results are illustrated by several concrete examples and supported by numerical experiments.
Submitted 6 March, 2017; v1 submitted 7 June, 2016;
originally announced June 2016.
-
Activity Identification and Local Linear Convergence of Forward--Backward-type methods
Authors:
Jingwei Liang,
Jalal Fadili,
Gabriel Peyré
Abstract:
In this paper, we consider a class of Forward--Backward (FB) splitting methods that includes several variants (e.g. inertial schemes, FISTA) for minimizing the sum of two proper convex and lower semi-continuous functions, one of which has a Lipschitz continuous gradient, and the other is partly smooth relative to a smooth active manifold $\mathcal{M}$. We propose a unified framework, under which we show that this class of FB-type algorithms (i) correctly identifies the active manifolds in a finite number of iterations (finite activity identification), and (ii) then enters a local linear convergence regime, which we characterize precisely in terms of the structure of the underlying active manifolds. For simpler problems involving polyhedral functions, we show finite termination. We also establish and explain why FISTA (with convergent sequences) locally oscillates and can be slower than FB. These results may have numerous applications including in signal/image processing, sparse recovery and machine learning. Indeed, the obtained results explain the typical behaviour that has been observed numerically for many problems in these fields such as the Lasso, the group Lasso, the fused Lasso and the nuclear norm regularization to name only a few.
Submitted 3 January, 2018; v1 submitted 12 March, 2015;
originally announced March 2015.
-
Activity Identification and Local Linear Convergence of Douglas--Rachford/ADMM under Partial Smoothness
Authors:
Jingwei Liang,
Jalal Fadili,
Gabriel Peyré,
Russell Luke
Abstract:
Convex optimization has become ubiquitous in most quantitative disciplines of science, including variational image processing. Proximal splitting algorithms are becoming popular to solve such structured convex optimization problems. Within this class of algorithms, Douglas--Rachford (DR) and alternating direction method of multipliers (ADMM) are designed to minimize the sum of two proper lower semi-continuous convex functions whose proximity operators are easy to compute. The goal of this work is to understand the local convergence behaviour of DR (resp. ADMM) when the involved functions (resp. their Legendre-Fenchel conjugates) are moreover partly smooth. More precisely, when both of the two functions (resp. their conjugates) are partly smooth relative to their respective manifolds, we show that DR (resp. ADMM) identifies these manifolds in finite time. Moreover, when these manifolds are affine or linear, we prove that DR/ADMM is locally linearly convergent. When $J$ and $G$ are locally polyhedral, we show that the optimal convergence radius is given in terms of the cosine of the Friedrichs angle between the tangent spaces of the identified manifolds. This is illustrated by several concrete examples and supported by numerical experiments.
Submitted 30 July, 2015; v1 submitted 21 December, 2014;
originally announced December 2014.
-
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
Authors:
L. Jacques,
C. De Vleeschouwer,
Y. Boursier,
P. Sudhakar,
C. De Mol,
A. Pizurica,
S. Anthoine,
P. Vandergheynst,
P. Frossard,
C. Bilen,
S. Kitic,
N. Bertin,
R. Gribonval,
N. Boumal,
B. Mishra,
P. -A. Absil,
R. Sepulchre,
S. Bundervoet,
C. Schretter,
A. Dooms,
P. Schelkens,
O. Chabiron,
F. Malgouyres,
J. -Y. Tourneret,
N. Dobigeon, et al. (42 additional authors not shown)
Abstract:
The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 has gathered about 70 international participants and has featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.
Submitted 9 October, 2014; v1 submitted 2 October, 2014;
originally announced October 2014.