-
Derivatives of Stochastic Gradient Descent
Authors:
Franck Iutzeler,
Edouard Pauwels,
Samuel Vaiter
Abstract:
We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the conv…
▽ More
We consider stochastic optimization problems where the objective depends on some parameter, as commonly found in hyperparameter optimization for instance. We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD. This enables us to establish that the derivatives of SGD converge to the derivative of the solution map** in terms of mean squared error whenever the objective is strongly convex. Specifically, we demonstrate that with constant step-sizes, these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
One-step differentiation of iterative algorithms
Authors:
Jérôme Bolte,
Edouard Pauwels,
Samuel Vaiter
Abstract:
In appropriate frameworks, automatic differentiation is transparent to the user at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagatio…
▽ More
In appropriate frameworks, automatic differentiation is transparent to the user at the cost of being a significant computational burden when the number of operations is large. For iterative algorithms, implicit differentiation alleviates this issue but requires custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as performant as implicit differentiation for fast algorithms (e.g., superlinear optimization methods). We provide a complete theoretical approximation analysis with specific examples (Newton's method, gradient descent) along with its consequences in bilevel optimization. Several numerical examples illustrate the well-foundness of the one-step estimator.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Differentiating Nonsmooth Solutions to Parametric Monotone Inclusion Problems
Authors:
Jérôme Bolte,
Edouard Pauwels,
Antonio Silveti-Falls
Abstract:
We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient. A direct consequence of our result is that these solutions happen to be differentiable almost everywhere. Our approach is full…
▽ More
We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient. A direct consequence of our result is that these solutions happen to be differentiable almost everywhere. Our approach is fully compatible with automatic differentiation and comes with assumptions which are easy to check, roughly speaking: semialgebraicity and strong monotonicity. We illustrate the scope of our results by considering three fundamental composite problem settings: strongly convex problems, dual solutions to convex minimization problems and primal-dual solutions to min-max problems.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
On the complexity of nonsmooth automatic differentiation
Authors:
Jérôme Bolte,
Ryan Boustany,
Edouard Pauwels,
Béatrice Pesquet-Popescu
Abstract:
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This co…
▽ More
Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. The overhead complexity of the backward mode turns out to be independent of the dimension when using programs with locally Lipschitz semi-algebraic or definable elementary functions. This considerably extends Baur-Strassen's smooth cheap gradient principle. We illustrate our results by establishing fast backpropagation results of conservative gradients through feedforward neural networks with standard activation and loss functions. Nonsmooth backpropagation's cheapness contrasts with concurrent forward approaches, which have, to this day, dimensional-dependent worst-case overhead estimates. We provide further results suggesting the superiority of backward propagation of conservative gradients. Indeed, we relate the complexity of computing a large number of directional derivatives to that of matrix multiplication, and we show that finding two subgradients in the Clarke subdifferential of a function is an NP-hard problem.
△ Less
Submitted 6 February, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Automatic differentiation of nonsmooth iterative algorithms
Authors:
Jérôme Bolte,
Edouard Pauwels,
Samuel Vaiter
Abstract:
Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and c…
▽ More
Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. Asymptotics is well understood for many smooth problems but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning and can it be used effectively in machine learning? Is there a connection with classical derivative? All these questions are addressed under appropriate nonexpansivity conditions in the framework of conservative derivatives which has proved useful in understanding nonsmooth AD. For nonsmooth piggyback iterations, we characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences and in particular almost everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with forward-backward, Douglas-Rachford and Alternating Direction of Multiplier algorithms as well as the Heavy-Ball method.
△ Less
Submitted 31 May, 2022;
originally announced June 2022.
-
Path differentiability of ODE flows
Authors:
Swann Marx,
Edouard Pauwels
Abstract:
We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields. Path differentiable functions constitute a proper subclass of Lipschitz functions which admit conservative gradients, a notion of generalized derivative compatible with basic calculus rules. Our main result states that such flows inherit the path differentiability property of the driving vector…
▽ More
We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields. Path differentiable functions constitute a proper subclass of Lipschitz functions which admit conservative gradients, a notion of generalized derivative compatible with basic calculus rules. Our main result states that such flows inherit the path differentiability property of the driving vector field. We show indeed that forward propagation of derivatives given by the sensitivity differential inclusions provide a conservative Jacobian for the flow. This allows to propose a nonsmooth version of the adjoint method, which can be applied to integral costs under an ODE constraint. This result constitutes a theoretical ground to the application of small step first order methods to solve a broad class of nonsmooth optimization problems with parametrized ODE constraints. This is illustrated with the convergence of small step first order methods based on the proposed nonsmooth adjoint.
△ Less
Submitted 11 January, 2022;
originally announced January 2022.
-
Regularisation for PCA- and SVD-type matrix factorisations
Authors:
Abdolrahman Khoshrou,
Eric J. Pauwels
Abstract:
Singular Value Decomposition (SVD) and its close relative, Principal Component Analysis (PCA), are well-known linear matrix decomposition techniques that are widely used in applications such as dimension reduction and clustering. However, an important limitation of SVD/PCA is its sensitivity to noise in the input data. In this paper, we take another look at the problem of regularisation and show t…
▽ More
Singular Value Decomposition (SVD) and its close relative, Principal Component Analysis (PCA), are well-known linear matrix decomposition techniques that are widely used in applications such as dimension reduction and clustering. However, an important limitation of SVD/PCA is its sensitivity to noise in the input data. In this paper, we take another look at the problem of regularisation and show that different formulations of the minimisation problem lead to qualitatively different solutions.
△ Less
Submitted 24 June, 2021;
originally announced June 2021.
-
Numerical influence of ReLU'(0) on backpropagation
Authors:
David Bertoin,
Jérôme Bolte,
Sébastien Gerchinovitz,
Edouard Pauwels
Abstract:
In theory, the choice of ReLU(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (f…
▽ More
In theory, the choice of ReLU(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs which occur around half of the time in 32 bits precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also evidence that reconditioning approaches as batch-norm or ADAM tend to buffer the influence of ReLU'(0)'s value. Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
△ Less
Submitted 3 November, 2023; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Nonsmooth Implicit Differentiation for Machine Learning and Optimization
Authors:
Jérôme Bolte,
Tam Le,
Edouard Pauwels,
Antonio Silveti-Falls
Abstract:
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clar…
▽ More
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
△ Less
Submitted 5 April, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Second-order step-size tuning of SGD for non-convex optimization
Authors:
Camille Castera,
Jérôme Bolte,
Cédric Févotte,
Edouard Pauwels
Abstract:
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version o…
▽ More
In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
△ Less
Submitted 21 November, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Sequential convergence of AdaGrad algorithm for smooth convex optimization
Authors:
Cheik Traoré,
Edouard Pauwels
Abstract:
We prove that the iterates produced by, either the scalar step size variant, or the coordinatewise variant of AdaGrad algorithm, are convergent sequences when applied to convex objective functions with Lipschitz gradient. The key insight is to remark that such AdaGrad sequences satisfy a variable metric quasi-Fejér monotonicity property, which allows to prove convergence.
We prove that the iterates produced by, either the scalar step size variant, or the coordinatewise variant of AdaGrad algorithm, are convergent sequences when applied to convex objective functions with Lipschitz gradient. The key insight is to remark that such AdaGrad sequences satisfy a variable metric quasi-Fejér monotonicity property, which allows to prove convergence.
△ Less
Submitted 13 April, 2021; v1 submitted 24 November, 2020;
originally announced November 2020.
-
A Hölderian backtracking method for min-max and min-min problems
Authors:
Jérôme Bolte,
Lilian Glaudin,
Edouard Pauwels,
Mathieu Serrurier
Abstract:
We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptatio…
▽ More
We present a new algorithm to solve min-max or min-min problems out of the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is to come with automatic step size adaptation which departs from the usual overly cautious backtracking methods. In a general framework, we provide convergence theoretical guarantees and rates. We apply our findings on simple GAN problems obtaining promising numerical results.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Incremental Without Replacement Sampling in Nonconvex Optimization
Authors:
Edouard Pauwels
Abstract:
Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hands modern implementations of such techniques are incremental: they rely on sampling without replacement, for which available analysis are much scarcer. We provide convergence guaranties for the latter variant by analys…
▽ More
Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hands modern implementations of such techniques are incremental: they rely on sampling without replacement, for which available analysis are much scarcer. We provide convergence guaranties for the latter variant by analysing a versatile incremental gradient scheme. For this scheme, we consider constant, decreasing or adaptive step sizes. In the smooth setting we obtain explicit complexity estimates in terms of epoch counter. In the nonsmooth setting we prove that the sequence is attracted by solutions of optimality conditions of the problem.
△ Less
Submitted 6 January, 2023; v1 submitted 15 July, 2020;
originally announced July 2020.
-
A mathematical model for automatic differentiation in machine learning
Authors:
Jerome Bolte,
Edouard Pauwels
Abstract:
Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochas…
▽ More
Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochastic approximation methods. We also evidence the issue of artificial critical points created by algorithmic differentiation and show how usual methods avoid these points with probability one.
△ Less
Submitted 29 October, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Semialgebraic Optimization for Lipschitz Constants of ReLU Networks
Authors:
Tong Chen,
Jean-Bernard Lasserre,
Victor Magron,
Edouard Pauwels
Abstract:
The Lipschitz constant of a network plays an important role in many applications of deep learning, such as robustness certification and Wasserstein Generative Adversarial Network. We introduce a semidefinite programming hierarchy to estimate the global and local Lipschitz constant of a multiple layer deep neural network. The novelty is to combine a polynomial lifting for ReLU functions derivatives…
▽ More
The Lipschitz constant of a network plays an important role in many applications of deep learning, such as robustness certification and Wasserstein Generative Adversarial Network. We introduce a semidefinite programming hierarchy to estimate the global and local Lipschitz constant of a multiple layer deep neural network. The novelty is to combine a polynomial lifting for ReLU functions derivatives with a weak generalization of Putinar's positivity certificate. This idea could also apply to other, nearly sparse, polynomial optimization problems in machine learning. We empirically demonstrate that our method provides a trade-off with respect to state of the art linear programming approach, and in some cases we obtain better bounds in less time.
△ Less
Submitted 28 October, 2020; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Rate of convergence for geometric inference based on the empirical Christoffel function
Authors:
Mai Trang Vu,
François Bachoc,
Edouard Pauwels
Abstract:
We consider the problem of estimating the support of a measure from a finite, independent, sample. The estimators which are considered are constructed based on the empirical Christoffel function. Such estimators have been proposed for the problem of set estimation with heuristic justifications. We carry out a detailed finite sample analysis, that allows us to select the threshold and degree parame…
▽ More
We consider the problem of estimating the support of a measure from a finite, independent, sample. The estimators which are considered are constructed based on the empirical Christoffel function. Such estimators have been proposed for the problem of set estimation with heuristic justifications. We carry out a detailed finite sample analysis, that allows us to select the threshold and degree parameters as a function of the sample size. We provide a convergence rate analysis of the resulting support estimation procedure. Our analysis establishes that we may obtain finite sample bounds which are comparable to existing rates for different set estimation procedures. Our results rely on concentration inequalities for the empirical Christoffel function and on estimates of the supremum of the Christoffel-Darboux kernel on sets with smooth boundaries, that can be considered of independent interest.
△ Less
Submitted 19 May, 2020; v1 submitted 31 October, 2019;
originally announced October 2019.
-
Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning
Authors:
Jérôme Bolte,
Edouard Pauwels
Abstract:
Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path d…
▽ More
Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular and any semialgebraic Lipschitz continuous functions are path differentiable. Using Whitney stratification techniques for semialgebraic and definable sets, our model provides variational formulas for nonsmooth automatic differentiation oracles, as for instance the famous backpropagation algorithm in deep learning. Our differential model is applied to establish the convergence in values of nonsmooth stochastic gradient methods as they are implemented in practice.
△ Less
Submitted 9 April, 2020; v1 submitted 23 September, 2019;
originally announced September 2019.
-
An Inertial Newton Algorithm for Deep Learning
Authors:
Camille Castera,
Jérôme Bolte,
Cédric Févotte,
Edouard Pauwels
Abstract:
We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both…
▽ More
We introduce a new second-order inertial optimization method for machine learning called INNA. It exploits the geometry of the loss function while only requiring stochastic approximations of the function values and the generalized gradients. This makes INNA fully implementable and adapted to large-scale optimization problems such as the training of deep neural networks. The algorithm combines both gradient-descent and Newton-like behaviors as well as inertia. We prove the convergence of INNA for most deep learning problems. To do so, we provide a well-suited framework to analyze deep learning loss functions involving tame optimization in which we study a continuous dynamical system together with its discrete stochastic approximations. We prove sublinear convergence for the continuous-time differential inclusion which underlies our algorithm. Additionally, we also show how standard optimization mini-batch methods applied to non-smooth non-convex problems can yield a certain type of spurious stationary points never discussed before. We address this issue by providing a theoretical framework around the new idea of $D$-criticality; we then give a simple asymptotic analysis of INNA. Our algorithm allows for using an aggressive learning rate of $o(1/\log k)$. From an empirical viewpoint, we show that INNA returns competitive results with respect to state of the art (stochastic gradient descent, ADAGRAD, ADAM) on popular deep learning benchmark problems.
△ Less
Submitted 28 July, 2021; v1 submitted 29 May, 2019;
originally announced May 2019.
-
SVD-based Visualisation and Approximation for Time Series Data in Smart Energy Systems
Authors:
Abdolrahman Khoshrou,
Andre B. Dorsman,
Eric. J. Pauwels
Abstract:
Many time series in smart energy systems exhibit two different timescales. On the one hand there are patterns linked to daily human activities. On the other hand, there are relatively slow trends linked to seasonal variations. In this paper we interpret these time series as matrices, to be visualized as images. This approach has two advantages: First of all, interpreting such time series as images…
▽ More
Many time series in smart energy systems exhibit two different timescales. On the one hand there are patterns linked to daily human activities. On the other hand, there are relatively slow trends linked to seasonal variations. In this paper we interpret these time series as matrices, to be visualized as images. This approach has two advantages: First of all, interpreting such time series as images enables one to visually integrate across the image and makes it therefore easier to spot subtle or faint features. Second, the matrix interpretation also grants elucidation of the underlying structure using well-established matrix decomposition methods. We will illustrate both these aspects for data obtained from the German day-ahead market.
△ Less
Submitted 11 July, 2018;
originally announced July 2018.
-
Quantifying Volatility Reduction in German Day-ahead Spot Market in the Period 2006 through 2016
Authors:
Abdolrahman Khoshrou,
Eric J. Pauwels
Abstract:
In Europe, Germany is taking the lead in the switch from the conventional to renewable energy. This poses new challenges as wind and solar energy are fundamentally intermittent, weather-dependent and less predictable. It is therefore of considerable interest to investigate the evolution of price volatility in this post-transition era. There are a number of reasons, however, that makes the practica…
▽ More
In Europe, Germany is taking the lead in the switch from the conventional to renewable energy. This poses new challenges as wind and solar energy are fundamentally intermittent, weather-dependent and less predictable. It is therefore of considerable interest to investigate the evolution of price volatility in this post-transition era. There are a number of reasons, however, that makes the practical studies difficult. For instance, EPEX prices can be zero or negative. Consequently, the standard approach in financial time series analysis to switch to logarithmic measures is inapplicable. Furthermore, in contrast to the stock market prices which are only available for trading days, EPEX prices cover the whole year, including weekends and holidays. Accordingly, there is a lot of underlying variability in the data which has nothing to do with volatility, but simply reflects diurnal activity patterns. An important distinction of the present work is the application of matrix decomposition techniques, namely the singular value decomposition (SVD), for defining an alternative notion of volatility. This approach is systematically more robust toward outliers and also the diurnal patterns. Our observations show that the day-ahead market is becoming less volatile in recent years.
△ Less
Submitted 19 July, 2018;
originally announced July 2018.
-
Relating Leverage Scores and Density using Regularized Christoffel Functions
Authors:
Edouard Pauwels,
Francis Bach,
Jean-Philippe Vert
Abstract:
Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Yet, the very nature of this quantity is barely understood. Borrowing ideas from the orthogonal polynomial literature, we introduce the regularized Christoffel function associated to a positive definite k…
▽ More
Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Yet, the very nature of this quantity is barely understood. Borrowing ideas from the orthogonal polynomial literature, we introduce the regularized Christoffel function associated to a positive definite kernel. This uncovers a variational formulation for leverage scores for kernel methods and allows to elucidate their relationships with the chosen kernel as well as population density. Our main result quantitatively describes a decreasing relation between leverage score and population density for a broad class of kernels on Euclidean spaces. Numerical simulations support our findings.
△ Less
Submitted 21 November, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.
-
On Fienup Methods for Regularized Phase Retrieval
Authors:
Edouard Pauwels,
Amir Beck,
Yonina C. Eldar,
Shoham Sabach
Abstract:
Alternating minimization, or Fienup methods, have a long history in phase retrieval. We provide new insights related to the empirical and theoretical analysis of these algorithms when used with Fourier measurements and combined with convex priors. In particular, we show that Fienup methods can be viewed as performing alternating minimization on a regularized nonconvex least-squares problem with re…
▽ More
Alternating minimization, or Fienup methods, have a long history in phase retrieval. We provide new insights related to the empirical and theoretical analysis of these algorithms when used with Fourier measurements and combined with convex priors. In particular, we show that Fienup methods can be viewed as performing alternating minimization on a regularized nonconvex least-squares problem with respect to amplitude measurements. We then prove that under mild additional structural assumptions on the prior (semi-algebraicity), the sequence of signal estimates has a smooth convergent behaviour towards a critical point of the nonconvex regularized least-squares objective. Finally, we propose an extension to Fienup techniques, based on a projected gradient descent interpretation and acceleration using inertial terms. We demonstrate experimentally that this modification combined with an $\ell_1$ prior constitutes a competitive approach for sparse phase retrieval.
△ Less
Submitted 27 February, 2017;
originally announced February 2017.
-
The empirical Christoffel function with applications in data analysis
Authors:
Jean-Bernard Lasserre,
Edouard Pauwels
Abstract:
We illustrate the potential applications in machine learning of the Christoffel function, or more precisely, its empirical counterpart associated with a counting measure uniformly supported on a finite set of points. Firstly, we provide a thresholding scheme which allows to approximate the support of a measure from a finite subset of its moments with strong asymptotic guaranties. Secondly, we prov…
▽ More
We illustrate the potential applications in machine learning of the Christoffel function, or more precisely, its empirical counterpart associated with a counting measure uniformly supported on a finite set of points. Firstly, we provide a thresholding scheme which allows to approximate the support of a measure from a finite subset of its moments with strong asymptotic guaranties. Secondly, we provide a consistency result which relates the empirical Christoffel function and its population counterpart in the limit of large samples. Finally, we illustrate the relevance of our results on simulated and real world datasets for several applications in statistics and machine learning: (a) density and support estimation from finite samples, (b) outlier and novelty detection and (c) affine matching.
△ Less
Submitted 7 February, 2019; v1 submitted 11 January, 2017;
originally announced January 2017.
-
Sorting out typicality with the inverse moment matrix SOS polynomial
Authors:
Jean-Bernard Lasserre,
Edouard Pauwels
Abstract:
We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple…
▽ More
We study a surprising phenomenon related to the representation of a cloud of data points using polynomials. We start with the previously unnoticed empirical observation that, given a collection (a cloud) of data points, the sublevel sets of a certain distinguished polynomial capture the shape of the cloud very accurately. This distinguished polynomial is a sum-of-squares (SOS) derived in a simple manner from the inverse of the empirical moment matrix. In fact, this SOS polynomial is directly related to orthogonal polynomials and the Christoffel function. This allows to generalize and interpret extremality properties of orthogonal polynomials and to provide a mathematical rationale for the observed phenomenon. Among diverse potential applications, we illustrate the relevance of our results on a network intrusion detection task for which we obtain performances similar to existing dedicated methods reported in the literature.
△ Less
Submitted 14 June, 2016; v1 submitted 13 June, 2016;
originally announced June 2016.
-
Using MathML to Represent Units of Measurement for Improved Ontology Alignment
Authors:
Chau Do,
Eric J. Pauwels
Abstract:
Ontologies provide a formal description of concepts and their relationships in a knowledge domain. The goal of ontology alignment is to identify semantically matching concepts and relationships across independently developed ontologies that purport to describe the same knowledge. In order to handle the widest possible class of ontologies, many alignment algorithms rely on terminological and struct…
▽ More
Ontologies provide a formal description of concepts and their relationships in a knowledge domain. The goal of ontology alignment is to identify semantically matching concepts and relationships across independently developed ontologies that purport to describe the same knowledge. In order to handle the widest possible class of ontologies, many alignment algorithms rely on terminological and structural meth- ods, but the often fuzzy nature of concepts complicates the matching process. However, one area that should provide clear matching solutions due to its mathematical nature, is units of measurement. Several on- tologies for units of measurement are available, but there has been no attempt to align them, notwithstanding the obvious importance for tech- nical interoperability. We propose a general strategy to map these (and similar) ontologies by introducing MathML to accurately capture the semantic description of concepts specified therein. We provide map** results for three ontologies, and show that our approach improves on lexical comparisons.
△ Less
Submitted 5 July, 2013;
originally announced July 2013.