-
Robust Stochastic Optimization via Gradient Quantile Clip**
Authors:
Ibrahim Merad,
Stéphane Gaïffas
Abstract:
We introduce a clip** strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clip** thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber cont…
▽ More
We introduce a clip** strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clip** thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber contamination. Our mathematical analysis leverages the connection between constant step size SGD and Markov chains and handles the bias introduced by clip** in an original way. For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high probability bounds on the final estimation error. In the non-convex case, we prove that the limit distribution is localized on a neighborhood with low gradient. We propose an implementation of this algorithm using rolling quantiles which leads to a highly efficient optimization procedure with strong robustness properties, as confirmed by our numerical experiments.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization
Authors:
Massil Hihat,
Stéphane Gaïffas,
Guillaume Garrigos,
Simon Bussy
Abstract:
We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize its cumulative losses. Our motivation is to consider general demands, losses and dynamics to go beyond standard models which usually rely on newsvendor-type losses, fixed dynamics, and unrealistic i.i.d. demand assumptions. We propo…
▽ More
We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize its cumulative losses. Our motivation is to consider general demands, losses and dynamics to go beyond standard models which usually rely on newsvendor-type losses, fixed dynamics, and unrealistic i.i.d. demand assumptions. We propose MaxCOSD, an online algorithm that has provable guarantees even for problems with non-i.i.d. demands and stateful dynamics, including for instance perishability. We consider what we call non-degeneracy assumptions on the demand process, and argue that they are necessary to allow learning.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Convergence and concentration properties of constant step-size SGD through Markov chains
Authors:
Ibrahim Merad,
Stéphane Gaïffas
Abstract:
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstei…
▽ More
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance in a more general setting compared to previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the latter inherits sub-Gaussian or sub-exponential concentration properties when these hold true for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.
△ Less
Submitted 4 July, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Robust Methods for High-Dimensional Linear Learning
Authors:
Ibrahim Merad,
Stéphane Gaïffas
Abstract:
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$. We employ, in a generic learning setting, two algorithms depending on whether the considered loss function is gradient-Lipschitz or not. Then, we instantiate our framework on several applications including vani…
▽ More
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$. We employ, in a generic learning setting, two algorithms depending on whether the considered loss function is gradient-Lipschitz or not. Then, we instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery. This leads, for each application, to efficient and robust learning algorithms, that reach near-optimal estimation rates under heavy-tailed distributions and the presence of outliers. For vanilla $s$-sparsity, we are able to reach the $s\log (d)/n$ rate under heavy-tails and $η$-corruption, at a computational cost comparable to that of non-robust analogs. We provide an efficient implementation of our algorithms in an open-source $\mathtt{Python}$ library called $\mathtt{linlearn}$, by means of which we carry out numerical experiments which confirm our theoretical findings together with a comparison to other recent approaches proposed in the literature.
△ Less
Submitted 29 May, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Robust supervised learning with coordinate gradient descent
Authors:
Stéphane Gaïffas,
Ibrahim Merad
Abstract:
This paper considers the problem of supervised learning with linear methods when both features and labels can be corrupted, either in the form of heavy tailed data and/or corrupted rows. We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives. This leads to robust statistical learning methods that have a numerical…
▽ More
This paper considers the problem of supervised learning with linear methods when both features and labels can be corrupted, either in the form of heavy tailed data and/or corrupted rows. We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives. This leads to robust statistical learning methods that have a numerical complexity nearly identical to non-robust ones based on empirical risk minimization. The main idea is simple: while robust learning with gradient descent requires the computational cost of robustly estimating the whole gradient to update all parameters, a parameter can be updated immediately using a robust estimator of a single partial derivative in coordinate gradient descent. We prove upper bounds on the generalization error of the algorithms derived from this idea, that control both the optimization and statistical errors with and without a strong convexity assumption of the risk. Finally, we propose an efficient implementation of this approach in a new python library called linlearn, and demonstrate through extensive numerical experiments that our approach introduces a new interesting compromise between robustness, statistical performance and numerical efficiency for this problem.
△ Less
Submitted 31 January, 2022;
originally announced January 2022.
-
WildWood: a new Random Forest algorithm
Authors:
Stéphane Gaïffas,
Ibrahim Merad,
Yiyang Yu
Abstract:
We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with expo…
▽ More
We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights computed over out-of-bag samples, that are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
△ Less
Submitted 13 June, 2023; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data
Authors:
Daniel de Barros Soares,
François Andrieux,
Bastien Hell,
Julien Lenhardt,
Jordi Badosa,
Sylvain Gavoille,
Stéphane Gaiffas,
Emmanuel Bacry
Abstract:
Estimating the amount of electricity that can be produced by rooftop photovoltaic systems is a time-consuming process that requires on-site measurements, a difficult task to achieve on a large scale. In this paper, we present an approach to estimate the solar potential of rooftops based on their location and architectural characteristics, as well as the amount of solar radiation they receive annua…
▽ More
Estimating the amount of electricity that can be produced by rooftop photovoltaic systems is a time-consuming process that requires on-site measurements, a difficult task to achieve on a large scale. In this paper, we present an approach to estimate the solar potential of rooftops based on their location and architectural characteristics, as well as the amount of solar radiation they receive annually. Our technique uses computer vision to achieve semantic segmentation of roof sections and roof objects on the one hand, and a machine learning model based on structured building features to predict roof pitch on the other hand. We then compute the azimuth and maximum number of solar panels that can be installed on a rooftop with geometric approaches. Finally, we compute precise shading masks and combine them with solar irradiation data that enables us to estimate the yearly solar potential of a rooftop.
△ Less
Submitted 28 May, 2021;
originally announced June 2021.
-
About contrastive unsupervised representation learning for classification and its convergence
Authors:
Ibrahim Merad,
Yiyang Yu,
Emmanuel Bacry,
Stéphane Gaïffas
Abstract:
Contrastive representation learning has been recently proved to be very efficient for self-supervised training. These methods have been successfully used to train encoders which perform comparably to supervised training on downstream classification tasks. A few works have started to build a theoretical framework around contrastive learning in which guarantees for its performance can be proven. We…
▽ More
Contrastive representation learning has been recently proved to be very efficient for self-supervised training. These methods have been successfully used to train encoders which perform comparably to supervised training on downstream classification tasks. A few works have started to build a theoretical framework around contrastive learning in which guarantees for its performance can be proven. We provide extensions of these results to training with multiple negative samples and for multiway classification. Furthermore, we provide convergence guarantees for the minimization of the contrastive training error with gradient descent of an overparametrized deep neural encoder, and provide some numerical experiments that complement our theoretical findings
△ Less
Submitted 2 December, 2020;
originally announced December 2020.
-
An improper estimator with optimal excess risk in misspecified density estimation and logistic regression
Authors:
Jaouad Mourtada,
Stéphane Gaïffas
Abstract:
We introduce a procedure for conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for statistical learning. On standard examples, this bound scales as $d/n$ with $d$ the model dimension and $n$ the sample size, and critically remains valid under model misspecification. Being an improper (out-of-…
▽ More
We introduce a procedure for conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for statistical learning. On standard examples, this bound scales as $d/n$ with $d$ the model dimension and $n$ the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches reducing to the sequential problem, our bounds remove suboptimal $\log n$ factors and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by leverage scores of covariates, nearly matching the optimal risk in the well-specified case without conditions on the noise variance or approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to calibration of probabilistic predictions relying on virtual samples, and can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of $O((d + B^2R^2)/n)$, where $R$ bounds the norm of features and $B$ that of the comparison parameter; by contrast, no within-model estimator can achieve better rate than $\min({B R}/{\sqrt{n}}, {d e^{BR}}/{n} )$ in general. This provides a more practical alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly addressing a question raised by Foster et al. (2018).
△ Less
Submitted 8 December, 2021; v1 submitted 23 December, 2019;
originally announced December 2019.
-
ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data
Authors:
Anastasiia Kabeshova,
Yiyang Yu,
Bertrand Lukacs,
Emmanuel Bacry,
Stéphane Gaïffas
Abstract:
This paper considers the problems of modeling and predicting a long-term and ``blurry'' relapse that occurs after a medical act, such as a surgery. The relapse is observed only indirectly, in a ``blurry'' fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions) i…
▽ More
This paper considers the problems of modeling and predicting a long-term and ``blurry'' relapse that occurs after a medical act, such as a surgery. The relapse is observed only indirectly, in a ``blurry'' fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions) in order to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events that are observed through a claims-only database. ZiMM ED is applied on a ``non-clinical'' claims database, that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than a setting where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its exhaustivity population-wise, compared to clinical electronic health records coming from a single or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost \emph{all French citizens} who had surgery for prostatic problems, with a history between 1.5 and 5 years. We consider a long-term (18 months) relapse (urination problems still occur despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves several baselines, including non-deep learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing work.
△ Less
Submitted 25 July, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
SCALPEL3: a scalable open-source library for healthcare claims databases
Authors:
Emmanuel Bacry,
Stéphane Gaïffas,
Fanny Leroy,
Maryan Morel,
Dinh Phong Nguyen,
Youcef Sebiat,
Dian Sun
Abstract:
This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies thanks to abstractions allowing concept extraction, high-level cohort manipulation, and production of data formats compatible with machine learning libraries. SCALPEL3 has successfully been used on the SNDS database (see Tuppi…
▽ More
This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies thanks to abstractions allowing concept extraction, high-level cohort manipulation, and production of data formats compatible with machine learning libraries. SCALPEL3 has successfully been used on the SNDS database (see Tuppin et al. (2017)), a huge healthcare claims database that handles the reimbursement of almost all French citizens.
SCALPEL3 focuses on scalability, easy interactive analysis and helpers for data flow analysis to accelerate studies performed on LODs. It consists of three open-source libraries based on Apache Spark. SCALPEL-Flattening allows denormalization of the LOD (only SNDS for now) by joining tables sequentially in a big table. SCALPEL-Extraction provides fast concept extraction from a big table such as the one produced by SCALPEL-Flattening. Finally, SCALPEL-Analysis allows interactive cohort manipulations, monitoring statistics of cohort flows and building datasets to be used with machine learning libraries. The first two provide a Scala API while the last one provides a Python API that can be used in an interactive environment. Our code is available on GitHub.
SCALPEL3 allowed to extract successfully complex concepts for studies such as Morel et al (2017) or studies with 14.5 million patients observed over three years (corresponding to more than 15 billion healthcare events and roughly 15 TeraBytes of data) in less than 49 minutes on a small 15 nodes HDFS cluster. SCALPEL3 provides a sharp interactive control of data processing through legible code, which helps to build studies with full reproducibility, leading to improved maintainability and audit of studies performed on LODs.
△ Less
Submitted 26 August, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
AMF: Aggregated Mondrian Forests for Online Learning
Authors:
Jaouad Mourtada,
Stéphane Gaïffas,
Erwan Scornet
Abstract:
Random Forests (RF) is one of the algorithms of choice in many supervised learning applications, be it classification or regression. The appeal of such tree-ensemble methods comes from a combination of several characteristics: a remarkable accuracy in a variety of tasks, a small number of parameters to tune, robustness with respect to features scaling, a reasonable computational cost for training…
▽ More
Random Forests (RF) is one of the algorithms of choice in many supervised learning applications, be it classification or regression. The appeal of such tree-ensemble methods comes from a combination of several characteristics: a remarkable accuracy in a variety of tasks, a small number of parameters to tune, robustness with respect to features scaling, a reasonable computational cost for training and prediction, and their suitability in high-dimensional settings. The most commonly used RF variants however are "offline" algorithms, which require the availability of the whole dataset at once. In this paper, we introduce AMF, an online random forest algorithm based on Mondrian Forests. Using a variant of the Context Tree Weighting algorithm, we show that it is possible to efficiently perform an exact aggregation over all prunings of the trees; in particular, this enables to obtain a truly online parameter-free algorithm which is competitive with the optimal pruning of the Mondrian tree, and thus adaptive to the unknown regularity of the regression function. Numerical experiments show that AMF is competitive with respect to several strong baselines on a large number of datasets for multi-class classification.
△ Less
Submitted 15 May, 2020; v1 submitted 25 June, 2019;
originally announced June 2019.
-
On the optimality of the Hedge algorithm in the stochastic regime
Authors:
Jaouad Mourtada,
Stéphane Gaïffas
Abstract:
In this paper, we study the behavior of the Hedge algorithm in the online stochastic setting. We prove that anytime Hedge with decreasing learning rate, which is one of the simplest algorithm for the problem of prediction with expert advice, is surprisingly both worst-case optimal and adaptive to the easier stochastic and adversarial with a gap problems. This shows that, in spite of its small, non…
▽ More
In this paper, we study the behavior of the Hedge algorithm in the online stochastic setting. We prove that anytime Hedge with decreasing learning rate, which is one of the simplest algorithm for the problem of prediction with expert advice, is surprisingly both worst-case optimal and adaptive to the easier stochastic and adversarial with a gap problems. This shows that, in spite of its small, non-adaptive learning rate, Hedge possesses the same optimal regret guarantee in the stochastic case as recently introduced adaptive algorithms. Moreover, our analysis exhibits qualitative differences with other variants of the Hedge algorithm, such as the fixed-horizon version (with constant learning rate) and the one based on the so-called "doubling trick", both of which fail to adapt to the easier stochastic setting. Finally, we discuss the limitations of anytime Hedge and the improvements provided by second-order regret bounds in the stochastic case.
△ Less
Submitted 8 July, 2019; v1 submitted 5 September, 2018;
originally announced September 2018.
-
Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework
Authors:
Simon Bussy,
Raphaël Veil,
Vincent Looten,
Anita Burgun,
Stéphane Gaïffas,
Agathe Guilloux,
Brigitte Ranque,
Anne-Sophie Jannot
Abstract:
Background: Choosing the most performing method in terms of outcome prediction or variables selection is a recurring problem in prognosis studies, leading to many publications on methods comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection aspects separately. Second, methods are either compared within a bina…
▽ More
Background: Choosing the most performing method in terms of outcome prediction or variables selection is a recurring problem in prognosis studies, leading to many publications on methods comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection aspects separately. Second, methods are either compared within a binary outcome setting (based on an arbitrarily chosen delay) or within a survival setting, but not both. In this paper, we propose a comparison methodology to weight up those different settings both in terms of prediction and variables selection, while incorporating advanced machine learning strategies. Methods: Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); while on the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We then compare performances of all methods both in terms of risk prediction and variable selection, with a focus on the use of Elastic-Net regularization technique. Results: Among all assessed statistical methods assessed, the C-mix model yields the better performances in both the two considered settings, as well as interesting interpretation aspects. There is some consistency in selected covariates across methods within a setting, but not much across the two settings. Conclusions: It appears that learning withing the survival setting first, and then going back to a binary prediction using the survival estimates significantly enhance binary predictions.
△ Less
Submitted 25 July, 2018;
originally announced July 2018.
-
Dual optimization for convex constrained objectives without the gradient-Lipschitz assumption
Authors:
Martin Bompaire,
Emmanuel Bacry,
Stéphane Gaïffas
Abstract:
The minimization of convex objectives coming from linear supervised learning problems, such as penalized generalized linear models, can be formulated as finite sums of convex functions. For such problems, a large set of stochastic first-order solvers based on the idea of variance reduction are available and combine both computational efficiency and sound theoretical guarantees (linear convergence…
▽ More
The minimization of convex objectives coming from linear supervised learning problems, such as penalized generalized linear models, can be formulated as finite sums of convex functions. For such problems, a large set of stochastic first-order solvers based on the idea of variance reduction are available and combine both computational efficiency and sound theoretical guarantees (linear convergence rates). Such rates are obtained under both gradient-Lipschitz and strong convexity assumptions. Motivated by learning problems that do not meet the gradient-Lipschitz assumption, such as linear Poisson regression, we work under another smoothness assumption, and obtain a linear convergence rate for a shifted version of Stochastic Dual Coordinate Ascent (SDCA) that improves the current state-of-the-art. Our motivation for considering a solver working on the Fenchel-dual problem comes from the fact that such objectives include many linear constraints, that are easier to deal with in the dual. Our approach and theoretical findings are validated on several datasets, for Poisson regression and another objective coming from the negative log-likelihood of the Hawkes process, which is a family of models which proves extremely useful for the modeling of information propagation in social networks and causality inference.
△ Less
Submitted 15 December, 2018; v1 submitted 10 July, 2018;
originally announced July 2018.
-
Uncovering Causality from Multivariate Hawkes Integrated Cumulants
Authors:
Massil Achab,
Emmanuel Bacry,
Stéphane Gaïffas,
Iacopo Mastromatteo,
Jean-Francois Muzy
Abstract:
We design a new nonparametric method that allows one to estimate the matrix of integrated kernels of a multivariate Hawkes process. This matrix not only encodes the mutual influences of each nodes of the process, but also disentangles the causality relationships between them. Our approach is the first that leads to an estimation of this matrix without any parametric modeling and estimation of the…
▽ More
We design a new nonparametric method that allows one to estimate the matrix of integrated kernels of a multivariate Hawkes process. This matrix not only encodes the mutual influences of each nodes of the process, but also disentangles the causality relationships between them. Our approach is the first that leads to an estimation of this matrix without any parametric modeling and estimation of the kernels themselves. A consequence is that it can give an estimation of causality relationships between nodes (or users), based on their activity timestamps (on a social network for instance), without knowing or estimating the shape of the activities lifetime. For that purpose, we introduce a moment matching method that fits the third-order integrated cumulants of the process. We show on numerical experiments that our approach is indeed very robust to the shape of the kernels, and gives appealing results on the MemeTracker database.
△ Less
Submitted 29 May, 2017; v1 submitted 21 July, 2016;
originally announced July 2016.
-
Mean-field inference of Hawkes point processes
Authors:
Emmanuel Bacry,
Stéphane Gaïffas,
Iacopo Mastromatteo,
Jean-François Muzy
Abstract:
We propose a fast and efficient estimation method that is able to accurately recover the parameters of a d-dimensional Hawkes point-process from a set of observations. We exploit a mean-field approximation that is valid when the fluctuations of the stochastic intensity are small. We show that this is notably the case in situations when interactions are sufficiently weak, when the dimension of the…
▽ More
We propose a fast and efficient estimation method that is able to accurately recover the parameters of a d-dimensional Hawkes point-process from a set of observations. We exploit a mean-field approximation that is valid when the fluctuations of the stochastic intensity are small. We show that this is notably the case in situations when interactions are sufficiently weak, when the dimension of the system is high or when the fluctuations are self-averaging due to the large number of past events they involve. In such a regime the estimation of a Hawkes process can be mapped on a least-squares problem for which we provide an analytic solution. Though this estimator is biased, we show that its precision can be comparable to the one of the Maximum Likelihood Estimator while its computation speed is shown to be improved considerably. We give a theoretical control on the accuracy of our new approach and illustrate its efficiency using synthetic datasets, in order to assess the statistical estimation error of the parameters.
△ Less
Submitted 4 November, 2015;
originally announced November 2015.
-
SGD with Variance Reduction beyond Empirical Risk Minimization
Authors:
Massil Achab,
Agathe Guilloux,
Stéphane Gaïffas,
Emmanuel Bacry
Abstract:
We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions, whose gradients depend on numerically expensive expectations. Our main motivation is the acceleration of the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but our algorithm can be used in different settings as well. The propos…
▽ More
We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions, whose gradients depend on numerically expensive expectations. Our main motivation is the acceleration of the optimization of the regularized Cox partial-likelihood (the core model used in survival analysis), but our algorithm can be used in different settings as well. The proposed algorithm is doubly stochastic in the sense that gradient steps are done using stochastic gradient descent (SGD) with variance reduction, where the inner expectations are approximated by a Monte-Carlo Markov-Chain (MCMC) algorithm. We derive conditions on the MCMC number of iterations guaranteeing convergence, and obtain a linear rate of convergence under strong convexity and a sublinear rate without this assumption. We illustrate the fact that our algorithm improves the state-of-the-art solver for regularized Cox partial-likelihood on several datasets from survival analysis.
△ Less
Submitted 8 November, 2016; v1 submitted 16 October, 2015;
originally announced October 2015.
-
Weighted algorithms for compressed sensing and matrix completion
Authors:
Stéphane Gaïffas,
Guillaume Lecué
Abstract:
This paper is about iteratively reweighted basis-pursuit algorithms for compressed sensing and matrix completion problems. In a first part, we give a theoretical explanation of the fact that reweighted basis pursuit can improve a lot upon basis pursuit for exact recovery in compressed sensing. We exhibit a condition that links the accuracy of the weights to the RIP and incoherency constants, which…
▽ More
This paper is about iteratively reweighted basis-pursuit algorithms for compressed sensing and matrix completion problems. In a first part, we give a theoretical explanation of the fact that reweighted basis pursuit can improve a lot upon basis pursuit for exact recovery in compressed sensing. We exhibit a condition that links the accuracy of the weights to the RIP and incoherency constants, which ensures exact recovery. In a second part, we introduce a new algorithm for matrix completion, based on the idea of iterative reweighting. Since a weighted nuclear "norm" is typically non-convex, it cannot be used easily as an objective function. So, we define a new estimator based on a fixed-point equation. We give empirical evidences of the fact that this new algorithm leads to strong improvements over nuclear norm minimization on simulated and real matrix completion problems.
△ Less
Submitted 8 July, 2011;
originally announced July 2011.