-
Modeling Methane Intensity of Oil and Gas Upstream Activities by Production Profile
Authors:
Quentin Peyle,
Imene Ben Rejeb-Mzah,
Baptiste Piofret,
Antoine Benoit,
Alexandre d'Aspremont,
Adil El Yaalaoui
Abstract:
We propose a methodology for modelling methane intensities of Oil and Gas upstream activities for different production profiles with diverse combinations of region of operation and associated production volumes. This methodology leverages different data sources, including satellite measurements and public estimates of methane emissions but also country-level oil and gas production data and company reporting. The obtained methane intensity models are compared to the reference companies' own reporting in order to better understand methane emissions for different types of companies. The results show that regions of operation within the different production profiles have a significant impact on the value of modelled methane intensities, especially for operators located in a single or few countries, such as national and medium-sized international operators. This paper also shows that methane intensities reported by the companies tend to be on average 16.1 times smaller than those obtained using the methodology presented here, and cannot account for total methane emissions that are estimated for upstream operations in the different regions observed.
Submitted 7 March, 2024;
originally announced March 2024.
-
Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights
Authors:
Ibrahim Fayad,
Philippe Ciais,
Martin Schwartz,
Jean-Pierre Wigneron,
Nicolas Baghdadi,
Aurélien de Truchis,
Alexandre d'Aspremont,
Frederic Frappart,
Sassan Saatchi,
Agnes Pellissier-Tanon,
Hassan Bazzi
Abstract:
Accurate and timely monitoring of forest canopy heights is critical for assessing forest dynamics, biodiversity, carbon sequestration as well as forest degradation and deforestation. Recent advances in deep learning techniques, coupled with the vast amount of spaceborne remote sensing data offer an unprecedented opportunity to map canopy height at high spatial and temporal resolutions. Current techniques for wall-to-wall canopy height mapping correlate remotely sensed 2D information from optical and radar sensors to the vertical structure of trees using LiDAR measurements. While studies using deep learning algorithms have shown promising performances for the accurate mapping of canopy heights, they have limitations due to the type of architectures and loss functions employed. Moreover, mapping canopy heights over tropical forests remains poorly studied, and the accurate height estimation of tall canopies is a challenge due to signal saturation from optical and radar sensors, persistent cloud covers and sometimes the limited penetration capabilities of LiDARs. Here, we map heights at 10 m resolution across the diverse landscape of Ghana with a new vision transformer (ViT) model optimized concurrently with a classification (discrete) and a regression (continuous) loss function. This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function. The ViT model results show that our proposed discrete/continuous loss significantly increases the sensitivity for very tall trees (i.e., > 35m), for which other approaches show saturation effects. The height maps generated by the ViT also have better ground sampling distance and better sensitivity to sparse vegetation in comparison to a convolutional model. Our ViT model has a RMSE of 3.12m in comparison to a reference dataset while the ConvNet model has a RMSE of 4.3m.
Submitted 22 April, 2023;
originally announced April 2023.
-
Detecting Methane Plumes using PRISMA: Deep Learning Model and Data Augmentation
Authors:
Alexis Groshenry,
Clement Giron,
Thomas Lauvaux,
Alexandre d'Aspremont,
Thibaud Ehret
Abstract:
The new generation of hyperspectral imagers, such as PRISMA, has improved significantly our detection capability of methane (CH4) plumes from space at high spatial resolution (30m). We present here a complete framework to identify CH4 plumes using images from the PRISMA satellite mission and a deep learning model able to detect plumes over large areas. To compensate for the relative scarcity of PRISMA images, we trained our model by transposing high resolution plumes from Sentinel-2 to PRISMA. Our methodology thus avoids computationally expensive synthetic plume generation from Large Eddy Simulations by generating a broad and realistic training database, and paves the way for large-scale detection of methane plumes using future hyperspectral sensors (EnMAP, EMIT, CarbonMapper).
Submitted 17 November, 2022;
originally announced November 2022.
-
Optimal Algorithms for Stochastic Complementary Composite Minimization
Authors:
Alexandre d'Aspremont,
Cristóbal Guzmán,
Clément Lezane
Abstract:
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization in the stochastic setting. This problem corresponds to the minimization of the sum of a (weakly) smooth function endowed with a stochastic first-order oracle, and a structured uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term. Despite intensive work on closely related settings, prior to our work no complexity bounds for this problem were known. We close this gap by providing novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems. We conclude by providing numerical results comparing our methods to the state of the art.
Submitted 23 January, 2024; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Linear Bandits on Uniformly Convex Sets
Authors:
Thomas Kerdreux,
Christophe Roux,
Alexandre d'Aspremont,
Sebastian Pokutta
Abstract:
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, §5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.
Submitted 10 March, 2021;
originally announced March 2021.
-
Local and Global Uniform Convexity Conditions
Authors:
Thomas Kerdreux,
Alexandre d'Aspremont,
Sebastian Pokutta
Abstract:
We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with \textit{scaling inequalities} used in analysing the convergence of optimization methods. In particular, we establish local versions of these conditions to provide sharper insights on a recent body of complexity results in learning theory, online learning, or offline optimization, which rely on the strong convexity of the feasible set. While they have a significant impact on complexity, these strong convexity or uniform convexity properties of feasible sets are not exploited as thoroughly as their functional counterparts, and this work is an effort to correct this imbalance. We conclude with some practical examples in optimization and machine learning where leveraging these conditions and localized assumptions leads to new complexity results.
Submitted 18 February, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Acceleration Methods
Authors:
Alexandre d'Aspremont,
Damien Scieur,
Adrien Taylor
Abstract:
This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.
Submitted 21 December, 2021; v1 submitted 23 January, 2021;
originally announced January 2021.
-
A Bregman Method for Structure Learning on Sparse Directed Acyclic Graphs
Authors:
Manon Romain,
Alexandre d'Aspremont
Abstract:
We develop a Bregman proximal gradient method for structure learning on linear structural causal models. While the problem is non-convex, has high curvature and is in fact NP-hard, Bregman gradient methods allow us to neutralize at least part of the impact of curvature by measuring smoothness against a highly nonlinear kernel. This allows the method to make longer steps and significantly improves convergence. Each iteration requires solving a Bregman proximal step which is convex and efficiently solvable for our particular choice of kernel. We test our method on various synthetic and real data sets.
Submitted 5 November, 2020;
originally announced November 2020.
-
Averaging Atmospheric Gas Concentration Data using Wasserstein Barycenters
Authors:
Mathieu Barré,
Clément Giron,
Matthieu Mazzolini,
Alexandre d'Aspremont
Abstract:
Hyperspectral satellite images report greenhouse gas concentrations worldwide on a daily basis. While taking simple averages of these images over time produces a rough estimate of relative emission rates, atmospheric transport means that simple averages fail to pinpoint the source of these emissions. We propose using Wasserstein barycenters coupled with weather data to average gas concentration data sets and better concentrate the mass around significant sources.
Submitted 6 October, 2020;
originally announced October 2020.
-
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention
Authors:
Grégoire Mialon,
Dexiong Chen,
Alexandre d'Aspremont,
Julien Mairal
Abstract:
We address the problem of learning on sets of features, motivated by the need of performing pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or it may be seen as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results for protein fold recognition and detection of chromatin profiles tasks, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding that can be used alone or as a module in larger learning models at https://github.com/claying/OTK.
Submitted 9 February, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
FANOK: Knockoffs in Linear Time
Authors:
Armin Askari,
Quentin Rebjock,
Alexandre d'Aspremont,
Laurent El Ghaoui
Abstract:
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems. Identifying the knockoff distribution requires solving a large scale semidefinite program for which we derive several efficient methods. One handles generic covariance matrices, has a complexity scaling as $O(p^3)$ where $p$ is the ambient dimension, while another assumes a rank $k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^2)$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500,000$.
Submitted 15 June, 2020;
originally announced June 2020.
-
Global Convergence of Frank Wolfe on One Hidden Layer Networks
Authors:
Alexandre d'Aspremont,
Mert Pilanci
Abstract:
We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks. When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program. The classical Frank Wolfe algorithm then converges with rate $O(1/T)$ where $T$ is both the number of neurons and the number of calls to the oracle.
Submitted 6 February, 2020;
originally announced February 2020.
-
Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Functions
Authors:
Grégoire Mialon,
Alexandre d'Aspremont,
Julien Mairal
Abstract:
We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.
Submitted 12 June, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Ranking and synchronization from pairwise measurements via SVD
Authors:
Alexandre d'Aspremont,
Mihai Cucuringu,
Hemant Tyagi
Abstract:
Given a measurement graph $G= (V,E)$ and an unknown signal $r \in \mathbb{R}^n$, we investigate algorithms for recovering $r$ from pairwise measurements of the form $r_i - r_j$; $\{i,j\} \in E$. This problem arises in a variety of applications, such as ranking teams in sports data and time synchronization of distributed networks. Framed in the context of ranking, the task is to recover the ranking of $n$ teams (induced by $r$) given a small subset of noisy pairwise rank offsets. We propose a simple SVD-based algorithmic pipeline for both the problem of time synchronization and ranking. We provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise perturbations with outliers, using results from matrix perturbation and random matrix theory. Our theoretical findings are complemented by a detailed set of numerical experiments on both synthetic and real data, showcasing the competitiveness of our proposed algorithms with other state-of-the-art methods.
Submitted 7 August, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Regularity as Regularization: Smooth and Strongly Convex Brenier Potentials in Optimal Transport
Authors:
François-Pierre Paty,
Alexandre d'Aspremont,
Marco Cuturi
Abstract:
Estimating Wasserstein distances between two high-dimensional densities suffers from the curse of dimensionality: one needs an exponential (wrt dimension) number of samples to ensure that the distance between two empirical measures is comparable to the distance between the original densities. Therefore, optimal transport (OT) can only be used in machine learning if it is substantially regularized. On the other hand, one of the greatest achievements of the OT literature in recent years lies in regularity theory: Caffarelli showed that the OT map between two well behaved measures is Lipschitz, or equivalently when considering 2-Wasserstein distances, that Brenier convex potentials (whose gradient yields an optimal map) are smooth. We propose in this work to draw inspiration from this theory and use regularity as a regularization tool. We give algorithms operating on two discrete measures that can recover nearly optimal transport maps with small distortion, or equivalently, nearly optimal Brenier potentials that are strongly convex and smooth. The problem boils down to solving alternatively a convex QCQP and a discrete OT problem, granting access to the values and gradients of the Brenier potential not only on sampled points, but also out of sample at the cost of solving a simpler QCQP for each evaluation. We propose algorithms to estimate and evaluate transport maps with desired regularity properties, benchmark their statistical performance, apply them to domain adaptation and visualize their action on a color transfer task.
Submitted 10 July, 2020; v1 submitted 26 May, 2019;
originally announced May 2019.
-
Naive Feature Selection: Sparsity in Naive Bayes
Authors:
Armin Askari,
Alexandre d'Aspremont,
Laurent El Ghaoui
Abstract:
Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set, with more than $1.6$ million training points and about $12$ million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
Submitted 30 July, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Overcomplete Independent Component Analysis via SDP
Authors:
Anastasia Podosinnikova,
Amelia Perry,
Alexander Wein,
Francis Bach,
Alexandre d'Aspremont,
David Sontag
Abstract:
We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources k exceeds the dimension p of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate k < p^2/4 and prove that a mixing component can be recovered with high probability when k < (2 - epsilon) p log p when the original components are sampled uniformly at random on the hyper sphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.
Submitted 24 January, 2019;
originally announced January 2019.
-
Reconstructing Latent Orderings by Spectral Clustering
Authors:
Antoine Recanati,
Thomas Kerdreux,
Alexandre d'Aspremont
Abstract:
Spectral clustering uses a graph Laplacian spectral embedding to enhance the cluster structure of some data sets. When the embedding is one dimensional, it can be used to sort the items (spectral ordering). A number of empirical results also suggest that a multidimensional Laplacian embedding enhances the latent ordering of the data, if any. This also extends to circular orderings, a case where unidimensional embeddings fail. We tackle the task of retrieving linear and circular orderings in a unifying framework, and show how a latent ordering on the data translates into a filamentary structure on the Laplacian embedding. We propose a method to recover it, illustrated with numerical experiments on synthetic data and real DNA sequencing data. The code and experiments are available at https://github.com/antrec/mdso.
Submitted 18 July, 2018;
originally announced July 2018.
-
Nonlinear Acceleration of CNNs
Authors:
Damien Scieur,
Edouard Oyallon,
Alexandre d'Aspremont,
Francis Bach
Abstract:
The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of existing frameworks, RNA slightly improves the optimization process of CNNs, after training.
Submitted 1 June, 2018;
originally announced June 2018.
-
Online Regularized Nonlinear Acceleration
Authors:
Damien Scieur,
Edouard Oyallon,
Alexandre d'Aspremont,
Francis Bach
Abstract:
Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context. Here, we adapt RNA to overcome these issues, so that our scheme can be used on fast algorithms such as gradient methods with momentum. We show optimal complexity bounds for quadratics and asymptotically optimal rates on general convex minimization problems. Moreover, this new scheme works online, i.e., extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods.
Submitted 21 June, 2019; v1 submitted 24 May, 2018;
originally announced May 2018.
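The post-processing step described in this abstract can be illustrated with a minimal sketch of the basic (offline) RNA extrapolation, not the online variant the paper introduces. The function name `rna_extrapolate`, the relative regularization `reg`, and the toy quadratic are assumptions made for illustration only.

```python
import numpy as np

def rna_extrapolate(iterates, reg=1e-10):
    """Combine iterates x_0..x_{k+1} into one extrapolated estimate (offline RNA sketch)."""
    X = np.stack(iterates, axis=1)           # shape (d, k+2), one iterate per column
    U = X[:, 1:] - X[:, :-1]                 # successive differences x_{i+1} - x_i
    gram = U.T @ U
    gram = gram / np.linalg.norm(gram, 2)    # rescale so that `reg` acts as a relative value
    k = gram.shape[0]
    z = np.linalg.solve(gram + reg * np.eye(k), np.ones(k))
    c = z / z.sum()                          # extrapolation weights, summing to one
    return X[:, :-1] @ c                     # weighted combination of x_0..x_k

# Toy problem (hypothetical): f(x) = 0.5 x^T A x - b^T x, fixed-step gradient descent.
A = np.diag([0.5, 0.75, 1.0, 1.25, 1.5])
b = np.ones(5)
x_star = np.linalg.solve(A, b)

def f(x):
    return 0.5 * x @ A @ x - b @ x

iterates = [np.zeros(5)]
for _ in range(8):
    x = iterates[-1]
    iterates.append(x - (A @ x - b))         # gradient step with step size 1.0

x_last = iterates[-1]
x_rna = rna_extrapolate(iterates)
```

On this small quadratic the combined point `x_rna` should lie closer to `x_star` than the last gradient iterate `x_last`; on harder problems the choice of `reg` matters more.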
-
Frank-Wolfe with Subsampling Oracle
Authors:
Thomas Kerdreux,
Fabian Pedregosa,
Alexandre d'Aspremont
Abstract:
We analyze two novel randomized variants of the Frank-Wolfe (FW) or conditional gradient algorithm. While classical FW algorithms require solving a linear minimization problem over the domain at each iteration, the proposed method only requires solving a linear minimization problem over a small \emph{subset} of the original domain. The first algorithm that we propose is a randomized variant of the original FW algorithm and achieves a $\mathcal{O}(1/t)$ sublinear convergence rate as in the deterministic counterpart. The second algorithm is a randomized variant of the Away-step FW algorithm, and again as its deterministic counterpart, reaches a linear (i.e., exponential) convergence rate, making it the first provably convergent randomized variant of Away-step FW. In both cases, while subsampling reduces the convergence rate by a constant factor, the linear minimization step can be a fraction of the cost of that of the deterministic versions, especially when the data is streamed. We illustrate computational gains of the algorithms on regression problems, involving both $\ell_1$ and latent group lasso penalties.
Submitted 20 March, 2018;
originally announced March 2018.
-
Learning with Clustering Structure
Authors:
Vincent Roulet,
Fajwel Fogel,
Alexandre d'Aspremont,
Francis Bach
Abstract:
We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text classification for instance, to reduce dimensionality by grouping words together and identify synonyms. The sample clustering problem on the other hand, applies to multiclass problems where we are allowed to make multiple predictions and the performance of the best answer is recorded. We derive a unified optimization formulation highlighting the common structure of these problems and produce algorithms whose core iteration complexity amounts to a k-means clustering step, which can be approximated efficiently. We extend these results to combine sparsity and clustering constraints, and develop a new projection algorithm on the set of clustered sparse vectors. We prove convergence of our algorithms on random instances, based on a union of subspaces interpretation of the clustering structure. Finally, we test the robustness of our methods on artificial data sets as well as real data extracted from movie reviews.
Submitted 19 September, 2016; v1 submitted 16 June, 2015;
originally announced June 2015.
-
Spectral Ranking using Seriation
Authors:
Fajwel Fogel,
Alexandre d'Aspremont,
Milan Vojnovic
Abstract:
We describe a seriation algorithm for ranking a set of items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than classical scoring methods. Finally, we bound the ranking error when only a random subset of the comparisons are observed. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive and in some cases superior performance compared to classical ranking methods.
Submitted 10 March, 2016; v1 submitted 20 June, 2014;
originally announced June 2014.
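The pipeline sketched in this abstract (similarity matrix from comparisons, then spectral reordering) might look roughly as follows. This is an illustrative sketch under assumptions: the function name `serial_rank`, the encoding of the comparison matrix `C` (+1 if item i beats item j, -1 otherwise, +1 on the diagonal), the similarity formula, and the sign-resolution rule are choices made here, not details confirmed by the listing.

```python
import numpy as np

def serial_rank(C):
    """Rank items from a pairwise comparison matrix C via a spectral seriation sketch."""
    n = C.shape[0]
    # Similarity: items that compare alike against all others get a high score.
    S = 0.5 * (n * np.ones((n, n)) + C @ C.T)
    # Graph Laplacian of the similarity matrix.
    L = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(L)              # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]                     # eigenvector of the second-smallest eigenvalue
    order = np.argsort(fiedler)              # candidate ranking, up to a global reversal
    # Resolve the reversal ambiguity: keep the direction agreeing with the comparisons.
    pos = np.empty(n, dtype=int)
    pos[order] = np.arange(n)
    predicted = np.sign(pos[None, :] - pos[:, None])   # +1 where i is placed before j
    if np.sum(predicted * C) < 0:
        order = order[::-1]
    return order

# Toy example (hypothetical): five items with a known total order, best first.
true_order = np.array([3, 0, 4, 1, 2])
pos_true = np.empty(5, dtype=int)
pos_true[true_order] = np.arange(5)
C = np.sign(pos_true[None, :] - pos_true[:, None]) + np.eye(5)

recovered = serial_rank(C)
```

With all comparisons observed and consistent, `recovered` should match `true_order`, in line with the exact-recovery statement in the abstract.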
-
Convex Relaxations for Subset Selection
Authors:
Francis Bach,
Selin Damla Ahipasaoglu,
Alexandre d'Aspremont
Abstract:
We use convex relaxation techniques to produce lower bounds on the optimal value of subset selection problems and generate good approximate solutions. We then explicitly bound the quality of these relaxations by studying the approximation ratio of sparse eigenvalue relaxations. Our results are used to improve the performance of branch-and-bound algorithms to produce exact solutions to subset selection problems.
Submitted 17 June, 2010;
originally announced June 2010.
-
Predicting Abnormal Returns From News Using Text Classification
Authors:
Ronny Luss,
Alexandre d'Aspremont
Abstract:
We show how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. Multiple kernel learning is used to combine equity returns with text as predictive features to increase classification performance and we develop an analytic center cutting plane method to solve the kernel learning problem efficiently. We observe that while the direction of returns is not predictable using either text or returns, their size is, with text features producing significantly better performance than historical returns alone.
Submitted 24 June, 2009; v1 submitted 16 September, 2008;
originally announced September 2008.
-
Support Vector Machine Classification with Indefinite Kernels
Authors:
Ronny Luss,
Alexandre d'Aspremont
Abstract:
We propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our algorithm simultaneously computes support vectors and a proxy kernel matrix used in forming the loss. This can be interpreted as a penalized kernel learning problem where indefinite kernel matrices are treated as noisy observations of a true Mercer kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the projected gradient or analytic center cutting plane methods. We compare the performance of our technique with other methods on several classic data sets.
Submitted 4 August, 2009; v1 submitted 1 April, 2008;
originally announced April 2008.
-
Identifying Small Mean Reverting Portfolios
Authors:
Alexandre d'Aspremont
Abstract:
Given multivariate time series, we study the problem of forming portfolios with maximum mean reversion while constraining the number of assets in these portfolios. We show that it can be formulated as a sparse canonical correlation analysis and study various algorithms to solve the corresponding sparse generalized eigenvalue problems. After discussing penalized parameter estimation procedures, we study the sparsity versus predictability tradeoff and the impact of predictability in various markets.
Submitted 26 February, 2008; v1 submitted 22 August, 2007;
originally announced August 2007.
-
Optimal Solutions for Sparse Principal Component Analysis
Authors:
Alexandre d'Aspremont,
Francis Bach,
Laurent El Ghaoui
Abstract:
Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all target numbers of nonzero coefficients, with total complexity O(n^3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n^3) per pattern. We discuss applications in subset selection and sparse recovery and show on artificial examples and biological data that our algorithm does provide globally optimal solutions in many cases.
Submitted 9 November, 2007; v1 submitted 4 July, 2007;
originally announced July 2007.
-
Model Selection Through Sparse Maximum Likelihood Estimation
Authors:
Onureena Banerjee,
Laurent El Ghaoui,
Alexandre d'Aspremont
Abstract:
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added l_1-norm penalty term. The problem as formulated is convex but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive l_1-norm penalized regression. Our second algorithm, based on Nesterov's first order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright & Jordan (2006)), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
Submitted 4 July, 2007;
originally announced July 2007.
-
Clustering and Feature Selection using Sparse Principal Component Analysis
Authors:
Ronny Luss,
Alexandre d'Aspremont
Abstract:
In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction and motivation on sparse PCA and detail our implementation of the algorithm in d'Aspremont et al. (2005). We then apply these results to some classic clustering and feature selection problems arising in biology.
Submitted 8 October, 2008; v1 submitted 4 July, 2007;
originally announced July 2007.
-
A Semidefinite Relaxation for Air Traffic Flow Scheduling
Authors:
Alexandre d'Aspremont,
Laurent El Ghaoui
Abstract:
We first formulate the problem of optimally scheduling air traffic flow with sector capacity constraints as a mixed integer linear program. We then use semidefinite relaxation techniques to form a convex relaxation of that problem. Finally, we present a randomization algorithm to further improve the quality of the solution. Because of the specific structure of the air traffic flow problem, the relaxation has a single semidefinite constraint of size dn where d is the maximum delay and n the number of flights.
Submitted 26 September, 2006;
originally announced September 2006.
-
A Market Test for the Positivity of Arrow-Debreu Prices
Authors:
Alexandre d'Aspremont
Abstract:
We derive tractable necessary and sufficient conditions for the absence of buy-and-hold arbitrage opportunities in a perfectly liquid, one period market. We formulate the positivity of Arrow-Debreu prices as a generalized moment problem to show that this no arbitrage condition is equivalent to the positive semidefiniteness of matrices formed by the market price of tradeable securities and their products. We apply this result to a market with multiple assets and basket call options.
Submitted 15 June, 2006; v1 submitted 11 October, 2005;
originally announced October 2005.
-
Sparse Covariance Selection via Robust Maximum Likelihood Estimation
Authors:
Onureena Banerjee,
Alexandre d'Aspremont,
Laurent El Ghaoui
Abstract:
We address a problem of covariance selection, where we seek a trade-off between a high likelihood and the number of non-zero elements in the inverse covariance matrix. We solve a maximum likelihood problem with a penalty term given by the sum of absolute values of the elements of the inverse covariance matrix, and allow for imposing bounds on the condition number of the solution. The problem is directly amenable to now standard interior-point algorithms for convex optimization, but remains challenging due to its size. We first give some results on the theoretical computational complexity of the problem, by showing that a recent methodology for non-smooth convex optimization due to Nesterov can be applied to this problem, to greatly improve on the complexity estimate given by interior-point algorithms. We then examine two practical algorithms aimed at solving large-scale, noisy (hence dense) instances: one is based on a block-coordinate descent approach, where columns and rows are updated sequentially, another applies a dual version of Nesterov's method.
Submitted 8 June, 2005;
originally announced June 2005.
-
Static versus Dynamic Arbitrage Bounds on Multivariate Option Prices
Authors:
Alexandre d'Aspremont
Abstract:
We compare static arbitrage price bounds on basket calls, i.e. bounds that only involve buy-and-hold trading strategies, with the price range obtained within a multi-variate generalization of the Black-Scholes model. While there is no gap between these two sets of prices in the univariate case, we observe here that contrary to our intuition about model risk for at-the-money calls, there is a somewhat large gap between model prices and static arbitrage prices, hence a similarly large set of prices on which a multivariate Black-Scholes model cannot be calibrated but where no conclusion can be drawn on the presence or not of a static arbitrage opportunity.
Submitted 10 July, 2004;
originally announced July 2004.
-
A direct formulation for sparse PCA using semidefinite programming
Authors:
Alexandre d'Aspremont,
Laurent El Ghaoui,
Michael I. Jordan,
Gert R. G. Lanckriet
Abstract:
We examine the problem of approximating, in the Frobenius-norm sense, a positive semidefinite symmetric matrix by a rank-one matrix, with an upper bound on the cardinality of its eigenvector. The problem arises in the decomposition of a covariance matrix into sparse factors, and has wide applications ranging from biology to finance. We use a modification of the classical variational representation of the largest eigenvalue of a symmetric matrix, where cardinality is constrained, and derive a semidefinite programming based relaxation for our problem. We also discuss Nesterov's smooth minimization technique applied to the SDP arising in the direct sparse PCA method.
Submitted 20 May, 2006; v1 submitted 15 June, 2004;
originally announced June 2004.
-
Risk-Management Methods for the Libor Market Model Using Semidefinite Programming
Authors:
Alexandre d'Aspremont
Abstract:
When interest rate dynamics are described by the Libor Market Model as in BGM97, we show how some essential risk-management results can be obtained from the dual of the calibration program. In particular, if the objective is to maximize another swaption's price, we show that the optimal dual variables describe a hedging portfolio in the sense of \cite{Avel96}. In the general case, the local sensitivity of the covariance matrix to all market movement scenarios can be directly computed from the optimal dual solution. We also show how semidefinite programming can be used to manage the Gamma exposure of a portfolio.
Submitted 5 October, 2005; v1 submitted 24 February, 2003;
originally announced February 2003.
-
Interest Rate Model Calibration Using Semidefinite Programming
Authors:
Alexandre d'Aspremont
Abstract:
We show that, for the purpose of pricing Swaptions, the Swap rate and the corresponding Forward rates can be considered lognormal under a single martingale measure. Swaptions can then be priced as options on a basket of lognormal assets and an approximation formula is derived for such options. This formula is centered around a Black-Scholes price with an appropriate volatility, plus a correction term that can be interpreted as the expected tracking error. The calibration problem can then be solved very efficiently using semidefinite programming.
Submitted 5 October, 2005; v1 submitted 24 February, 2003;
originally announced February 2003.