Time topological analysis of EEG using signature theory

Authors: Stéphane Chrétien, Ben Gao, Astrid Thebault-Guiochon, Rémi Vaucher

Abstract: Anomaly detection in multivariate signals is a task of paramount importance in many disciplines (epidemiology, finance, cognitive sciences and neurosciences, oncology, etc.). In this perspective, Topological Data Analysis (TDA) offers a battery of "shape" invariants that can be exploited for the implementation of an effective detection scheme. Our contribution consists of extending the constructions presented in \cite{chretienleveraging} on the construction of simplicial complexes from the Signatures of signals and their predictive capacities, rather than the use of a generic distance as in \cite{petri2014homological}. Signature theory is a new theme in Machine Learning (arXiv:1603.03788) stemming from recent work on the notions of Rough Paths developed by Terry Lyons and his team \cite{lyons2002system} based on the formalism introduced by Chen \cite{chen1957integration}. We explore in particular the detection of changes in topology, based on tracking the evolution of homological persistence and the Betti numbers associated with the complex introduced in \cite{chretienleveraging}. We apply our tools for the analysis of brain signals such as EEG to detect precursor phenomena to epileptic seizures.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 14 pages, 5 figures. Under review for Journée des Statistiques 2024

arXiv:2401.08562 [pdf, other]

Registration of algebraic varieties using Riemannian optimization

Authors: Florentin Goyens, Coralia Cartis, Stéphane Chrétien

Abstract: We consider the point cloud registration problem, the task of finding a transformation between two point clouds that represent the same object but are expressed in different coordinate systems. Our approach is not based on a point-to-point correspondence, matching every point in the source point cloud to a point in the target point cloud. Instead, we assume and leverage a low-dimensional nonlinear geometric structure of the data. Firstly, we approximate each point cloud by an algebraic variety (a set defined by finitely many polynomial equations). This is done by solving an optimization problem on the Grassmann manifold, using a connection between algebraic varieties and polynomial bases. Secondly, we solve an optimization problem on the orthogonal group to find the transformation (rotation $+$ translation) which makes the two algebraic varieties overlap. We use second-order Riemannian optimization methods for the solution of both steps. Numerical experiments on real and synthetic data are provided, with encouraging results. Our approach is particularly useful when the two point clouds describe different parts of an object (which may not even be overlapping), on the condition that the surface of the object may be well approximated by a set of polynomial equations. The first procedure, the approximation, is of independent interest, as it can be used for denoising data that belongs to an algebraic variety. We provide statistical guarantees for the estimation error of the denoising using Stein's unbiased estimator.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2305.10129 [pdf]

doi 10.1016/j.ndteint.2023.102852

The realisation of fast X-ray computed tomography using a limited number of projection images for dimensional metrology

Authors: Wenjuan Sun, Stephan Chretien, Ander Biguri, Manuchehr Soleimani, Thomas Blumensath, Jessica Talbott

Abstract: Due to the merit of establishing volumetric data, X-ray computed tomography (XCT) is increasingly used as a non-destructive evaluation technique in the quality control of advanced manufactured parts with complex non-line-of-sight features. However, the cost of measurement time and data storage hampers the adoption of the technique in production lines. Commercial fast XCT utilises X-ray detectors with fast detection capability, which can be expensive and results in a large amount of data. This paper discusses a different approach, where fast XCT was realised via the acquisition of a small number of projection images instead of full projection images. An established total variation (TV) algorithm was used to handle the reconstruction. The paper investigates the feasibility of using the TV algorithm in handling a significantly reduced number of projection images for reconstruction. This allows a reduction of measurement time from fifty-two minutes to one minute for a typical industrial XCT system. It also enables a proportional reduction of data size. A test strategy including both quantitative and qualitative test metrics was considered to evaluate the effectiveness of the reconstruction algorithm. The qualitative evaluation includes both the signal to noise ratio and the contrast to noise ratio. The quantitative evaluation was established using reference samples with different internal and external geometries. Simulation data were used in the assessment considering various influence factors, such as X-ray source property and instrument noise.
The results demonstrated the possibility of using advanced reconstruction algorithms in handling XCT measurements with a significantly limited number of projection images for dimensional measurements.

Submitted 17 May, 2023; originally announced May 2023.

Comments: NDT & E International (2023)

arXiv:2305.07908 [pdf, other]

Convergence and scaling of Boolean-weight optimization for hardware reservoirs

Authors: Louis Andreoli, Stéphane Chrétien, Xavier Porte, Daniel Brunner

Abstract: Hardware implementations of neural networks are an essential step to implement next-generation efficient and powerful artificial intelligence solutions. Besides the realization of a parallel, efficient and scalable hardware architecture, the optimization of the system's extremely large parameter space with sampling-efficient approaches is essential. Here, we analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random recurrently connected neural network, a reservoir. We demonstrate that the convergence is exponential and scales linearly with the network's number of neurons. Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment. Our work therefore provides a solid foundation for such optimization in hardware networks, and identifies future directions that are promising for optimizing convergence speed during learning, leveraging measures of a neural network's amplitude statistics and the weight update rule.

Submitted 13 May, 2023; originally announced May 2023.

Comments: Submitted to ECML-PKDD workshop Deep Learning meets Neuromorphic Hardware
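
Note: the Boolean-weight coordinate descent mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The greedy accept/revert rule, the toy Gaussian "reservoir states", and all names (`mse`, `boolean_coordinate_descent`, `states`, `targets`) are assumptions made for illustration only.

```python
import random

def mse(states, weights, targets):
    """Mean squared error of a readout that sums the neurons whose Boolean weight is True."""
    total = 0.0
    for row, target in zip(states, targets):
        output = sum(x for x, w in zip(row, weights) if w)
        total += (output - target) ** 2
    return total / len(targets)

def boolean_coordinate_descent(states, targets, n_epochs=5, seed=0):
    """Flip one Boolean weight at a time and keep the flip only if the error drops.

    Assumption: a greedy accept/revert rule; the paper's exact update rule may differ.
    """
    rng = random.Random(seed)
    n_neurons = len(states[0])
    weights = [rng.random() < 0.5 for _ in range(n_neurons)]
    error = mse(states, weights, targets)
    history = [error]
    for _ in range(n_epochs):
        order = list(range(n_neurons))
        rng.shuffle(order)
        for k in order:
            weights[k] = not weights[k]        # trial flip of coordinate k
            trial = mse(states, weights, targets)
            if trial < error:
                error = trial                  # accept the flip
            else:
                weights[k] = not weights[k]    # revert the flip
            history.append(error)
    return weights, history

# Toy data (hypothetical): random states and a target produced by hidden Boolean weights.
data_rng = random.Random(1)
n_samples, n_neurons = 40, 12
states = [[data_rng.gauss(0.0, 1.0) for _ in range(n_neurons)] for _ in range(n_samples)]
true_weights = [data_rng.random() < 0.5 for _ in range(n_neurons)]
targets = [sum(x for x, w in zip(row, true_weights) if w) for row in states]

weights, history = boolean_coordinate_descent(states, targets)
```

Because a flip is kept only when it lowers the error, `history` is non-increasing by construction; the sketch says nothing about the exponential rate or linear scaling derived in the paper.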

arXiv:2110.15653 [pdf, other]

An SDP dual relaxation for the Robust Shortest Path Problem with ellipsoidal uncertainty: Pierra's decomposition method and a new primal Frank-Wolfe-type heuristics for duality gap evaluation

Authors: Chifaa Al Dahik, Zeina Al Masry, Stéphane Chrétien, Jean-Marc Nicod, Landy Rabehasaina

Abstract: This work addresses the Robust counterpart of the Shortest Path Problem (RSPP) with a correlated uncertainty set. Since this problem is hard, a heuristic approach based on the Frank-Wolfe algorithm, named Discrete Frank-Wolfe (DFW), has recently been proposed. The aim of this paper is to propose a semi-definite programming relaxation for the RSPP that provides a lower bound to validate approaches such as the DFW algorithm. The relaxed problem results from a bidualization that is done through a reformulation of the RSPP into a quadratic problem. Then the relaxed problem is solved using a sparse version of Pierra's decomposition in a product space method. This validation method is suitable for large size problems. The numerical experiments show that the gap between the solutions obtained with the relaxed and the heuristic approaches is relatively small.

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2007.12882 [pdf, ps, other]

A finite sample analysis of the benign overfitting phenomenon for ridge function estimation

Authors: Emmanuel Caron, Stephane Chretien

Abstract: Recent extensive numerical experiments in large-scale machine learning have uncovered a quite counterintuitive phase transition, as a function of the ratio between the sample size and the number of parameters in the model. As the number of parameters $p$ approaches the sample size $n$, the generalisation error increases, but surprisingly, it starts decreasing again past the threshold $p=n$. This phenomenon, brought to the theoretical community's attention in \cite{belkin2019reconciling}, has been thoroughly investigated lately, more specifically for simpler models than deep neural networks, such as the linear model when the parameter is taken to be the minimum norm solution to the least-squares problem, firstly in the asymptotic regime when $p$ and $n$ tend to infinity, see e.g. \cite{hastie2019surprises}, and recently in the finite dimensional regime and more specifically for linear models \cite{bartlett2020benign}, \cite{tsigler2020benign}, \cite{lecue2022geometrical}. In the present paper, we propose a finite sample analysis of non-linear models of ridge type, where we investigate the overparametrised regime of the double descent phenomenon for both the estimation problem and the prediction problem. Our results provide a precise analysis of the distance of the best estimator from the true parameter as well as a generalisation bound which complements recent works of \cite{bartlett2020benign} and \cite{chinot2020benign}.
Our analysis is based on tools closely related to the continuous Newton method \cite{neuberger2007continuous} and a refined quantitative analysis of the performance in prediction of the minimum $\ell_2$-norm solution.

Submitted 12 January, 2024; v1 submitted 25 July, 2020; originally announced July 2020.

Comments: New section on generalisation added

arXiv:2007.02708 [pdf, other]

The dual approach to non-negative super-resolution: perturbation analysis

Authors: Stéphane Chrétien, Andrew Thompson, Bogdan Toader

Abstract: We study the problem of super-resolution, where we recover the locations and weights of non-negative point sources from a few samples of their convolution with a Gaussian kernel. It has been shown that exact recovery is possible by minimising the total variation norm of the measure, and a practical way of achieving this is by solving the dual problem. In this paper, we study the stability of solutions with respect to the solutions to the dual problem, both in the case of exact measurements and in the case of measurements with additive noise. In particular, we establish a relationship between perturbations in the dual variable and perturbations in the primal variable around the optimiser and a similar relationship between perturbations in the dual variable around the optimiser and the magnitude of the additive noise in the measurements. Our analysis is based on a quantitative version of the implicit function theorem.

Submitted 4 July, 2023; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 35 pages, 5 figures

arXiv:2004.01869 [pdf, other]

Learning with Semi-Definite Programming: new statistical bounds based on fixed point analysis and excess risk curvature

Authors: Stéphane Chrétien, Mihai Cucuringu, Guillaume Lecué, Lucie Neirac

Abstract: Many statistical learning problems have recently been shown to be amenable to Semi-Definite Programming (SDP), with community detection and clustering in Gaussian mixture models as the most striking instances [Javanmard et al., 2016]. Given the growing range of applications of SDP-based techniques to machine learning problems, and the rapid progress in the design of efficient algorithms for solving SDPs, an intriguing question is to understand how the recent advances from empirical process theory can be put to work in order to provide a precise statistical analysis of SDP estimators. In the present paper, we borrow cutting-edge techniques and concepts from the learning theory literature, such as fixed point equations and excess risk curvature arguments, which yield general estimation and prediction results for a wide class of SDP estimators. From this perspective, we revisit some classical results in community detection from [Guédon et al., 2016] and [Chen et al., 2016], and we obtain statistical guarantees for SDP estimators used in signed clustering, group synchronization and MAXCUT.

Submitted 4 April, 2020; originally announced April 2020.

arXiv:2003.12319 [pdf, other]

doi 10.1515/nanoph-2020-0171

Boolean learning under noise-perturbations in hardware neural networks

Authors: Louis Andreoli, Xavier Porte, Stéphane Chrétien, Maxime Jacquot, Laurent Larger, Daniel Brunner

Abstract: A high efficiency hardware integration of neural networks benefits from realizing nonlinearity, network connectivity and learning fully in a physical substrate. Multiple systems have recently implemented some or all of these operations, yet the focus was placed on addressing technological challenges. Fundamental questions regarding learning in hardware neural networks remain largely unexplored. Noise in particular is unavoidable in such architectures, and here we investigate its interaction with a learning algorithm using an opto-electronic recurrent neural network. We find that noise strongly modifies the system's path during convergence, and surprisingly fully decorrelates the final readout weight matrices. This highlights the importance of understanding architecture, noise and learning algorithm as interacting players, and therefore identifies the need for mathematical tools for noisy, analogue system optimization.

Submitted 25 June, 2021; v1 submitted 27 March, 2020; originally announced March 2020.

Comments: 8 pages, 5 figures

Journal ref: Nanophotonics (published online ahead of print), 20200171 (2020)

arXiv:1910.04735 [pdf, other]

Dynamical mean field theory algorithm and experiment on quantum computers

Authors: I. Rungger, N. Fitzpatrick, H. Chen, C. H. Alderete, H. Apel, A. Cowtan, A. Patterson, D. Munoz Ramo, Y. Zhu, N. H. Nguyen, E. Grant, S. Chretien, L. Wossnig, N. M. Linke, R. Duncan

Abstract: The developments of quantum computing algorithms and experiments for atomic scale simulations have largely focused on quantum chemistry for molecules, while their application in condensed matter systems is scarcely explored. Here we present a quantum algorithm to perform dynamical mean field theory (DMFT) calculations for condensed matter systems on currently available quantum computers, and demonstrate it on two quantum hardware platforms. DMFT is required to properly describe the large class of materials with strongly correlated electrons. The computationally challenging part arises from solving the effective problem of an interacting impurity coupled to a bath, which scales exponentially with system size on conventional computers. An exponential speedup is expected on quantum computers, but the algorithms proposed so far are based on real time evolution of the wavefunction, which requires high-depth circuits and hence very low noise levels in the quantum hardware. Here we propose an alternative approach, which uses the variational quantum eigensolver (VQE) method for ground and excited states to obtain the needed quantities as part of an exact diagonalization impurity solver. We present the algorithm for a two site DMFT system, which we benchmark using simulations on conventional computers as well as experiments on superconducting and trapped ion qubits, demonstrating that this method is suitable for running DMFT calculations on currently available quantum hardware.

Submitted 8 January, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

arXiv:1904.01926 [pdf, ps, other]

The dual approach to non-negative super-resolution: impact on primal reconstruction accuracy

Authors: Stephane Chretien, Andrew Thompson, Bogdan Toader

Abstract: We study the problem of super-resolution, where we recover the locations and weights of non-negative point sources from a few samples of their convolution with a Gaussian kernel. It has been recently shown that exact recovery is possible by minimising the total variation norm of the measure. An alternative practical approach is to solve its dual. In this paper, we study the stability of solutions with respect to the solutions to the dual problem. In particular, we establish a relationship between perturbations in the dual variable and the primal variables around the optimiser. This is achieved by applying a quantitative version of the implicit function theorem in a non-trivial way.

Submitted 8 May, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: 4 pages double column

arXiv:1903.04479 [pdf, other]

Revisiting clustering as matrix factorisation on the Stiefel manifold

Authors: Stéphane Chrétien, Benjamin Guedj

Abstract: This paper studies clustering for possibly high dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large scale optimisation, in the context of low rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.

Submitted 18 June, 2020; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: Accepted at the LOD 2020 Conference -- The Sixth International Conference on Machine Learning, Optimization, and Data Science

Journal ref: LOD 2020

arXiv:1807.02862 [pdf, other]

Multi-kernel unmixing and super-resolution using the Modified Matrix Pencil method

Authors: Stéphane Chrétien, Hemant Tyagi

Abstract: Consider $L$ groups of point sources or spike trains, with the $l^{\text{th}}$ group represented by $x_l(t)$. For a function $g:\mathbb{R} \rightarrow \mathbb{R}$, let $g_l(t) = g(t/\mu_l)$ denote a point spread function with scale $\mu_l > 0$, and with $\mu_1 < \cdots < \mu_L$. With $y(t) = \sum_{l=1}^{L} (g_l \star x_l)(t)$, our goal is to recover the source parameters given samples of $y$, or given the Fourier samples of $y$. This problem is a generalization of the usual super-resolution setup wherein $L = 1$; we call this the multi-kernel unmixing super-resolution problem. Assuming access to Fourier samples of $y$, we derive an algorithm for this problem for estimating the source parameters of each group, along with precise non-asymptotic guarantees. Our approach involves estimating the group parameters sequentially in the order of increasing scale parameters, i.e., from group $1$ to $L$. In particular, the estimation process at stage $1 \leq l \leq L$ involves (i) carefully sampling the tail of the Fourier transform of $y$, (ii) a deflation step wherein we subtract the contribution of the groups processed thus far from the obtained Fourier samples, and (iii) applying Moitra's modified Matrix Pencil method on a deconvolved version of the samples in (ii).

Submitted 7 January, 2020; v1 submitted 8 July, 2018; originally announced July 2018.

Comments: 50 pages, 10 figures, made notational changes and corrected typos after reviewer feedback, to appear in Journal of Fourier Analysis and Applications

arXiv:1807.02589 [pdf, other]

A note on computing the Smallest Conic Singular Value

Authors: Stephane Chretien

Abstract: The goal of this note is to study the smallest conic singular value of a matrix from a Lagrangian duality viewpoint and provide an efficient method for its computation.

Submitted 6 July, 2018; originally announced July 2018.

arXiv:1805.09261 [pdf, other]

Online shortest paths with confidence intervals for routing in a time varying random network

Authors: Stéphane Chrétien, Christophe Guyeux

Abstract: The increase in the world's population and rising standards of living are leading to an ever-increasing number of vehicles on the roads, and with it ever-increasing difficulties in traffic management. This traffic management in transport networks can clearly be optimized by using information and communication technologies referred to as Intelligent Transport Systems (ITS). This management problem is usually reformulated as finding the shortest path in a time-varying random graph. In this article, an online shortest path computation using stochastic gradient descent is proposed. This routing algorithm for ITS traffic management is based on the online Frank-Wolfe approach. Our improvement makes it possible to find a confidence interval for the shortest path, by using the stochastic gradient algorithm for approximate Bayesian inference.

Submitted 22 May, 2018; originally announced May 2018.

arXiv:1805.01870 [pdf, other]

Hedging parameter selection for basis pursuit

Authors: Stephane Chretien, Alex Gibberd, Sandipan Roy

Abstract: In Compressed Sensing and high dimensional estimation, signal recovery often relies on sparsity assumptions and estimation is performed via $\ell_1$-penalized least-squares optimization, a.k.a. LASSO. The $\ell_1$ penalisation is usually controlled by a weight, also called "relaxation parameter", denoted by $\lambda$. It is commonly thought that the practical efficiency of the LASSO for prediction crucially relies on accurate selection of $\lambda$. In this short note, we propose to consider the hyper-parameter selection problem from a new perspective which combines the Hedge online learning method by Freund and Schapire with the stochastic Frank-Wolfe method for the LASSO. Using the Hedge algorithm, we show that our simple selection rule can achieve prediction results comparable to Cross Validation at a potentially much lower computational cost.

Submitted 4 May, 2018; originally announced May 2018.
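
Note: the Hedge method of Freund and Schapire referred to in the abstract above keeps a weight per candidate and multiplies it by an exponential of minus the observed loss. The sketch below shows only this generic update, applied to a hypothetical grid of penalty values. The grid `lambdas`, the loss table `loss_rounds`, and the learning rate `eta` are made-up illustration values, and the coupling with stochastic Frank-Wolfe described in the paper is not reproduced.

```python
import math

def hedge_update(weights, losses, eta):
    """One Hedge step: reweight each candidate by exp(-eta * loss), then renormalise."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical candidate penalties and per-round prediction losses in [0, 1].
lambdas = [0.01, 0.1, 1.0]
loss_rounds = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.7],
    [0.7, 0.3, 0.5],
]

weights = [1.0 / len(lambdas)] * len(lambdas)   # start from uniform weights
for losses in loss_rounds:
    weights = hedge_update(weights, losses, eta=1.0)

# The candidate with the smallest cumulative loss ends up with the largest weight.
best_lambda = lambdas[max(range(len(lambdas)), key=weights.__getitem__)]
```

In a real pipeline the losses would come from evaluating each candidate's current LASSO iterate on incoming data; here they are fixed numbers so that the update itself is easy to follow.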

arXiv:1804.01119 [pdf, other]

Feature selection in weakly coherent matrices

Authors: Stephane Chretien, Zhen-Wai Olivier Ho

Abstract: A problem of paramount importance in both pure (Restricted Invertibility problem) and applied mathematics (Feature extraction) is that of selecting a submatrix of a given matrix, such that this submatrix has its smallest singular value above a specified level. Such problems can be addressed using perturbation analysis. In this paper, we propose a perturbation bound for the smallest singular value of a given matrix after appending a column, under the assumption that its initial coherence is not large, and we use this bound to derive a fast algorithm for feature extraction.

Submitted 3 April, 2018; originally announced April 2018.

Comments: 14 pages, 6 Figures, Accepted for LVA-ICA 2018 Surrey

arXiv:1804.01071 [pdf, other]

Average performance analysis of the stochastic gradient method for online PCA

Authors: Stephane Chretien, Christophe Guyeux, Zhen-Wai Olivier HO

Abstract: This paper studies the complexity of the stochastic gradient algorithm for PCA when the data are observed in a streaming setting. We also propose an online approach for selecting the learning rate. Simulation experiments confirm the practical relevance of the plain stochastic gradient approach and that drastic improvements can be achieved by learning the learning rate.

Submitted 3 April, 2018; originally announced April 2018.

Comments: 11 pages, 1 figure, Submitted to LOD 2018
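
Note: a common form of the stochastic gradient method for streaming PCA is Oja's rule, which nudges the current direction towards each new sample and renormalises. The sketch below assumes that form. The synthetic two-dimensional stream and the hand-picked step size `1 / (t + 10)` are illustration choices, and the online learning-rate selection proposed in the paper is not implemented.

```python
import math
import random

def normalize(v):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def oja_step(w, x, lr):
    """One stochastic gradient step for the leading principal direction (Oja's rule)."""
    proj = sum(wi * xi for wi, xi in zip(w, x))          # projection of the sample on w
    moved = [wi + lr * proj * xi for wi, xi in zip(w, x)]
    return normalize(moved)                              # project back to the unit sphere

# Synthetic stream (assumption): variance 9 along the first axis, 0.25 along the second,
# so the leading principal direction is the first coordinate axis.
rng = random.Random(0)
w = normalize([1.0, 1.0])
for t in range(1, 2001):
    x = [3.0 * rng.gauss(0.0, 1.0), 0.5 * rng.gauss(0.0, 1.0)]
    w = oja_step(w, x, lr=1.0 / (t + 10))                # hand-picked decaying step size
```

After the loop, `w` should be close to the first axis up to sign; how quickly that happens depends strongly on the step-size schedule, which is the sensitivity the abstract's learning-rate selection is meant to address.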

arXiv:1710.08812 [pdf, other]

Post-Prognostics Decision for Optimizing the Commitment of Fuel Cell Systems

Authors: Stephane Chretien, Nathalie Herr, Jean-Marc Nicod, Christophe Varnier

Abstract: In a post-prognostics decision context, this paper addresses the problem of maximizing the useful life of a platform composed of several parallel machines under a service constraint. Application to multi-stack fuel cell systems is considered. In order to propose a solution to the insufficient durability of fuel cells, the purpose is to define a commitment strategy by determining at each time the contribution of each fuel cell stack to the global output so as to satisfy the demand as long as possible. A relaxed version of the problem is introduced, which makes it potentially solvable for very large instances. Results based on computational experiments illustrate the efficiency of the new approach, based on the Mirror Prox algorithm, when compared with a simple method of successive projections onto the constraint sets associated with the problem.

Submitted 19 October, 2017; originally announced October 2017.

arXiv:1706.08089 [pdf, other]

Finding optimal finite biological sequences over finite alphabets: the OptiFin toolbox

Authors: Régis Garnier, Christophe Guyeux, Stéphane Chrétien

Abstract: In this paper, we present a toolbox for a specific optimization problem that frequently arises in bioinformatics or genomics. In this specific optimisation problem, the state space is a set of words of specified length over a finite alphabet. To each word is associated a score. The overall objective is to find the words which have the lowest possible score. This type of general optimization problem is encountered in e.g. 3D conformation optimisation for protein structure prediction, or largest core genes subset discovery based on best supported phylogenetic tree for a set of species. In order to solve this problem, we propose a toolbox that can be easily launched using MPI and embeds 3 well-known metaheuristics. The toolbox is fully parametrized and well documented. It has been specifically designed to be easily modified and possibly improved by the user depending on the application, and does not require the user to be a computer scientist. We show that the toolbox performs very well on two difficult practical problems.

Submitted 25 June, 2017; originally announced June 2017.

arXiv:1610.08227 [pdf, other]

A clustering tool for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Models

Authors: Marine Bruneau, Thierry Mottet, Serge Moulin, Maël Kerbiriou, Franz Chouly, Stéphane Chretien, Christophe Guyeux

Abstract: We propose a new procedure for clustering nucleotide sequences based on the "Laplacian Eigenmaps" and Gaussian Mixture modelling. This proposal is then applied to a set of 100 DNA sequences from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene of a collection of Platyhelminthes and Nematoda species. The resulting clusters are then shown to be consistent with the gene phylogenetic tree computed using a maximum likelihood approach. This comparison shows in particular that the clustering produced by the methodology combining Laplacian Eigenmaps with Gaussian Mixture models is coherent with the phylogeny as well as with the NCBI taxonomy. We also developed a Python package for this procedure which is available online.

Submitted 26 October, 2016; originally announced October 2016.

arXiv:1606.09471 [pdf, ps, other]

On the subdifferential of symmetric convex functions of the spectrum for symmetric and orthogonally decomposable tensors

Authors: Stéphane Chrétien, Tianwen Wei

Abstract: The subdifferential of convex functions of the singular spectrum of real matrices has been widely studied in matrix analysis, optimization and automatic control theory. Convex optimization over spaces of tensors is now gaining much interest due to its potential applications in signal processing, statistics and engineering. The goal of this paper is to present an extension of the approach by Lewis \cite{lewis1995convex} for the analysis of the subdifferential of certain convex functions of the spectrum of symmetric tensors. We give a complete characterization of the subdifferential of Schatten-type tensor norms for symmetric tensors. Some partial results in this direction are also given for Orthogonally Decomposable tensors.

Submitted 30 June, 2016; originally announced June 2016.

arXiv:1606.09193 [pdf, ps, other]

Small coherence implies the weak Null Space Property

Authors: Stéphane Chrétien, Zhen Wai Olivier Ho

Abstract: In the Compressed Sensing community, it is well known that given a matrix $X \in \mathbb R^{n\times p}$ with $\ell_2$ normalized columns, the Restricted Isometry Property (RIP) implies the Null Space Property (NSP). It is also well known that a small Coherence $μ$ implies a weak RIP, i.e. the singular values of $X_T$ lie between $1-δ$ and $1+δ$ for "most" index subsets $T \subset \{1,\ldots,p\}$ with size governed by $μ$ and $δ$. In this short note, we show that a small Coherence implies a weak Null Space Property, i.e. $\Vert h_T\Vert_2 \le C \ \Vert h_{T^c}\Vert_1/\sqrt{s}$ for most $T \subset \{1,\ldots,p\}$ with cardinality $|T|\le s$. We moreover prove some singular value perturbation bounds that may also prove useful for other applications.

Submitted 29 June, 2016; originally announced June 2016.
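As an illustration of the quantity this abstract relies on, here is a minimal sketch that computes the coherence of a matrix, using the standard definition as the largest absolute inner product between two distinct $\ell_2$-normalized columns. The function name `coherence` is chosen here for illustration and does not come from the paper.

```python
import numpy as np

def coherence(X):
    """Largest absolute inner product between two distinct normalized columns."""
    X = np.asarray(X, dtype=float)
    # Normalize every column to unit l2 norm.
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    # Absolute Gram matrix; the diagonal holds the trivial self-products.
    G = np.abs(Xn.T @ Xn)
    np.fill_diagonal(G, 0.0)
    return float(G.max())
```

An orthonormal basis has coherence 0, while two nearly parallel columns push the value towards 1.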

arXiv:1606.09190 [pdf, ps, other]

A Semi-Definite Programming approach to low dimensional embedding for unsupervised clustering

Authors: Stéphane Chrétien, Clément Dombry, Adrien Faivre

Abstract: This paper proposes a variant of the method of Guédon and Vershynin for estimating the cluster matrix in the Mixture of Gaussians framework via Semi-Definite Programming. A clustering oriented embedding is deduced from this estimate. The procedure is suitable for very high dimensional data because it is based on pairwise distances only. Theoretical guarantees are provided and an eigenvalue optimisation approach is proposed for computing the embedding. The performance of the method is illustrated via Monte Carlo experiments and comparisons with other embeddings from the literature.

Submitted 29 June, 2016; originally announced June 2016.

arXiv:1603.01982 [pdf, other]

Simulation based estimation of branching models for LTR retrotransposons

Authors: Serge Moulin, Nicolas Seux, Stéphane Chrétien, Christophe Guyeux, Emmanuelle Lerat

Abstract: Motivation: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model makes it possible to take into account both positions and degradations of LTR retrotransposon copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results: Various functions have been implemented in order to simulate their spread and visualization tools are proposed. Based on these simulation tools, we show that an accurate estimation of the parameters of this propagation model can be performed. We applied this method to the study of the spread of the transposable elements ROO, GYPSY, and DM412 on a chromosome of \textit{Drosophila melanogaster}. Availability: Our proposal has been implemented using Python software. Source code is freely available on the web at https://github.com/SergeMOULIN/retrotransposons-spread.

Submitted 7 March, 2016; originally announced March 2016.

Comments: 7 pages, 3 figures, 7 tables. Submitted to "Bioinformatics" on March 1, 2016

arXiv:1601.06042 [pdf, ps, other]

Controllability of complex networks using perturbation theory of extreme singular values

Authors: Stephane Chretien, Sebastien Darses

Abstract: Pinning control on complex dynamical networks has emerged as a very important topic in recent trends of control theory due to the extensive study of collective coupled behaviors and their role in physics, engineering and biology. In practice, real-world networks consist of a large number of vertices and one may be able to perform a control on a fraction of them only. Controllability of such systems has been addressed in \cite{PorfiriDiBernardo:Automatica08}, where it was reformulated as a global asymptotic stability problem. The goal of this short note is to refine the analysis proposed in \cite{PorfiriDiBernardo:Automatica08} using recent results in singular value perturbation theory.

Submitted 22 January, 2016; originally announced January 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1406.5441

arXiv:1511.05463 [pdf, ps, other]

On the restricted invertibility problem with an additional orthogonality constraint for random matrices

Authors: Stephane Chretien

Abstract: The Restricted Invertibility problem is the problem of selecting the largest subset of columns of a given matrix $X$, while keeping the smallest singular value of the extracted submatrix above a certain threshold. In this paper, we address this problem in the simpler case where $X$ is a random matrix but with the additional constraint that the selected columns be almost orthogonal to a given vector $v$. Our main result is a lower bound on the number of columns we can extract from a normalized i.i.d. Gaussian matrix for the worst $v$.

Submitted 4 December, 2015; v1 submitted 17 November, 2015; originally announced November 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1203.5223

arXiv:1509.00748 [pdf, ps, other]

An elementary approach to the problem of column selection in a rectangular matrix

Authors: Stephane Chretien, Sebastien Darses

Abstract: The problem of extracting a well conditioned submatrix from any rectangular matrix (with normalized columns) has been studied for some time in functional and harmonic analysis; see \cite{BourgainTzafriri:IJM87,Tropp:StudiaMath08,Vershynin:IJM01} for methods using random column selection. More constructive approaches have been proposed recently; see the recent contributions of \cite{SpielmanSrivastava:IJM12,Youssef:IMRN14}. The column selection problem we consider in this paper is concerned with extracting a well conditioned submatrix, i.e. a matrix whose singular values all lie in $[1-ε,1+ε]$. We provide individual lower and upper bounds for each singular value of the extracted matrix at the price of conceding only one log factor in the number of columns, when compared to the Restricted Invertibility Theorem of Bourgain and Tzafriri. Our method is fully constructive and the proof is short and elementary.

Submitted 6 December, 2016; v1 submitted 2 September, 2015; originally announced September 2015.

Comments: 5 pages

arXiv:1508.01681 [pdf, ps, other]

Joint estimation and model order selection for one dimensional ARMA models via convex optimization: a nuclear norm penalization approach

Authors: Stéphane Chrétien, Tianwen Wei, Basad Ali Hussain Al-sarray

Abstract: The problem of estimating ARMA models is computationally interesting due to the nonconcavity of the log-likelihood function. Recent results were based on convex minimization. Joint model selection using penalization by a convex norm, e.g. the nuclear norm of a certain matrix related to the state space formulation, was extensively studied from a computational viewpoint. The goal of the present short note is to present a theoretical study of a nuclear norm penalization based variant of the method of \cite{Bauer:Automatica05,Bauer:EconTh05} under the assumption of a Gaussian noise process.

Submitted 7 August, 2015; originally announced August 2015.

arXiv:1506.02520 [pdf, ps, other]

Convex recovery of tensors using nuclear norm penalization

Authors: Stephane Chretien, Tianwen Wei

Abstract: The subdifferential of convex functions of the singular spectrum of real matrices has been widely studied in matrix analysis, optimization and automatic control theory. Convex analysis and optimization over spaces of tensors is now gaining much interest due to its potential applications to signal processing, statistics and engineering. The goal of this paper is to present an application to the problem of low rank tensor recovery based on linear random measurements by extending the results of Tropp to the tensor setting.

Submitted 8 June, 2015; originally announced June 2015.

Comments: To appear in proceedings LVA/ICA 2015 at Czech Republic

arXiv:1505.08049 [pdf, ps, other]

Sensing tensors with Gaussian filters

Authors: Stéphane Chrétien, Tianwen Wei

Abstract: Sparse recovery from linear Gaussian measurements has been the subject of much investigation since the breakthrough papers \cite{CRT:IEEEIT06} and \cite{donoho2006compressed} on Compressed Sensing. Application to sparse vectors and sparse matrices via least squares penalized with sparsity promoting norms is now well understood using tools such as Gaussian mean width, statistical dimension and the notion of descent cones \cite{tropp2014convex} \cite{Vershynin:ArXivEstimation14}. Extension of these ideas to low rank tensor recovery is starting to enjoy considerable interest due to its many potential applications to Independent Component Analysis, Hidden Markov Models and Gaussian Mixture Models \cite{AnandkumarEtAl:JMLR14}, hyperspectral image analysis \cite{zhang2008tensor}, to name a few. In this paper, we demonstrate that the recent approach of \cite{Vershynin:ArXivEstimation14} provides very useful error bounds in the tensor setting using the nuclear norm or the Romera-Paredes--Pontil \cite{RomeraParedesPontil:NIPS13} penalization.

Submitted 29 May, 2015; originally announced May 2015.

arXiv:1504.05004 [pdf, other]

Using the LASSO for gene selection in bladder cancer data

Authors: Stéphane Chrétien, Christophe Guyeux, Michael Boyer-Guittaut, Régis Delage-Mouroux, Françoise Descôtes

Abstract: Given a gene expression data array of a list of bladder cancer patients with their tumor states, it may be difficult to determine which genes can operate as disease markers when the array is large and possibly contains outliers and missing data. An additional difficulty is that observations (tumor states) in the regression problem are discrete ones. In this article, we solve these problems on concrete data using first a clustering approach, followed by Least Absolute Shrinkage and Selection Operator (LASSO) estimators in a nonlinear regression problem involving discrete variables, as described in the brand-new research work of Plan and Vershynin. Gene markers of the most severe tumor state are finally provided using the proposed approach.

Submitted 20 April, 2015; originally announced April 2015.

Comments: submitted to CIBB 2015

arXiv:1504.00865 [pdf, ps, other]

A lower bound on the expected optimal value of certain random linear programs and application to shortest paths and reliability

Authors: Stephane Chretien, Franck Corset

Abstract: The paper studies the expectation of the inspection time in complex aging systems. Under reasonable assumptions, this problem is reduced to studying the expectation of the length of the shortest path in the directed degradation graph of the systems where the parameters are given by a pool of experts. The expectation itself being sometimes out of reach, in closed form or even through Monte Carlo simulations in the case of large systems, we propose an easily computable lower bound. The proposed bound applies to a rather general class of linear programs with random nonnegative costs and is directly inspired from the upper bound of Dyer, Frieze and McDiarmid [Math.Programming {\bf 35} (1986), no.1,3--16].

Submitted 15 February, 2016; v1 submitted 3 April, 2015; originally announced April 2015.

arXiv:1502.02523 [pdf, other]

A Bregman Proximal ADMM for NMF with Outliers: Estimating features with missing values and outliers: a Bregman-proximal point algorithm for robust Non-negative Matrix Factorization with application to gene expression analysis

Authors: Stéphane Chrétien, Christophe Guyeux, Bastien Conesa, Régis Delage-Mouroux, Michèle Jouvenot, Philippe Huetz, Françoise Descôtes

Abstract: To extract the relevant features in a given dataset is a difficult task, recently resolved in the non-negative data case with the Non-negative Matrix factorization (NMF) method. The objective of this research work is to extend this method to the case of missing and/or corrupted data due to outliers. To do so, data are denoised, missing values are imputed, and outliers are detected while performing a low-rank non-negative matrix factorization of the recovered matrix. To achieve this goal, a mixture of Bregman proximal methods and of the Augmented Lagrangian scheme is used, in a similar way to the so-called Alternating Direction Method of Multipliers. An application to the analysis of gene expression data of patients with bladder cancer is finally proposed.

Submitted 9 February, 2015; originally announced February 2015.

arXiv:1502.01616 [pdf, ps, other]

Von Neumann's inequality for tensors

Authors: Stéphane Chrétien, Tianwen Wei

Abstract: For two matrices in $\mathbb R^{n_1\times n_2}$, the von Neumann inequality says that their scalar product is less than or equal to the scalar product of their singular spectrum. In this short note, we extend this result to real tensors and provide a complete study of the equality case.

Submitted 5 February, 2015; originally announced February 2015.
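The matrix case of the inequality stated in this abstract can be checked numerically. The sketch below covers only the classical matrix statement, $\langle A, B\rangle \le \sum_i σ_i(A)\,σ_i(B)$ with singular values sorted in decreasing order, and not the tensor extension of the paper. The function name `von_neumann_gap` is hypothetical.

```python
import numpy as np

def von_neumann_gap(A, B):
    """Return sum_i sigma_i(A) * sigma_i(B) - <A, B>, which is nonnegative."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Frobenius scalar product, equal to trace(A^T B).
    inner = np.sum(A * B)
    # numpy returns singular values sorted in decreasing order.
    sa = np.linalg.svd(A, compute_uv=False)
    sb = np.linalg.svd(B, compute_uv=False)
    return float(sa @ sb - inner)
```

Taking `B = A` gives a gap of zero up to rounding, since both sides then equal the squared Frobenius norm of `A`.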

arXiv:1406.5441 [pdf, ps, other]

Perturbation bounds on the extremal singular values of a matrix after appending a column

Authors: Stephane Chretien, Sebastien Darses

Abstract: In this paper, we study the perturbation of the extreme singular values of a matrix in the particular case where it is obtained after appending an arbitrary column vector. Such results have many applications in bifurcation theory, signal processing, control theory and many other fields. In the first part of this paper, we review and compare various bounds from recent research papers on this subject. We also present a new lower bound and a new upper bound on the perturbation of the operator norm. Simple proofs are provided, based on the study of the characteristic polynomial rather than on variational methods, as e.g. in \cite{Li-Li}. In a second part of the paper, we present applications to signal processing and control theory.

Submitted 16 December, 2014; v1 submitted 20 June, 2014; originally announced June 2014.

arXiv:1402.6603 [pdf, ps, other]

On the spacings between the successive zeros of the Laguerre polynomials

Authors: Stephane Chretien, Sebastien Darses

Abstract: We propose a simple uniform lower bound on the spacings between the successive zeros of the Laguerre polynomials $L_n^{(α)}$ for all $α>-1$. Our bound is sharp regarding the order of dependency on $n$ and $α$ in various ranges. In particular, we recover the orders given in \cite{ahmed} for $α\in (-1,1]$.

Submitted 22 June, 2014; v1 submitted 19 February, 2014; originally announced February 2014.

Comments: This version proposes an improved bound and more comparisons with previous works

arXiv:1210.4762 [pdf, ps, other]

Mixture model for designs in high dimensional regression and the LASSO

Authors: Mohamed Ibrahim Assoweh, Emmanuel Caron, Stéphane Chrétien

Abstract: The LASSO is a recent technique for variable selection in the regression model $y = Xβ + z$, where $X\in \mathbb R^{n\times p}$ and $z$ is a centered Gaussian i.i.d. noise vector $\mathcal N(0,σ^2I)$. The LASSO has been proved to achieve remarkable properties such as exact support recovery of sparse vectors when the columns are sufficiently incoherent and low prediction error under even less stringent conditions. However, many matrices do not satisfy small coherence in practical applications and the LASSO estimator may thus suffer from what is known as the slow rate regime. The goal of the present paper is to study the LASSO from a slightly different perspective by proposing a mixture model for the design matrix which is able to capture in a natural way the potentially clustered nature of the columns in many practical situations. In this model, the columns of the design matrix are drawn from a Gaussian mixture model. Instead of requiring incoherence for the design matrix $X$, we only require incoherence of the much smaller matrix of the mixture's centers. Our main result states that $Xβ$ can be estimated with the same precision as for incoherent designs except for a correction term depending on the maximal variance in the mixture model.

Submitted 19 December, 2023; v1 submitted 17 October, 2012; originally announced October 2012.

arXiv:1203.5223 [pdf, ps, other]

On prediction with the LASSO when the design is not incoherent

Authors: Stephane Chretien

Abstract: The LASSO estimator is an $\ell_1$-norm penalized least-squares estimator, which was introduced for variable selection in the linear model. When the design matrix satisfies, e.g. the Restricted Isometry Property, or has a small coherence index, the LASSO estimator has been proved to recover, with high probability, the support and sign pattern of sufficiently sparse regression vectors. Under similar assumptions, the LASSO satisfies adaptive prediction bounds in various norms. The present note provides a prediction bound based on a new index for measuring how favorable a design matrix is for the LASSO estimator. We study the behavior of our new index for matrices with independent random columns uniformly drawn on the unit sphere. Using the simple trick of appending such a random matrix (with the right number of columns) to a given design matrix, we show that a prediction bound similar to \cite[Theorem 2.1]{CandesPlan:AnnStat09} holds without any constraint on the design matrix, other than restricted non-singularity.

Submitted 23 June, 2014; v1 submitted 23 March, 2012; originally announced March 2012.

Comments: typos corrected and some bounds improved. Still badly written, but in progress
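For readers unfamiliar with the estimator discussed in this entry, the following is a minimal sketch of an $\ell_1$-penalized least-squares solver based on iterative soft-thresholding (ISTA), a generic textbook method and not the analysis or algorithm of the paper. The objective scaling $\tfrac12\Vert y - Xb\Vert_2^2 + λ\Vert b\Vert_1$, the function names, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by ISTA."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Lipschitz constant of the gradient of the least-squares term.
    L = np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        # Gradient step followed by the l1 proximal step.
        b = soft_threshold(b - grad / L, lam / L)
    return b
```

With an identity design the minimizer is simply the soft-thresholded observation vector, which gives a quick sanity check of the implementation.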

arXiv:1201.5913 [pdf, ps, other]

A Component-wise EM Algorithm for Mixtures

Authors: Gilles Celeux, Stéphane Chrétien, Florence Forbes

Abstract: In some situations, the EM algorithm converges slowly. One possible reason is that standard procedures update the parameters simultaneously. In this paper we focus on finite mixture estimation. In this framework, we propose a component-wise EM, which updates the parameters sequentially. We give an interpretation of this procedure as a proximal point algorithm and use it to prove the convergence. Illustrative numerical experiments show how our algorithm compares to EM and a version of the SAGE algorithm.

Submitted 27 January, 2012; originally announced January 2012.

Journal ref: Journal of Computational and Graphical Statistics. (2001), 10 no.4 pp. 697--712

arXiv:1201.5912 [pdf, ps, other]

On EM algorithms and their proximal generalizations

Authors: Stéphane Chrétien, Alfred O. Hero

Abstract: In this paper, we analyze the celebrated EM algorithm from the point of view of proximal point algorithms. More precisely, we study a new type of generalization of the EM procedure introduced in \cite{Chretien&Hero:98} and called Kullback-proximal algorithms. The proximal framework allows us to prove new results concerning the cluster points. An essential contribution is a detailed analysis of the case where some cluster points lie on the boundary of the parameter space.

Submitted 27 January, 2012; originally announced January 2012.

Journal ref: ESAIM: Probability and Statistics (2008) 12 pp. 308--326

arXiv:1201.5907 [pdf, ps, other]

Kullback Proximal Algorithms for Maximum Likelihood Estimation

Authors: Stéphane Chrétien, Alfred O. Hero

Abstract: Accelerated algorithms for maximum likelihood image reconstruction are essential for emerging applications such as 3D tomography, dynamic tomographic imaging, and other high dimensional inverse problems. In this paper, we introduce and analyze a class of fast and stable sequential optimization methods for computing maximum likelihood estimates and study its convergence properties. These methods are based on a {\it proximal point algorithm} implemented with the Kullback-Leibler (KL) divergence between posterior densities of the complete data as a proximal penalty function. When the proximal relaxation parameter is set to unity one obtains the classical expectation maximization (EM) algorithm. For a decreasing sequence of relaxation parameters, relaxed versions of EM are obtained which can have much faster asymptotic convergence without sacrifice of monotonicity. We present an implementation of the algorithm using Moré's {\it Trust Region} update strategy. For illustration the method is applied to a non-quadratic inverse problem with Poisson distributed data.

Submitted 27 January, 2012; originally announced January 2012.

Comments: 6 figures

Journal ref: IEEE Transactions on Information Theory, (2000) 46 no.5, pp. 1800--1810
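
As a reading aid for the abstract above, the Kullback proximal iteration can be sketched as follows. The notation is an assumption made here and does not appear in the abstract: $\ell_y(\theta)$ denotes the log-likelihood of the observed data $y$, $\beta_k$ the proximal relaxation parameter at step $k$, and $I_y(\theta^k,\theta)$ the Kullback-Leibler divergence between the conditional densities $k(x \mid y;\cdot)$ of the complete data $x$ given $y$.

```latex
\theta^{k+1} \in \operatorname*{arg\,max}_{\theta}
  \Big\{ \ell_y(\theta) - \beta_k \, I_y(\theta^k, \theta) \Big\},
\qquad
I_y(\theta^k, \theta)
  = \mathbb{E}\!\left[ \log \frac{k(x \mid y; \theta^k)}{k(x \mid y; \theta)}
      \,\Big|\, y; \theta^k \right].
```

With $\beta_k = 1$ for all $k$, the objective differs from the usual EM surrogate $Q(\theta \mid \theta^k)$ only by a term independent of $\theta$, which is consistent with the abstract's statement that a unit relaxation parameter gives classical EM.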

arXiv:1105.1430 [pdf, ps, other]

On the generic uniform uniqueness of the LASSO estimator

Authors: Stephane Chretien, Sebastien Darses

Abstract: The LASSO is a variable subset selection procedure in statistical linear regression based on $\ell_1$ penalization of the least-squares operator. Uniqueness of the LASSO is an important issue, especially for the study of the LASSO path. The goal of the present paper is to provide a generic sufficient condition on the design matrix for the LASSO minimizer to be unique. Unlike previous works on the question of uniqueness, our condition only depends on the design matrix. Our study is based on a general position condition on the design matrix which holds with probability one for most experimental models.

Submitted 1 March, 2016; v1 submitted 7 May, 2011; originally announced May 2011.

arXiv:1103.3063 [pdf, ps, other]

Invertibility of random submatrices via tail decoupling and a Matrix Chernoff Inequality

Authors: Stéphane Chrétien, Sébastien Darses

Abstract: Let $X$ be an $n\times p$ matrix with coherence $μ(X)=\max_{j\neq j'} |X_j^tX_{j'}|$. We present a simplified and improved study of the quasi-isometry property for most submatrices of $X$ obtained by uniform column sampling. Our results depend on $μ(X)$, $\|X\|$ and the dimensions with explicit constants, which improve the previously known values by a large factor. The analysis relies on a tail decoupling argument, of independent interest, and a recent version of the Non-Commutative Chernoff inequality (NCCI).

Submitted 19 March, 2012; v1 submitted 15 March, 2011; originally announced March 2011.
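
The coherence $μ(X)=\max_{j\neq j'} |X_j^tX_{j'}|$ defined in the abstract above can be computed directly from the columns of $X$. The following is a minimal sketch, not code from the paper: the function name `coherence` and the list-of-rows input format are hypothetical, and the columns are assumed to be already normalized to unit norm, which the abstract does not state explicitly.

```python
from itertools import combinations


def coherence(X):
    """Return max over j != j' of |<X_j, X_j'>| for a matrix given as a list of rows.

    Assumption (not stated in the abstract): the columns of X are already
    scaled to unit Euclidean norm. The matrix needs at least two columns.
    """
    columns = list(zip(*X))  # transpose: one tuple per column
    return max(
        abs(sum(a * b for a, b in zip(u, v)))  # |inner product of two columns|
        for u, v in combinations(columns, 2)   # all unordered pairs j < j'
    )


# Orthonormal columns: every off-diagonal inner product vanishes.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu_identity = coherence(identity)

# Two unit-norm columns (1, 0) and (0.6, 0.8) with inner product 0.6.
two_columns = [[1.0, 0.6], [0.0, 0.8]]
mu_two = coherence(two_columns)
```

Enumerating column pairs costs on the order of $n p^2$ operations, which is fine for illustration; for large $p$ one would more likely form the Gram matrix $X^tX$ with a linear algebra library.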

arXiv:1101.5475 [pdf, other]

Multivariate GARCH estimation via a Bregman-proximal trust-region method

Authors: Stéphane Chrétien, Juan-Pablo Ortega

Abstract: The estimation of multivariate GARCH time series models is a difficult task mainly due to the significant overparameterization exhibited by the problem and usually referred to as the "curse of dimensionality". For example, in the case of the VEC family, the number of parameters involved in the model grows as a polynomial of order four in the dimensionality of the problem. Moreover, these parameters are subject to convoluted nonlinear constraints necessary to ensure, for instance, the existence of stationary solutions and the positive semidefinite character of the conditional covariance matrices used in the model design. So far, this problem has been addressed in the literature only in low dimensional cases with strong parsimony constraints. In this paper we propose a general formulation of the estimation problem in any dimension and develop a Bregman-proximal trust-region method for its solution. The Bregman-proximal approach allows us to handle the constraints in a very efficient and natural way by staying in the primal space, and the trust-region mechanism stabilizes and speeds up the scheme. Preliminary computational experiments are presented and confirm the very good performance of the proposed approach.

Submitted 28 January, 2011; originally announced January 2011.

Comments: 35 pages, 5 figures

MSC Class: 91G70; 65C60

arXiv:1101.0434 [pdf, ps, other]

Sparse recovery with unknown variance: a LASSO-type approach

Authors: Stéphane Chrétien, Sébastien Darses

Abstract: We address the issue of estimating the regression vector $β$ in the generic $s$-sparse linear model $y = Xβ+z$, with $β\in\mathbb{R}^{p}$, $y\in\mathbb{R}^{n}$, $z\sim\mathcal N(0,σ^2 I)$ and $p> n$ when the variance $σ^{2}$ is unknown. We study two LASSO-type methods that jointly estimate $β$ and the variance. These estimators are minimizers of the $\ell_1$ penalized least-squares functional, where the relaxation parameter is tuned according to two different strategies. In the first strategy, the relaxation parameter is of the order $\hat{σ} \sqrt{\log p}$, where $\hat{σ}^2$ is the empirical variance. In the second strategy, the relaxation parameter is chosen so as to enforce a trade-off between the fidelity and the penalty terms at optimality. For both estimators, our assumptions are similar to the ones proposed by Candès and Plan in {\it Ann. Stat. (2009)}, for the case where $σ^{2}$ is known. We prove that our estimators ensure exact recovery of the support and sign pattern of $β$ with high probability. We present simulation results showing that the first estimator enjoys nearly the same performance in practice as the standard LASSO (known variance case) for a wide range of the signal to noise ratio. Our second estimator is shown to outperform both in terms of false detection, when the signal to noise ratio is low.

Submitted 5 November, 2012; v1 submitted 2 January, 2011; originally announced January 2011.

arXiv:0906.0593

On the modified Basis Pursuit reconstruction for Compressed Sensing with partially known support

Authors: Stephane Chretien

Abstract: The goal of this short note is to present a refined analysis of the modified Basis Pursuit ($\ell_1$-minimization) approach to signal recovery in Compressed Sensing with partially known support, as introduced by Vaswani and Lu. The problem is to recover a signal $x \in \mathbb R^p$ using an observation vector $y=Ax$, where $A \in \mathbb R^{n\times p}$, in the highly underdetermined setting $n\ll p$. Based on an initial and possibly erroneous guess $T$ of the signal's support ${\rm supp}(x)$, the Modified Basis Pursuit method of Vaswani and Lu consists of minimizing the $\ell_1$ norm of the estimate over the components indexed by $T^c$ only. We prove exact recovery essentially under a Restricted Isometry Property assumption of order 2 times the cardinality of $T^c \cap {\rm supp}(x)$, i.e. the number of missed components.

Submitted 4 September, 2015; v1 submitted 2 June, 2009; originally announced June 2009.

Comments: Withdrawn due to an error in the proof. A new version will be submitted as a section in a future paper

arXiv:0902.1324 [pdf, ps, other]

Using the Eigenvalue Relaxation for Binary Least-Squares Estimation Problems

Authors: Stephane Chretien, Franck Corset

Abstract: The goal of this paper is to survey the properties of the eigenvalue relaxation for least squares binary problems. This relaxation is a convex program which is obtained as the Lagrangian dual of the original problem with an implicit compact constraint and as such, is a convex problem with polynomial time complexity. Moreover, as a main practical advantage of this relaxation over the standard Semi-Definite Programming approach, several efficient bundle methods are available for this problem, making it possible to address problems of very large dimension. The necessary tools from convex analysis are recalled and shown at work for handling the problem of exactness of this relaxation. Two applications are described. The first one is the problem of binary image reconstruction and the second is the problem of multiuser detection in CDMA systems.

Submitted 9 February, 2009; v1 submitted 8 February, 2009; originally announced February 2009.

arXiv:0901.4752 [pdf, other]

Estimation of Gaussian mixtures in small sample studies using $l_1$ penalization

Authors: Stephane Chretien

Abstract: Many experiments in medicine and ecology can be conveniently modeled by finite Gaussian mixtures but face the problem of dealing with small data sets. We propose a robust version of the estimator based on self-regression and sparsity promoting penalization in order to estimate the components of Gaussian mixtures in such contexts. A space alternating version of the penalized EM algorithm is obtained and we prove that its cluster points satisfy the Karush-Kuhn-Tucker conditions. Monte Carlo experiments are presented in order to compare the results obtained by our method and by standard maximum likelihood estimation. In particular, our estimator is seen to perform better than the maximum likelihood estimator.

Submitted 8 October, 2014; v1 submitted 29 January, 2009; originally announced January 2009.

arXiv:0901.0017 [pdf, ps, other]

Space Alternating Penalized Kullback Proximal Point Algorithms for Maximizing Likelihood with Nondifferentiable Penalty

Authors: Stéphane Chrétien, Alfred Hero, Hervé Perdry

Abstract: The EM algorithm is a widely used methodology for penalized likelihood estimation. Provable monotonicity and convergence are the hallmarks of the EM algorithm and these properties are well established for smooth likelihood and smooth penalty functions. However, many relaxed versions of variable selection penalties are not smooth. The goal of this paper is to introduce a new class of Space Alternating Penalized Kullback Proximal extensions of the EM algorithm for nonsmooth likelihood inference. We show that the cluster points of the new method are stationary points even when on the boundary of the parameter set. Special attention has been paid to the construction of a component-wise version of the method in order to ease the implementation for complicated models. Illustrations for the problems of model selection for finite mixtures of regression and of sparse image reconstruction are presented.

Submitted 1 June, 2011; v1 submitted 30 December, 2008; originally announced January 2009.

Comments: 3 figures

Showing 1–50 of 51 results for author: Chretien, S