Search | arXiv e-print repository

Convergence of alternating minimisation algorithms for dictionary learning

Abstract: In this paper we derive sufficient conditions for the convergence of two popular alternating minimisation algorithms for dictionary learning: the Method of Optimal Directions (MOD) and Online Dictionary Learning (ODL), which can also be thought of as approximate K-SVD. We show that, given a well-behaved initialisation that is either within distance at most $1/\log(K)$ to the generating dictionary or has a special structure ensuring that each element of the initialisation only points to one generating element, both algorithms converge to the generating dictionary at a geometric rate. This holds even for data models with non-uniform distributions on the supports of the sparse coefficients. These allow the appearance frequency of the dictionary elements to vary heavily and thus model real data more closely.

Submitted 26 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.
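For readers unfamiliar with MOD, the following is a minimal sketch of one MOD iteration in Python/NumPy, not the authors' code. It assumes noiseless signals, unit-norm atoms and a thresholding-based sparse coder (the abstract does not specify the sparse coding step); the names `threshold_code` and `mod_step` are hypothetical. The dictionary update is the least-squares solution $D = YX^{+}$, followed by column normalisation.

```python
import numpy as np


def threshold_code(D, Y, S):
    """Sparse coding by thresholding (an assumption here): for each signal keep the S atoms
    with the largest |<d_k, y_n>| and fit their coefficients by least squares."""
    K, N = D.shape[1], Y.shape[1]
    X = np.zeros((K, N))
    corr = np.abs(D.T @ Y)                        # K x N absolute inner products
    for n in range(N):
        support = np.argsort(corr[:, n])[-S:]     # indices of the S largest correlations
        coef, *_ = np.linalg.lstsq(D[:, support], Y[:, n], rcond=None)
        X[support, n] = coef
    return X


def mod_step(D, Y, S):
    """One MOD iteration: sparse coding, then the least-squares update D = Y X^+,
    followed by normalising every atom to unit length."""
    X = threshold_code(D, Y, S)
    D_new = Y @ np.linalg.pinv(X)
    norms = np.linalg.norm(D_new, axis=0)
    unused = norms < 1e-12                        # atoms never selected get a zero column
    D_new[:, unused] = D[:, unused]               # keep the previous (unit-norm) atom instead
    norms[unused] = 1.0
    return D_new / norms


# Usage: synthetic S-sparse signals from an orthonormal generating dictionary.
rng = np.random.default_rng(0)
d, K, N, S = 16, 16, 200, 3
D_true, _ = np.linalg.qr(rng.standard_normal((d, K)))
X_true = np.zeros((K, N))
for n in range(N):
    X_true[rng.choice(K, S, replace=False), n] = rng.standard_normal(S)
Y = D_true @ X_true

D = D_true + 0.05 * rng.standard_normal((d, K))   # perturbed initialisation
D /= np.linalg.norm(D, axis=0)
for _ in range(10):
    D = mod_step(D, Y, S)
```

With an orthonormal generating dictionary and exactly $S$-sparse coefficients, the generating dictionary is a fixed point of this step; that is a quick sanity check, not a demonstration of the convergence result.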

arXiv:2212.09391

Non-asymptotic bounds for inclusion probabilities in rejective sampling

Authors: Simon Ruetz, Karin Schnass

Abstract: We provide non-asymptotic bounds for first- and higher-order inclusion probabilities of the rejective sampling model with various size parameters. Further, we derive bounds in the semidefinite ordering for matrices that collect (conditional) first- and second-order inclusion probabilities as their diagonal and off-diagonal entries, respectively.

Submitted 19 December, 2022; originally announced December 2022.

MSC Class: 60C05; 62D05; ACM Class: G.3
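As an illustration of the sampling model, the sketch below draws supports by rejective (conditional Poisson) sampling: independent Bernoulli draws with probabilities $p_i$, repeated until exactly $S$ indices are selected. It estimates first- and second-order inclusion probabilities by Monte Carlo. This is an empirical check only, not the non-asymptotic bounds of the paper; the function names and the parameter vector `p` are hypothetical.

```python
import numpy as np


def rejective_sample(p, S, rng):
    """One support from the rejective (conditional Poisson) sampling model:
    select index i independently with probability p[i]; reject unless exactly S are selected."""
    while True:
        mask = rng.random(p.size) < p
        if mask.sum() == S:
            return np.flatnonzero(mask)


def inclusion_probabilities(p, S, trials, seed=0):
    """Monte Carlo estimates: matrix P with P[i, j] ~ Prob(i and j both in the support);
    its diagonal holds the first-order inclusion probabilities pi_i."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((p.size, p.size))
    for _ in range(trials):
        idx = rejective_sample(p, S, rng)
        counts[np.ix_(idx, idx)] += 1             # diagonal: first order, off-diagonal: second order
    P = counts / trials
    return np.diag(P).copy(), P


p = np.linspace(0.1, 0.5, 10)                     # hypothetical Bernoulli parameters, summing to 3
pi, P = inclusion_probabilities(p, S=3, trials=10000)
```

Because every accepted support has exactly $S$ elements, the estimated first-order probabilities always sum to $S$, and row $i$ of the second-order matrix sums to $S\pi_i$.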

arXiv:2012.02082

Submatrices with non-uniformly selected random supports and insights into sparse approximation

Authors: Simon Ruetz, Karin Schnass

Abstract: In this paper we derive tail bounds on the norms of random submatrices with non-uniformly distributed supports. We apply these results to sparse approximation and conduct an analysis of the average case performance of thresholding, Orthogonal Matching Pursuit and Basis Pursuit. As an application of these results we characterise sensing dictionaries to improve average performance in the non-uniform case and test their performance numerically.

Submitted 3 December, 2020; originally announced December 2020.

arXiv:1809.06684

doi 10.1109/LSP.2018.2878061

Average performance of Orthogonal Matching Pursuit (OMP) for sparse approximation

Authors: Karin Schnass

Abstract: We present a theoretical analysis of the average performance of OMP for sparse approximation. For signals that are generated from a dictionary with $K$ atoms and coherence $\mu$ and coefficients corresponding to a geometric sequence with parameter $\alpha<1$, we show that OMP is successful with high probability as long as the sparsity level $S$ scales as $S\mu^2 \log K \lesssim 1-\alpha$. This improves by an order of magnitude over worst case results and shows that OMP and its famous competitor Basis Pursuit outperform each other depending on the setting.

Submitted 15 July, 2019; v1 submitted 18 September, 2018; originally announced September 2018.

Comments: 12 pages, 2 figures, extended and corrected version of the published version
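A minimal NumPy sketch of OMP, assuming a dictionary with unit-norm columns; the function name `omp` and the test support are hypothetical. The test signal uses coefficients from a geometric sequence $\alpha^j$ (with alternating signs), matching the signal model in the abstract. For an orthonormal basis, OMP selects atoms in order of decreasing coefficient magnitude, so the expected selection order is known in advance.

```python
import numpy as np


def omp(D, y, S):
    """Orthogonal Matching Pursuit for a dictionary D with unit-norm columns.
    Returns the selected atom indices (in selection order) and their coefficients."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(S):
        k = int(np.argmax(np.abs(D.T @ residual)))                # atom most correlated with the residual
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # re-fit on all chosen atoms
        residual = y - D[:, support] @ coef                       # now orthogonal to the chosen atoms
    return support, coef


# Usage: geometric coefficients alpha^j with alternating signs on a hypothetical support.
rng = np.random.default_rng(1)
d = 32
D, _ = np.linalg.qr(rng.standard_normal((d, d)))    # orthonormal basis, coherence 0
alpha = 0.7
true_support = [3, 7, 11, 0]
x = np.zeros(d)
x[true_support] = [(-1) ** j * alpha**j for j in range(4)]
support, coef = omp(D, D @ x, S=4)                  # support == [3, 7, 11, 0]
```

In the overcomplete, coherent setting of the paper, success is only guaranteed with high probability under the stated scaling; the orthonormal case above is the trivial special case $\mu = 0$.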

arXiv:1805.00692

Compressed Dictionary Learning

Authors: Karin Schnass, Flavio Teixeira

Abstract: In this paper we show that the computational complexity of the Iterative Thresholding and K-residual-Means (ITKrM) algorithm for dictionary learning can be significantly reduced by using dimensionality-reduction techniques based on the Johnson-Lindenstrauss lemma. The dimensionality reduction is efficiently carried out with the fast Fourier transform. We introduce the Iterative compressed-Thresholding and K-Means (IcTKM) algorithm for fast dictionary learning and study its convergence properties. We show that IcTKM can locally recover an incoherent, overcomplete generating dictionary of $K$ atoms from training signals of sparsity level $S$ with high probability. Fast dictionary learning is achieved by embedding the training data and the dictionary into $m < d$ dimensions, and recovery is shown to be locally stable with an embedding dimension which scales as low as $m = O(S \log^4 S \log^3 K)$. The compression effectively shatters the data dimension bottleneck in the computational cost of ITKrM, reducing it by a factor $O(m/d)$. Our theoretical results are complemented with numerical simulations which demonstrate that IcTKM is a powerful, low-cost algorithm for learning dictionaries from high-dimensional data sets.

Submitted 22 February, 2020; v1 submitted 2 May, 2018; originally announced May 2018.

Comments: 5 figures, 4.6 pages per figure
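The abstract states that the Johnson-Lindenstrauss embedding is carried out with the FFT. One common construction of such a fast embedding is the subsampled randomized Fourier transform: random sign flips, a unitary FFT, random selection of $m$ rows, and rescaling by $\sqrt{d/m}$. The sketch below implements that construction as an assumption; IcTKM may use a different variant, and the helper `make_srft` is hypothetical.

```python
import numpy as np


def make_srft(d, m, seed=0):
    """Build a subsampled randomized Fourier transform R mapping R^d to C^m.
    Random signs spread the energy of sparse or spiky vectors, the unitary FFT mixes
    all coordinates in O(d log d), and m randomly chosen rows are kept and rescaled."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=d)
    rows = rng.choice(d, size=m, replace=False)
    scale = np.sqrt(d / m)

    def R(V):
        # V is a vector of length d or a d x n matrix (columns are embedded independently)
        W = V * signs if V.ndim == 1 else V * signs[:, None]
        return scale * np.fft.fft(W, axis=0, norm="ortho")[rows]

    return R


d, m = 1024, 256
R = make_srft(d, m)
rng = np.random.default_rng(2)
y = rng.standard_normal(d)
ratio = np.linalg.norm(R(y)) ** 2 / np.linalg.norm(y) ** 2   # close to 1 with high probability
```

In a compressed thresholding step one would then compare `np.real(np.conj(R(D)).T @ R(Y))` instead of `D.T @ Y`; the real part is taken because the embedded vectors are complex, and its expectation over the random rows equals the original inner product.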

arXiv:1804.07101

Dictionary learning -- from local towards global and adaptive

Authors: Marie Christine Pali, Karin Schnass

Abstract: This paper studies the convergence behaviour of dictionary learning via the Iterative Thresholding and K-residual Means (ITKrM) algorithm. On the one hand, it is proved that ITKrM is a contraction under much more relaxed conditions than previously necessary. On the other hand, it is shown that there seem to exist stable fixed points that do not correspond to the generating dictionary, which can be characterised as very coherent. Based on an analysis of the residuals using these bad dictionaries, replacing coherent atoms with carefully designed replacement candidates is proposed. In experiments on synthetic data this outperforms random or no replacement and always leads to full dictionary recovery. Finally, the question of how to learn dictionaries without knowledge of the correct dictionary size and sparsity level is addressed. Decoupling the replacement strategy for coherent or unused atoms into pruning and adding, and increasing the sparsity level slowly and carefully, leads to an adaptive version of ITKrM. In several experiments this adaptive dictionary learning algorithm is shown to recover a generating dictionary from randomly initialised dictionaries of various sizes on synthetic data and to learn meaningful dictionaries on image data.

Submitted 21 April, 2021; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: 11 figures, 5 pages per figure including pseudocode
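The sketch below implements one basic ITKrM iteration in NumPy under stated assumptions: unit-norm atoms, plain thresholding, and no replacement or adaptivity (the pruning and adding strategies described in the abstract are omitted). For each atom selected by thresholding, it accumulates the signed residual $y - P(\Phi_I)y$ plus the atom's own projection $P(\phi_k)y$, then renormalises. This follows the ITKrM update rule as we understand it from the authors' papers; the name `itkrm_step` is hypothetical.

```python
import numpy as np


def itkrm_step(D, Y, S):
    """One ITKrM iteration (sketch). D: d x K with unit-norm columns, Y: d x N signals.
    For every signal, thresholding picks the S atoms with largest |<d_k, y>|; each picked
    atom k accumulates sign(<d_k, y>) * (y - P_I y + P_k y); atoms are then renormalised."""
    corr = D.T @ Y                                  # K x N inner products <d_k, y_n>
    D_new = np.zeros_like(D)
    for n in range(Y.shape[1]):
        y = Y[:, n]
        I = np.argsort(np.abs(corr[:, n]))[-S:]     # thresholding step
        D_I = D[:, I]
        residual = y - D_I @ np.linalg.lstsq(D_I, y, rcond=None)[0]   # y - P(D_I) y
        for k in I:
            # P(d_k) y = <d_k, y> d_k for a unit-norm atom, and sign(c) * c = |c|
            D_new[:, k] += np.sign(corr[k, n]) * residual + np.abs(corr[k, n]) * D[:, k]
    norms = np.linalg.norm(D_new, axis=0)
    unused = norms < 1e-12                          # atoms never picked by thresholding
    D_new[:, unused] = D[:, unused]                 # keep them unchanged
    norms[unused] = 1.0
    return D_new / norms


# Usage: noiseless S-sparse signals from an orthonormal generating dictionary.
rng = np.random.default_rng(0)
d, K, N, S = 16, 16, 200, 3
D_true, _ = np.linalg.qr(rng.standard_normal((d, K)))
X_true = np.zeros((K, N))
for n in range(N):
    X_true[rng.choice(K, S, replace=False), n] = rng.standard_normal(S)
Y = D_true @ X_true

D = D_true + 0.05 * rng.standard_normal((d, K))   # perturbed initialisation
D /= np.linalg.norm(D, axis=0)
for _ in range(10):
    D = itkrm_step(D, Y, S)
```

Unlike MOD, no matrix inverse of the coefficient matrix is needed: each atom is an average of signed residuals, so the cost per iteration is dominated by the $K \times N$ inner products.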

arXiv:1704.00227

Online and Stable Learning of Analysis Operators

Authors: Michael Sandbichler, Karin Schnass

Abstract: In this paper four iterative algorithms for learning analysis operators are presented. They are built upon the same optimisation principle underlying both Analysis K-SVD and Analysis SimCO. The Forward and Sequential Analysis Operator Learning (FAOL and SAOL) algorithms are based on projected gradient descent with optimally chosen step size. The Implicit AOL (IAOL) algorithm is inspired by the implicit Euler scheme for solving ordinary differential equations and does not require choosing a step size. The fourth algorithm, Singular Value AOL (SVAOL), uses a strategy similar to that of Analysis K-SVD while avoiding its high computational cost. All algorithms are proven to decrease or preserve the target function in each step, and a characterisation of their stationary points is provided. Further, they are tested on synthetic and image data, compared to Analysis SimCO, and found to give better recovery rates and faster decay of the objective function, respectively. In a final denoising experiment the presented algorithms are again shown to perform similarly to or better than the state-of-the-art algorithm ASimCO.

Submitted 1 February, 2018; v1 submitted 1 April, 2017; originally announced April 2017.

Comments: 21 pages, 12 figures, 6 tables

arXiv:1701.03655

doi 10.1186/s13634-018-0533-0

Dictionary Learning from Incomplete Data

Authors: Valeriya Naumova, Karin Schnass

Abstract: This paper extends the recently proposed and theoretically justified iterative thresholding and $K$ residual means algorithm ITKrM to learning dictionaries from incomplete/masked training data (ITKrMM). It further adapts the algorithm to the presence of a low rank component in the data and provides a strategy for recovering this low rank component again from incomplete data. Several synthetic experiments show the advantages of incorporating information about the corruption into the algorithm. Finally, image inpainting is considered as an application example, which demonstrates the superior performance of ITKrMM in terms of speed at similar or better reconstruction quality compared to its closest dictionary learning counterpart.

Submitted 19 January, 2017; v1 submitted 13 January, 2017; originally announced January 2017.

Comments: 22 pages, 9 figures, (this version with bug fix for wksvd)

arXiv:1503.07027

Convergence radius and sample complexity of ITKM algorithms for dictionary learning

Authors: Karin Schnass

Abstract: In this work we show that iterative thresholding and K-means (ITKM) algorithms can recover a generating dictionary with K atoms from noisy $S$ sparse signals up to an error $\tilde \varepsilon$ as long as the initialisation is within a convergence radius, that is up to a $\log K$ factor inversely proportional to the dynamic range of the signals, and the sample size is proportional to $K \log K \tilde \varepsilon^{-2}$. The results are valid for arbitrary target errors if the sparsity level is of the order of the square root of the signal dimension $d$ and for target errors down to $K^{-\ell}$ if $S$ scales as $S \leq d/(\ell \log K)$.

Submitted 8 August, 2016; v1 submitted 24 March, 2015; originally announced March 2015.

Comments: 34 pages, 2 figures

arXiv:1401.6354

Local Identification of Overcomplete Dictionaries

Authors: Karin Schnass

Abstract: This paper presents the first theoretical results showing that stable identification of overcomplete $\mu$-coherent dictionaries $\Phi \in \mathbb{R}^{d\times K}$ is locally possible from training signals with sparsity levels $S$ up to the order $O(\mu^{-2})$ and signal to noise ratios up to $O(\sqrt{d})$. In particular the dictionary is recoverable as the local maximum of a new maximisation criterion that generalises the K-means criterion. For this maximisation criterion results for asymptotic exact recovery for sparsity levels up to $O(\mu^{-1})$ and stable recovery for sparsity levels up to $O(\mu^{-2})$ as well as signal to noise ratios up to $O(\sqrt{d})$ are provided. These asymptotic results translate to finite sample size recovery results with high probability as long as the sample size $N$ scales as $O(K^3dS \tilde \varepsilon^{-2})$, where the recovery precision $\tilde \varepsilon$ can go down to the asymptotically achievable precision. Further, to actually find the local maxima of the new criterion, a very simple Iterative Thresholding and K (signed) Means algorithm (ITKM), which has complexity $O(dKN)$ in each iteration, is presented and its local efficiency is demonstrated in several experiments.

Submitted 2 April, 2015; v1 submitted 24 January, 2014; originally announced January 2014.

Comments: 32 pages, 2 figures, final version accepted to JMLR
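Since the sparsity conditions above are stated in terms of the coherence $\mu$, a short NumPy helper computing $\mu = \max_{i\neq j} |\langle \phi_i, \phi_j\rangle|$ may be useful; the function name `coherence` is hypothetical. The example dictionary, the identity concatenated with a normalised $4\times 4$ Hadamard matrix, has coherence $1/\sqrt{d} = 1/2$.

```python
import numpy as np


def coherence(Phi):
    """Coherence mu = max_{i != j} |<phi_i, phi_j>| after normalising the columns of Phi."""
    Phi = Phi / np.linalg.norm(Phi, axis=0)
    G = np.abs(Phi.T @ Phi)          # absolute Gram matrix
    np.fill_diagonal(G, 0.0)         # ignore <phi_i, phi_i> = 1
    return float(G.max())


d = 4
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]]) / 2.0   # normalised Hadamard matrix (orthonormal)
Phi = np.hstack([np.eye(d), H])        # overcomplete dictionary with K = 8 atoms
mu = coherence(Phi)                    # equals 1 / sqrt(d) = 0.5
```

Forming the full $K \times K$ Gram matrix costs $O(dK^2)$, which is acceptable for moderate $K$.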

arXiv:1301.3375

doi 10.1016/j.acha.2014.01.005

On the Identifiability of Overcomplete Dictionaries via the Minimisation Principle Underlying K-SVD

Authors: Karin Schnass

Abstract: This article gives theoretical insights into the performance of K-SVD, a dictionary learning algorithm that has gained significant popularity in practical applications. The particular question studied here is when a dictionary $\Phi \in \mathbb{R}^{d \times K}$ can be recovered as a local minimum of the minimisation criterion underlying K-SVD from a set of $N$ training signals $y_n = \Phi x_n$. A theoretical analysis of the problem leads to two types of identifiability results, assuming the training signals are generated from a tight frame with coefficients drawn from a random symmetric distribution. First, asymptotic results show that in expectation the generating dictionary can be recovered exactly as a local minimum of the K-SVD criterion if the coefficient distribution exhibits sufficient decay. Second, based on the asymptotic results, it is demonstrated that, given a finite number of training samples $N$ such that $N/\log N = O(K^3d)$, except with probability $O(N^{-Kd})$ there is a local minimum of the K-SVD criterion within distance $O(KN^{-1/4})$ to the generating dictionary.

Submitted 2 April, 2015; v1 submitted 15 January, 2013; originally announced January 2013.

Comments: 36 pages (double spaced), 3 figures, equivalent to final accepted version

Journal ref: Applied and Computational Harmonic Analysis, Volume 37, Issue 3, November 2014, Pages 464-491

arXiv:1008.3043

Learning Functions of Few Arbitrary Linear Parameters in High Dimensions

Authors: Massimo Fornasier, Karin Schnass, Jan Vybiral

Abstract: Let us assume that $f$ is a continuous function defined on the unit ball of $\mathbb R^d$, of the form $f(x) = g (A x)$, where $A$ is a $k \times d$ matrix and $g$ is a function of $k$ variables for $k \ll d$. We are given a budget $m \in \mathbb N$ of possible point evaluations $f(x_i)$, $i=1,...,m$, of $f$, which we are allowed to query in order to construct a uniform approximating function. Under certain smoothness and variation assumptions on the function $g$, and an {\it arbitrary} choice of the matrix $A$, we present in this paper 1. a sampling choice of the points $\{x_i\}$ drawn at random for each function approximation; 2. algorithms (Algorithm 1 and Algorithm 2) for computing the approximating function, whose complexity is at most polynomial in the dimension $d$ and in the number $m$ of points. Due to the arbitrariness of $A$, the choice of the sampling points will be according to suitable random distributions and our results hold with overwhelming probability. Our approach uses tools taken from the {\it compressed sensing} framework, recent Chernoff bounds for sums of positive-semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.

Submitted 17 January, 2012; v1 submitted 18 August, 2010; originally announced August 2010.

Comments: 31 pages, this version was accepted to Foundations of Computational Mathematics, the final publication will be available on http://www.springerlink.com

arXiv:1005.1471

Classification via Incoherent Subspaces

Authors: Karin Schnass, Pierre Vandergheynst

Abstract: This article presents a new classification framework that can extract individual features per class. The scheme is based on a model of incoherent subspaces, each one associated with one class, and a model of how the elements in a class are represented in this subspace. After the theoretical analysis, an alternating projection algorithm to find such a collection is developed. The classification performance and speed of the proposed method are tested on the AR and YaleB databases and compared to those of Fisher's LDA and a recent approach based on $\ell_1$ minimisation. Finally, connections of the presented scheme to existing work are discussed and possible extensions are pointed out.

Submitted 10 May, 2010; originally announced May 2010.

Comments: 22 pages, 2 figures, 4 tables

arXiv:0904.4774

Dictionary Identification - Sparse Matrix-Factorisation via $\ell_1$-Minimisation

Authors: Remi Gribonval, Karin Schnass

Abstract: This article treats the problem of learning a dictionary providing sparse representations for a given signal class, via $\ell_1$-minimisation. The problem can also be seen as factorising a $d \times N$ matrix $Y=(y_1 \dots y_N)$, $y_n \in \mathbb{R}^d$, of training signals into a $d \times K$ dictionary matrix $\Phi$ and a $K \times N$ coefficient matrix $X=(x_1 \dots x_N)$, $x_n \in \mathbb{R}^K$, which is sparse. The exact question studied here is when a dictionary coefficient pair $(\Phi, X)$ can be recovered as a local minimum of a (nonconvex) $\ell_1$-criterion with input $Y=\Phi X$. First, for general dictionaries and coefficient matrices, algebraic conditions ensuring local identifiability are derived, which are then specialised to the case when the dictionary is a basis. Finally, assuming a random Bernoulli-Gaussian sparse model on the coefficient matrix, it is shown that sufficiently incoherent bases are locally identifiable with high probability. The perhaps surprising result is that the typically sufficient number of training samples $N$ grows, up to a logarithmic factor, only linearly with the signal dimension, i.e. $N \approx C K \log K$, in contrast to previous approaches requiring combinatorially many samples.

Submitted 1 March, 2010; v1 submitted 30 April, 2009; originally announced April 2009.

Comments: 32 pages (IEEE draft format), 8 figures, submitted to IEEE Trans. Inf. Theory

arXiv:math/0701131

doi 10.1109/TIT.2008.920190

Compressed Sensing and Redundant Dictionaries

Authors: Holger Rauhut, Karin Schnass, Pierre Vandergheynst

Abstract: This article extends the concept of compressed sensing to signals that are not sparse in an orthonormal basis but rather in a redundant dictionary. It is shown that a matrix which is a composition of a random matrix of a certain type and a deterministic dictionary has small restricted isometry constants. Thus, signals that are sparse with respect to the dictionary can be recovered via Basis Pursuit from a small number of random measurements. Further, thresholding is investigated as a recovery algorithm for compressed sensing, and conditions are provided that guarantee reconstruction with high probability. The different schemes are compared by numerical experiments.

Submitted 9 November, 2010; v1 submitted 4 January, 2007; originally announced January 2007.

Comments: error in a constant corrected

MSC Class: 15A52; 68P30; 68W25

Journal ref: IEEE Trans. Inform. Theory, 54(5):2210-2219, 2008

Showing 1–15 of 15 results for author: Schnass, K