-
Metric-Entropy Limits on Nonlinear Dynamical System Learning
Authors:
Yang Pan,
Clemens Hutter,
Helmut Bölcskei
Abstract:
This paper is concerned with the fundamental limits of nonlinear dynamical system learning from input-output traces. Specifically, we show that recurrent neural networks (RNNs) are capable of learning nonlinear systems that satisfy a Lipschitz property and forget past inputs fast enough in a metric-entropy optimal manner. As the sets of sequence-to-sequence maps realized by the dynamical systems we consider are significantly more massive than function classes generally considered in deep neural network approximation theory, a refined metric-entropy characterization is needed, namely in terms of order, type, and generalized dimension. We compute these quantities for the classes of exponentially-decaying and polynomially-decaying Lipschitz fading-memory systems and show that RNNs can achieve them.
Submitted 1 July, 2024;
originally announced July 2024.
-
Entropy of Compact Operators with Applications to Landau-Pollak-Slepian Theory and Sobolev Spaces
Authors:
Thomas Allard,
Helmut Bölcskei
Abstract:
We derive a precise general relation between the entropy of a compact operator and its eigenvalues. It is then shown how this result along with the underlying philosophy can be applied to improve substantially on the best known characterizations of the entropy of the Landau-Pollak-Slepian operator and the metric entropy of unit balls in Sobolev spaces.
Submitted 8 June, 2024;
originally announced June 2024.
-
Ellipsoid Methods for Metric Entropy Computation
Authors:
Thomas Allard,
Helmut Bölcskei
Abstract:
We present a new methodology for the characterization of the metric entropy of infinite-dimensional ellipsoids with exponentially decaying semi-axes. This procedure does not rely on the explicit construction of coverings or packings and provides a unified framework for the derivation of the metric entropy of a wide variety of analytic function classes, such as periodic functions analytic on a strip, analytic functions bounded on a disk, and functions of exponential type. In each of these cases, our results improve upon the best known results in the literature.
Submitted 17 May, 2024;
originally announced May 2024.
-
Metric entropy of causal, discrete-time LTI systems
Authors:
Clemens Hutter,
Thomas Allard,
Helmut Bölcskei
Abstract:
In [1] it is shown that recurrent neural networks (RNNs) can learn - in a metric entropy optimal manner - discrete time, linear time-invariant (LTI) systems. This is effected by comparing the number of bits needed to encode the approximating RNN to the metric entropy of the class of LTI systems under consideration [2, 3]. The purpose of this note is to provide an elementary self-contained proof of the metric entropy results in [2, 3], in the process of which minor mathematical issues appearing in [2, 3] are cleaned up. These corrections also lead to the correction of a constant in a result in [1] (see Remark 2.5).
Submitted 28 November, 2022;
originally announced November 2022.
-
Lossy Compression of General Random Variables
Authors:
Erwin Riegler,
Helmut Bölcskei,
Günther Koliander
Abstract:
This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as, e.g., manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in i) manifolds, namely, hyperspheres and Grassmannians, and ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.
Submitted 2 June, 2023; v1 submitted 24 November, 2021;
originally announced November 2021.
-
Metric Entropy Limits on Recurrent Neural Network Learning of Linear Dynamical Systems
Authors:
Clemens Hutter,
Recep Gül,
Helmut Bölcskei
Abstract:
One of the most influential results in neural network theory is the universal approximation theorem [1, 2, 3] which states that continuous functions can be approximated to within arbitrary accuracy by single-hidden-layer feedforward neural networks. The purpose of this paper is to establish a result in this spirit for the approximation of general discrete-time linear dynamical systems - including time-varying systems - by recurrent neural networks (RNNs). For the subclass of linear time-invariant (LTI) systems, we devise a quantitative version of this statement. Specifically, measuring the complexity of the considered class of LTI systems through metric entropy according to [4], we show that RNNs can optimally learn - or identify in system-theory parlance - stable LTI systems. For LTI systems whose input-output relation is characterized through a difference equation, this means that RNNs can learn the difference equation from input-output traces in a metric-entropy optimal manner.
Submitted 15 December, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Beurling-type density criteria for system identification
Authors:
V. Vlačić,
C. Aubel,
H. Bölcskei
Abstract:
This paper addresses the problem of identifying a linear time-varying (LTV) system characterized by a (possibly infinite) discrete set of delay-Doppler shifts without a lattice (or other geometry-discretizing) constraint on the support set. Concretely, we show that a class of such LTV systems is identifiable whenever the upper uniform Beurling density of the delay-Doppler support sets, measured uniformly over the class, is strictly less than 1/2. The proof of this result reveals an interesting relation between LTV system identification and interpolation in the Bargmann-Fock space. Moreover, we show that this density condition is also necessary for classes of systems invariant under time-frequency shifts and closed under a natural topology on the support sets. We furthermore show that identifiability guarantees robust recovery of the delay-Doppler support set, as well as the weights of the individual delay-Doppler shifts, both in the sense of asymptotically vanishing reconstruction error for vanishing measurement error.
Submitted 22 January, 2021;
originally announced January 2021.
-
Neural network identifiability for a family of sigmoidal nonlinearities
Authors:
Verner Vlačić,
Helmut Bölcskei
Abstract:
This paper addresses the following question of neural network identifiability: Does the input-output map realized by a feed-forward neural network with respect to a given nonlinearity uniquely specify the network architecture, weights, and biases? Existing literature on the subject (Sussman 1992; Albertini, Sontag et al. 1993; Fefferman 1994) suggests that the answer should be yes, up to certain symmetries induced by the nonlinearity, and provided the networks under consideration satisfy certain "genericity conditions". The results in Sussman 1992 and Albertini, Sontag et al. 1993 apply to networks with a single hidden layer, and in Fefferman 1994 the networks need to be fully connected. In an effort to answer the identifiability question in greater generality, we derive necessary genericity conditions for the identifiability of neural networks of arbitrary depth and connectivity with an arbitrary nonlinearity. Moreover, we construct a family of nonlinearities for which these genericity conditions are minimal, i.e., both necessary and sufficient. This family is large enough to approximate many commonly encountered nonlinearities to within arbitrary precision in the uniform norm.
Submitted 2 September, 2020; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Lossless Analog Compression
Authors:
Giovanni Alberti,
Helmut Bölcskei,
Camillo De Lellis,
Günther Koliander,
Erwin Riegler
Abstract:
We establish the fundamental limits of lossless analog compression by considering the recovery of arbitrary m-dimensional real random vectors x from the noiseless linear measurements y=Ax with n x m measurement matrix A. Our theory is inspired by the groundbreaking work of Wu and Verdu (2010) on almost lossless analog compression, but applies to the nonasymptotic, i.e., fixed-m case, and considers zero error probability. Specifically, our achievability result states that, for almost all A, the random vector x can be recovered with zero error probability provided that n > K(x), where K(x) is given by the infimum of the lower modified Minkowski dimension over all support sets U of x. We then particularize this achievability result to the class of s-rectifiable random vectors as introduced in Koliander et al. (2016); these are random vectors of absolutely continuous distribution---with respect to the s-dimensional Hausdorff measure---supported on countable unions of s-dimensional differentiable submanifolds of the m-dimensional real coordinate space. Countable unions of differentiable submanifolds include essentially all signal models used in the compressed sensing literature. Specifically, we prove that, for almost all A, s-rectifiable random vectors x can be recovered with zero error probability from n>s linear measurements. This threshold is, however, found not to be tight as exemplified by the construction of an s-rectifiable random vector that can be recovered with zero error probability from n<s linear measurements. This leads us to the introduction of the new class of s-analytic random vectors, which admit a strong converse in the sense of n greater than or equal to s being necessary for recovery with probability of error smaller than one. The central conceptual tools in the development of our theory are geometric measure theory and the theory of real analytic functions.
Submitted 17 July, 2019; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Topology Reduction in Deep Convolutional Feature Extraction Networks
Authors:
Thomas Wiatowski,
Philipp Grohs,
Helmut Bölcskei
Abstract:
Deep convolutional neural networks (CNNs) used in practice employ potentially hundreds of layers and $10{,}000$s of nodes. Such network sizes entail significant computational complexity due to the large number of convolutions that need to be carried out; in addition, a large number of parameters needs to be learned and stored. Very deep and wide CNNs may therefore not be well suited to applications operating under severe resource constraints as is the case, e.g., in low-power embedded and mobile platforms. This paper aims at understanding the impact of CNN topology, specifically depth and width, on the network's feature extraction capabilities. We address this question for the class of scattering networks that employ either Weyl-Heisenberg filters or wavelets, the modulus non-linearity, and no pooling. The exponential feature map energy decay results in Wiatowski et al., 2017, are generalized to $\mathcal{O}(a^{-N})$, where an arbitrary decay factor $a>1$ can be realized through suitable choice of the Weyl-Heisenberg prototype function or the mother wavelet. We then show how networks of fixed (possibly small) depth $N$ can be designed to guarantee that $((1-\varepsilon)\cdot 100)\%$ of the input signal's energy is contained in the feature vector. Based on the notion of operationally significant nodes, we characterize, partly rigorously and partly heuristically, the topology-reducing effects of (effectively) band-limited input signals, band-limited filters, and feature map symmetries. Finally, for networks based on Weyl-Heisenberg filters, we determine the prototype function bandwidth that minimizes---for fixed network depth $N$---the average number of operationally significant nodes per layer.
Submitted 14 March, 2018; v1 submitted 10 July, 2017;
originally announced July 2017.
-
Optimal Approximation with Sparsely Connected Deep Neural Networks
Authors:
Helmut Bölcskei,
Philipp Grohs,
Gitta Kutyniok,
Philipp Petersen
Abstract:
We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems---so-called \emph{affine systems}---can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $α$-shearlets, and more generally $α$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $α^{-1}$-cartoon-like functions, which is approximated optimally by $α$-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can actually learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.
Submitted 16 May, 2018; v1 submitted 4 May, 2017;
originally announced May 2017.
-
Energy Propagation in Deep Convolutional Neural Networks
Authors:
Thomas Wiatowski,
Philipp Grohs,
Helmut Bölcskei
Abstract:
Many practical machine learning tasks employ very deep convolutional neural networks. Such large depths pose formidable computational challenges in training and operating the network. It is therefore important to understand how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers. In addition, it is desirable that the feature extractor generated by the network be informative in the sense of the only signal mapping to the all-zeros feature vector being the zero input signal. This "trivial null-set" property can be accomplished by asking for "energy conservation" in the sense of the energy in the feature vector being proportional to that of the corresponding input signal. This paper establishes conditions for energy conservation (and thus for a trivial null-set) for a wide class of deep convolutional neural network-based feature extractors and characterizes corresponding feature map energy decay rates. Specifically, we consider general scattering networks employing the modulus non-linearity and we find that under mild analyticity and high-pass conditions on the filters (which encompass, inter alia, various constructions of Weyl-Heisenberg filters, wavelets, ridgelets, ($α$)-curvelets, and shearlets) the feature map energy decays at least polynomially fast. For broad families of wavelets and Weyl-Heisenberg filters, the guaranteed decay rate is shown to be exponential. Moreover, we provide handy estimates of the number of layers needed to have at least $((1-\varepsilon)\cdot 100)\%$ of the input signal energy be contained in the feature vector.
Submitted 1 February, 2018; v1 submitted 12 April, 2017;
originally announced April 2017.
-
Vandermonde Matrices with Nodes in the Unit Disk and the Large Sieve
Authors:
Céline Aubel,
Helmut Bölcskei
Abstract:
We derive bounds on the extremal singular values and the condition number of NxK, with N>=K, Vandermonde matrices with nodes in the unit disk. The mathematical techniques we develop to prove our main results are inspired by a link---first established by Selberg [1] and later extended by Moitra [2]---between the extremal singular values of Vandermonde matrices with nodes on the unit circle and large sieve inequalities. Our main conceptual contribution lies in establishing a connection between the extremal singular values of Vandermonde matrices with nodes in the unit disk and a novel large sieve inequality involving polynomials in z \in C with |z|<=1. Compared to Bazán's upper bound on the condition number [3], which, to the best of our knowledge, constitutes the only analytical result---available in the literature---on the condition number of Vandermonde matrices with nodes in the unit disk, our bound not only takes a much simpler form, but is also sharper for certain node configurations. Moreover, the bound we obtain can be evaluated consistently in a numerically stable fashion, whereas the evaluation of Bazán's bound requires the solution of a linear system of equations which has the same condition number as the Vandermonde matrix under consideration and can therefore lead to numerical instability in practice. As a byproduct, our result---when particularized to the case of nodes on the unit circle---slightly improves upon the Selberg-Moitra bound.
Submitted 3 August, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
-
Deep Convolutional Neural Networks on Cartoon Functions
Authors:
Philipp Grohs,
Thomas Wiatowski,
Helmut Bölcskei
Abstract:
Wiatowski and Bölcskei, 2015, proved that deformation stability and vertical translation invariance of deep convolutional neural network-based feature extractors are guaranteed by the network structure per se rather than the specific convolution kernels and non-linearities. While the translation invariance result applies to square-integrable functions, the deformation stability bound holds for band-limited functions only. Many signals of practical relevance (such as natural images) exhibit, however, sharp and curved discontinuities and are, hence, not band-limited. The main contribution of this paper is a deformation stability result that takes these structural properties into account. Specifically, we establish deformation stability bounds for the class of cartoon functions introduced by Donoho, 2001.
Submitted 12 February, 2018; v1 submitted 29 April, 2016;
originally announced May 2016.
-
A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
Authors:
Thomas Wiatowski,
Helmut Bölcskei
Abstract:
Deep convolutional neural networks have led to breakthrough results in numerous practical machine learning tasks such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a trainable classifier. The mathematical analysis of deep convolutional neural networks for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. This paper complements Mallat's results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl-Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.
Submitted 24 October, 2017; v1 submitted 19 December, 2015;
originally announced December 2015.
-
Deep Convolutional Neural Networks Based on Semi-Discrete Frames
Authors:
Thomas Wiatowski,
Helmut Bölcskei
Abstract:
Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of these networks was pioneered by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on identical semi-discrete wavelet frames in each network layer, and proved translation-invariance as well as deformation stability of the resulting feature extractor. The purpose of this paper is to develop Mallat's theory further by allowing for different and, most importantly, general semi-discrete frames (such as, e.g., Gabor frames, wavelets, curvelets, shearlets, ridgelets) in distinct network layers. This allows the extraction of wider classes of features than the point singularities resolved by the wavelet transform. Our generalized feature extractor is proven to be translation-invariant, and we develop deformation stability results for a larger class of deformations than those considered by Mallat. For Mallat's wavelet-based feature extractor, we get rid of a number of technical conditions. The mathematical engine behind our results is continuous frame theory, which allows us to completely detach the invariance and deformation stability proofs from the particular algebraic structure of the underlying frames.
Submitted 21 April, 2015;
originally announced April 2015.
-
Density Criteria for the Identification of Linear Time-Varying Systems
Authors:
Céline Aubel,
Helmut Bölcskei
Abstract:
This paper addresses the problem of identifying a linear time-varying (LTV) system characterized by a (possibly infinite) discrete set of delays and Doppler shifts. We prove that stable identifiability is possible if the upper uniform Beurling density of the delay-Doppler support set is strictly smaller than 1/2 and stable identifiability is impossible for densities strictly larger than 1/2. The proof of this density theorem reveals an interesting relation between LTV system identification and interpolation in the Bargmann-Fock space. Finally, we introduce a subspace method for solving the system identification problem at hand.
Submitted 20 April, 2015;
originally announced April 2015.
-
Noisy Subspace Clustering via Thresholding
Authors:
Reinhard Heckel,
Helmut Bölcskei
Abstract:
We consider the problem of clustering noisy high-dimensional data points into a union of low-dimensional subspaces and a set of outliers. The number of subspaces, their dimensions, and their orientations are unknown. A probabilistic performance analysis of the thresholding-based subspace clustering (TSC) algorithm introduced recently in [1] shows that TSC succeeds in the noisy case, even when the subspaces intersect. Our results reveal an explicit tradeoff between the allowed noise level and the affinity of the subspaces. We furthermore find that the simple outlier detection scheme introduced in [1] provably succeeds in the noisy case.
Submitted 18 July, 2013; v1 submitted 15 May, 2013;
originally announced May 2013.
-
Subspace Clustering via Thresholding and Spectral Clustering
Authors:
Reinhard Heckel,
Helmut Bölcskei
Abstract:
We consider the problem of clustering a set of high-dimensional data points into sets of low-dimensional linear subspaces. The number of subspaces, their dimensions, and their orientations are unknown. We propose a simple and low-complexity clustering algorithm based on thresholding the correlations between the data points followed by spectral clustering. A probabilistic performance analysis shows that this algorithm succeeds even when the subspaces intersect, and when the dimensions of the subspaces scale (up to a log-factor) linearly in the ambient dimension. Moreover, we prove that the algorithm also succeeds for data points that are subject to erasures with the number of erasures scaling (up to a log-factor) linearly in the ambient dimension. Finally, we propose a simple scheme that provably detects outliers.
Submitted 15 March, 2013;
originally announced March 2013.
-
Noncoherent SIMO Pre-Log via Resolution of Singularities
Authors:
Erwin Riegler,
Veniamin I. Morgenshtern,
Giuseppe Durisi,
Shaowei Lin,
Bernd Sturmfels,
Helmut Bölcskei
Abstract:
We establish a lower bound on the noncoherent capacity pre-log of a temporally correlated Rayleigh block-fading single-input multiple-output (SIMO) channel. Our result holds for arbitrary rank Q of the channel correlation matrix, arbitrary block-length L > Q, and arbitrary number of receive antennas R, and includes the result in Morgenshtern et al. (2010) as a special case. It is well known that the capacity pre-log for this channel in the single-input single-output (SISO) case is given by 1-Q/L, where Q/L is the penalty incurred by channel uncertainty. Our result reveals that this penalty can be reduced to 1/L by adding only one receive antenna, provided that L \geq 2Q - 1 and the channel correlation matrix satisfies mild technical conditions. The main technical tool used to prove our result is Hironaka's celebrated theorem on resolution of singularities in algebraic geometry.
Submitted 30 May, 2011;
originally announced May 2011.
-
Tail Behavior of Sphere-Decoding Complexity in Random Lattices
Authors:
Dominik Seethaler,
Joakim Jaldén,
Christoph Studer,
Helmut Bölcskei
Abstract:
We analyze the (computational) complexity distribution of sphere-decoding (SD) for random infinite lattices. In particular, we show that under fairly general assumptions on the statistics of the lattice basis matrix, the tail behavior of the SD complexity distribution is solely determined by the inverse volume of a fundamental region of the underlying lattice. Particularizing this result to NxM, N>=M, i.i.d. Gaussian lattice basis matrices, we find that the corresponding complexity distribution is of Pareto-type with tail exponent given by N-M+1. We furthermore show that this tail exponent is not improved by lattice-reduction, which includes layer-sorting as a special case.
Submitted 8 May, 2009;
originally announced May 2009.
-
Geometrically Uniform Frames
Authors:
Yonina C. Eldar,
H. Bolcskei
Abstract:
We introduce a new class of frames with strong symmetry properties called geometrically uniform (GU) frames, which are defined over an abelian group of unitary matrices and are generated by a single generating vector. The notion of GU frames is then extended to compound GU (CGU) frames, which are generated by an abelian group of unitary matrices using multiple generating vectors.
The dual frame vectors and canonical tight frame vectors associated with GU frames are shown to be GU and therefore generated by a single generating vector, which can be computed very efficiently using a Fourier transform defined over the generating group of the frame. Similarly, the dual frame vectors and canonical tight frame vectors associated with CGU frames are shown to be CGU.
The impact of removing single or multiple elements from a GU frame is considered. A systematic method for constructing optimal GU frames from a given set of frame vectors that are not GU is also developed. Finally, the Euclidean distance properties of GU frames are discussed and conditions are derived on the abelian group of unitary matrices to yield GU frames with strictly positive distance spectrum irrespective of the generating vector.
Submitted 13 August, 2001;
originally announced August 2001.