Search | arXiv e-print repository

Learning on manifolds without manifold learning

Abstract: Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning. In contrast to the prevalent paradigm of solving this problem by minimizing a loss functional, we have given a direct one-shot construction together with optimal error bounds under the manifold assumption; i.e., one assumes that the data is sampled from an unknown sub-manif… ▽ More Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning. In contrast to the prevalent paradigm of solving this problem by minimizing a loss functional, we have given a direct one-shot construction together with optimal error bounds under the manifold assumption; i.e., one assumes that the data is sampled from an unknown sub-manifold of a high dimensional Euclidean space. A great deal of research deals with obtaining information about this manifold, such as the eigendecomposition of the Laplace-Beltrami operator or coordinate charts, and using this information for function approximation. This two step approach implies some extra errors in the approximation stemming from basic quantities of the data in addition to the errors inherent in function approximation. In Neural Networks, 132:253268, 2020, we have proposed a one-shot direct method to achieve function approximation without requiring the extraction of any information about the manifold other than its dimension. However, one cannot pin down the class of approximants used in that paper. In this paper, we view the unknown manifold as a sub-manifold of an ambient hypersphere and study the question of constructing a one-shot approximation using the spherical polynomials based on the hypersphere. Our approach does not require preprocessing of the data to obtain information about the manifold other than its dimension. We give optimal rates of approximation for relatively "rough" functions. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2302.00160 [pdf, ps, other]

Local transfer learning from one data space to another

Authors: H. N. Mhaskar, Ryan O'Dowd

Abstract: A fundamental problem in manifold learning is to approximate a functional relationship in a data chosen randomly from a probability distribution supported on a low dimensional sub-manifold of a high dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data sp… ▽ More A fundamental problem in manifold learning is to approximate a functional relationship in a data chosen randomly from a probability distribution supported on a low dimensional sub-manifold of a high dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as inverse Radon transform) with transfer learning. In this paper we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining subsets of the target data space on which the lifting can be defined, and how the local smoothness of the function and its lifting are related. △ Less

Submitted 7 July, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: To appear in Proceedings of ICAIPA 2022, Editors: S. Pereverzyev, R. Radha, S. Sivananthan, Springer Verlag

arXiv:2109.14752 [pdf, other]

Kernel distance measures for time series, random fields and other structured data

Authors: Srinjoy Das, Hrushikesh Mhaskar, Alexander Cloninger

Abstract: This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is construct… ▽ More This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well known kernel-based Maximum Mean Discrepancy (MMD) distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering. △ Less

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2008.01245 [pdf, other]

Cautious Active Clustering

Authors: Alexander Cloninger, Hrushikesh Mhaskar

Abstract: We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditi… ▽ More We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class. Our technique involves the use of a highly localized kernel constructed from Hermite polynomials, in order to create a hierarchical estimate of the supports of the constituent probability measures. We do not need to make any assumptions on the nature of any of the probability measures nor know in advance the number of classes involved. We give theoretical guarantees measured by the $F$-score for our classification scheme. Examples include classification in hyper-spectral images and MNIST classification. △ Less

Submitted 7 December, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

arXiv:2003.13226 [pdf, ps, other]

Kernel based analysis of massive data

Authors: Hrushikesh N Mhaskar

Abstract: Dealing with massive data is a challenging task for machine learning. An important aspect of machine learning is function approximation. In the context of massive data, some of the commonly used tools for this purpose are sparsity, divide-and-conquer, and distributed learning. In this paper, we develop a very general theory of approximation by networks, which we have called eignets, to achieve loc… ▽ More Dealing with massive data is a challenging task for machine learning. An important aspect of machine learning is function approximation. In the context of massive data, some of the commonly used tools for this purpose are sparsity, divide-and-conquer, and distributed learning. In this paper, we develop a very general theory of approximation by networks, which we have called eignets, to achieve local, stratified approximation. The very massive nature of the data allows us to use these eignets to solve inverse problems such as finding a good approximation to the probability law that governs the data, and finding the local smoothness of the target function near different points in the domain. In fact, we develop a wavelet-like representation using our eignets. Our theory is applicable to approximation on a general locally compact metric measure space. Special examples include approximation by periodic basis functions on the torus, zonal function networks on a Euclidean sphere (including smooth ReLU networks), Gaussian networks, and approximation on manifolds. We construct pre-fabricated networks so that no data-based training is required for the approximation. △ Less

Submitted 7 July, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: Accepted for publication in Frontiers in Applied Mathematics and Statistics, section Mathematics of Computation and Data Science. Special issue on Fundamental Mathematical Topics in Data Science

arXiv:2001.12006 [pdf, other]

Theory inspired deep network for instantaneous-frequency extraction and signal components recovery from discrete blind-source data

Authors: Charles K. Chui, Ningning Han, Hrushikesh N. Mhaskar

Abstract: This paper is concerned with the inverse problem of recovering the unknown signal components, along with extraction of their instantaneous frequencies (IFs), governed by the adaptive harmonic model (AHM), from discrete (and possibly non-uniform) samples of the blind-source composite signal. None of the existing decomposition methods and algorithms, including the most popular empirical mode decom… ▽ More This paper is concerned with the inverse problem of recovering the unknown signal components, along with extraction of their instantaneous frequencies (IFs), governed by the adaptive harmonic model (AHM), from discrete (and possibly non-uniform) samples of the blind-source composite signal. None of the existing decomposition methods and algorithms, including the most popular empirical mode decomposition (EMD) computational scheme and its current modifications, is capable of solving this inverse problem. In order to meet the AHM formulation and to extract the IFs of the decomposed components, called intrinsic mode functions (IMFs), each IMF of EMD is extended to an analytic function in the upper half of the complex plane via the Hilbert transform, followed by taking the real part of the polar form of the analytic extension. Unfortunately, this approach most often fails to resolve the inverse problem satisfactorily. More recently, to resolve the inverse problem, the notion of synchrosqueezed wavelet transform (SST) was proposed by Daubechies and Maes, and further developed in many other papers, while a more direct method, called signal separation operation (SSO), was proposed and developed in our previous work published in the journal, Applied and Computational Harmonic Analysis, vol. 30(2):243-261, 2016. In the present paper, we propose a synthesis of SSO using a deep neural network, based directly on a discrete sample set, that may be non-uniformly sampled, of the blind-source signal. Our method is localized, as illustrated by a number of numerical examples, including components with different signal arrival and departure times. It also yields short-term prediction of the signal components, along with their IFs. Our neural networks are inspired by theory, designed so that they do not require any training in the traditional sense. △ Less

Submitted 31 January, 2020; originally announced January 2020.

arXiv:1908.09880 [pdf, ps, other]

Dimension independent bounds for general shallow networks

Authors: Hrushikesh N. Mhaskar

Abstract: This paper proves an abstract theorem addressing in a unified manner two important problems in function approximation: avoiding curse of dimensionality and estimating the degree of approximation for out-of-sample extension in manifold learning. We consider an abstract (shallow) network that includes, for example, neural networks, radial basis function networks, and kernels on data defined manifold… ▽ More This paper proves an abstract theorem addressing in a unified manner two important problems in function approximation: avoiding curse of dimensionality and estimating the degree of approximation for out-of-sample extension in manifold learning. We consider an abstract (shallow) network that includes, for example, neural networks, radial basis function networks, and kernels on data defined manifolds used for function approximation in various settings. A deep network is obtained by a composition of the shallow networks according to a directed acyclic graph, representing the architecture of the deep network. In this paper, we prove dimension independent bounds for approximation by shallow networks in the very general setting of what we have called $G$-networks on a compact metric measure space, where the notion of dimension is defined in terms of the cardinality of maximal distinguishable sets, generalizing the notion of dimension of a cube or a manifold. Our techniques give bounds that improve without saturation with the smoothness of the kernel involved in an integral representation of the target function. In the context of manifold learning, our bounds provide estimates on the degree of approximation for an out-of-sample extension of the target function to the ambient space. One consequence of our theorem is that without the requirement of robust parameter selection, deep networks using a non-smooth activation function such as the ReLU, do not provide any significant advantage over shallow networks in terms of the degree of approximation alone. △ Less

Submitted 4 November, 2019; v1 submitted 26 August, 2019; originally announced August 2019.

arXiv:1908.00156 [pdf, other]

A direct approach for function approximation on data defined manifolds

Authors: Hrushikesh Mhaskar

Abstract: In much of the literature on function approximation by deep networks, the function is assumed to be defined on some known domain, such as a cube or a sphere. In practice, the data might not be dense on these domains, and therefore, the approximation theory results are observed to be too conservative. In manifold learning, one assumes instead that the data is sampled from an unknown manifold; i.e.,… ▽ More In much of the literature on function approximation by deep networks, the function is assumed to be defined on some known domain, such as a cube or a sphere. In practice, the data might not be dense on these domains, and therefore, the approximation theory results are observed to be too conservative. In manifold learning, one assumes instead that the data is sampled from an unknown manifold; i.e., the manifold is defined by the data itself. Function approximation on this unknown manifold is then a two stage procedure: first, one approximates the Laplace-Beltrami operator (and its eigen-decomposition) on this manifold using a graph Laplacian, and next, approximates the target function using the eigen-functions. Alternatively, one estimates first some atlas on the manifold and then uses local approximation techniques based on the local coordinate charts. In this paper, we propose a more direct approach to function approximation on \emph{unknown}, data defined manifolds without computing the eigen-decomposition of some operator or an atlas for the manifold, and without any kind of training in the classical sense. Our constructions are universal; i.e., do not require the knowledge of any prior on the target function other than continuity on the manifold. We estimate the degree of approximation. For smooth functions, the estimates do not suffer from the so-called saturation phenomenon. We demonstrate via a property called good propagation of errors how the results can be lifted for function approximation using deep networks where each channel evaluates a Gaussian network on a possibly unknown manifold. △ Less

Submitted 20 August, 2020; v1 submitted 31 July, 2019; originally announced August 2019.

Comments: Version 1 was submitted on August 1, 2019 under the title Deep Gaussian networks for function approximation on data defined manifolds. This version is accepted for publication in Neural Networks

arXiv:1907.04895 [pdf, ps, other]

Super-resolution meets machine learning: approximation of measures

Authors: H. N. Mhaskar

Abstract: The problem of super-resolution in general terms is to recuperate a finitely supported measure $μ$ given finitely many of its coefficients $\hatμ(k)$ with respect to some orthonormal system. The interesting case concerns situations, where the number of coefficients required is substantially smaller than a power of the reciprocal of the minimal separation among the points in the support of $μ$. In… ▽ More The problem of super-resolution in general terms is to recuperate a finitely supported measure $μ$ given finitely many of its coefficients $\hatμ(k)$ with respect to some orthonormal system. The interesting case concerns situations, where the number of coefficients required is substantially smaller than a power of the reciprocal of the minimal separation among the points in the support of $μ$. In this paper, we consider the more severe problem of recuperating $μ$ approximately without any assumption on $μ$ beyond having a finite total variation. In particular, $μ$ may be supported on a continuum, so that the minimal separation among the points in the support of $μ$ is $0$. A variant of this problem is also of interest in machine learning as well as the inverse problem of de-convolution. We define an appropriate notion of a distance between the target measure and its recuperated version, give an explicit expression for the recuperation operator, and estimate the distance between $μ$ and its approximation. We show that these estimates are the best possible in many different ways. We also explain why for a finitely supported measure the approximation quality of its recuperation is bounded from below if the amount of information is smaller than what is demanded in the super-resolution problem. △ Less

Submitted 10 July, 2019; originally announced July 2019.

Comments: 14 pages, To appear in Journal of Fourier Analysis and Applications

arXiv:1905.12882 [pdf, other]

Function approximation by deep networks

Authors: H. N. Mhaskar, T. Poggio

Abstract: We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On th… ▽ More We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows to `lift' theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network related closely with the ReLU network. △ Less

Submitted 23 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: To appear in Communications in pure and applied mathematics

arXiv:1901.02975 [pdf, other]

A witness function based construction of discriminative models using Hermite polynomials

Authors: H. N. Mhaskar, A. Cloninger, X. Cheng

Abstract: In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $μ$; the marginal distribution for the $\mathbf{x}_j$'s being $μ^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments… ▽ More In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $μ$; the marginal distribution for the $\mathbf{x}_j$'s being $μ^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multi-variate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a `witness function' in classification problems. Thus, if the value of this estimator at a point $\mathbf{x}$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. This fact is demonstrated in a number of real world data sets including MNIST, CIFAR10, Science News documents, and LaLonde data sets. △ Less

Submitted 9 January, 2019; originally announced January 2019.

Comments: 20 pages, 3.1 MB

arXiv:1806.02003 [pdf, other]

Deep Algorithms: designs for networks

Authors: Abhejit Rajagopal, Shivkumar Chandrasekaran, Hrushikesh N. Mhaskar

Abstract: A new design methodology for neural networks that is guided by traditional algorithm design is presented. To prove our point, we present two heuristics and demonstrate an algorithmic technique for incorporating additional weights in their signal-flow graphs. We show that with training the performance of these networks can not only exceed the performance of the initial network, but can match the pe… ▽ More A new design methodology for neural networks that is guided by traditional algorithm design is presented. To prove our point, we present two heuristics and demonstrate an algorithmic technique for incorporating additional weights in their signal-flow graphs. We show that with training the performance of these networks can not only exceed the performance of the initial network, but can match the performance of more-traditional neural network architectures. A key feature of our approach is that these networks are initialized with parameters that provide a known performance threshold for the architecture on a given task. △ Less

Submitted 6 June, 2018; originally announced June 2018.

Comments: submitted to Thirty-second Annual Conference on Neural Information Processing Systems (NIPS), May 2018

arXiv:1707.09319 [pdf, ps, other]

A Fourier-invariant method for locating point-masses and computing their attributes

Authors: Charles K. Chui, Hrushikesh N. Mhaskar

Abstract: Motivated by the interest of observing the growth of cancer cells among normal living cells and exploring how galaxies and stars are truly formed, the objective of this paper is to introduce a rigorous and effective method for counting point-masses, determining their spatial locations, and computing their attributes. Based on computation of Hermite moments that are Fourier-invariant, our approach… ▽ More Motivated by the interest of observing the growth of cancer cells among normal living cells and exploring how galaxies and stars are truly formed, the objective of this paper is to introduce a rigorous and effective method for counting point-masses, determining their spatial locations, and computing their attributes. Based on computation of Hermite moments that are Fourier-invariant, our approach facilitates the processing of both spatial and Fourier data in any dimension. △ Less

Submitted 26 July, 2017; originally announced July 2017.

Showing 1–13 of 13 results for author: Mhaskar, H