Search | arXiv e-print repository

LINSCAN -- A Linearity Based Clustering Algorithm

Authors: Andrew Dennehy, Xiaoyu Zou, Shabnam J. Semnani, Yuri Fialko, Alexander Cloninger

Abstract: DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approx… ▽ More DBSCAN and OPTICS are powerful algorithms for identifying clusters of points in domains where few assumptions can be made about the structure of the data. In this paper, we leverage these strengths and introduce a new algorithm, LINSCAN, designed to seek lineated clusters that are difficult to find and isolate with existing methods. In particular, by embedding points as normal distributions approximating their local neighborhoods and leveraging a distance function derived from the Kullback Leibler Divergence, LINSCAN can detect and distinguish lineated clusters that are spatially close but have orthogonal covariances. We demonstrate how LINSCAN can be applied to seismic data to identify active faults, including intersecting faults, and determine their orientation. Finally, we discuss the properties a generalization of DBSCAN and OPTICS must have in order to retain the stability benefits of these algorithms. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2404.15261 [pdf, other]

All You Need is Resistance: On the Equivalence of Effective Resistance and Certain Optimal Transport Problems on Graphs

Authors: Sawyer Robertson, Zhengchao Wan, Alexander Cloninger

Abstract: The fields of effective resistance and optimal transport on graphs are filled with rich connections to combinatorics, geometry, machine learning, and beyond. In this article we put forth a bold claim: that the two fields should be understood as one and the same, up to a choice of $p$. We make this claim precise by introducing the parameterized family of $p$-Beckmann distances for probability measu… ▽ More The fields of effective resistance and optimal transport on graphs are filled with rich connections to combinatorics, geometry, machine learning, and beyond. In this article we put forth a bold claim: that the two fields should be understood as one and the same, up to a choice of $p$. We make this claim precise by introducing the parameterized family of $p$-Beckmann distances for probability measures on graphs and relate them sharply to certain Wasserstein distances. Then, we break open a suite of results including explicit connections to optimal stop** times and random walks on graphs, graph Sobolev spaces, and a Benamou-Brenier type formula for $2$-Beckmann distance. We further explore empirical implications in the world of unsupervised learning for graph data and propose further study of the usage of these metrics where Wasserstein distance may produce computational bottlenecks. △ Less

Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: 35 pages, 7 figures

MSC Class: 65K10; 05C21; 90C25; 68R10; 05C50

arXiv:2403.02372 [pdf, other]

OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport

Authors: Alireza Pirhadi, Mohammad Hossein Moslemi, Alexander Cloninger, Mostafa Milani, Babak Salimi

Abstract: Ensuring Conditional Independence (CI) constraints is pivotal for the development of fair and trustworthy machine learning models. In this paper, we introduce \sys, a framework that harnesses optimal transport theory for data repair under CI constraints. Optimal transport theory provides a rigorous framework for measuring the discrepancy between probability distributions, thereby ensuring control… ▽ More Ensuring Conditional Independence (CI) constraints is pivotal for the development of fair and trustworthy machine learning models. In this paper, we introduce \sys, a framework that harnesses optimal transport theory for data repair under CI constraints. Optimal transport theory provides a rigorous framework for measuring the discrepancy between probability distributions, thereby ensuring control over data utility. We formulate the data repair problem concerning CIs as a Quadratically Constrained Linear Program (QCLP) and propose an alternating method for its solution. However, this approach faces scalability issues due to the computational cost associated with computing optimal transport distances, such as the Wasserstein distance. To overcome these scalability challenges, we reframe our problem as a regularized optimization problem, enabling us to develop an iterative algorithm inspired by Sinkhorn's matrix scaling algorithm, which efficiently addresses high-dimensional and large-scale data. Through extensive experiments, we demonstrate the efficacy and efficiency of our proposed methods, showcasing their practical utility in real-world data cleaning and preprocessing tasks. Furthermore, we provide comparisons with traditional approaches, highlighting the superiority of our techniques in terms of preserving data utility while ensuring adherence to the desired CI constraints. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.01460 [pdf, other]

Point Cloud Classification via Deep Set Linearized Optimal Transport

Authors: Scott Mahan, Caroline Moosmüller, Alexander Cloninger

Abstract: We introduce Deep Set Linearized Optimal Transport, an algorithm designed for the efficient simultaneous embedding of point clouds into an $L^2-$space. This embedding preserves specific low-dimensional structures within the Wasserstein space while constructing a classifier to distinguish between various classes of point clouds. Our approach is motivated by the observation that $L^2-$distances betw… ▽ More We introduce Deep Set Linearized Optimal Transport, an algorithm designed for the efficient simultaneous embedding of point clouds into an $L^2-$space. This embedding preserves specific low-dimensional structures within the Wasserstein space while constructing a classifier to distinguish between various classes of point clouds. Our approach is motivated by the observation that $L^2-$distances between optimal transport maps for distinct point clouds, originating from a shared fixed reference distribution, provide an approximation of the Wasserstein-2 distance between these point clouds, under certain assumptions. To learn approximations of these transport maps, we employ input convex neural networks (ICNNs) and establish that, under specific conditions, Euclidean distances between samples from these ICNNs closely mirror Wasserstein-2 distances between the true distributions. Additionally, we train a discriminator network that attaches weights these samples and creates a permutation invariant classifier to differentiate between different classes of point clouds. We showcase the advantages of our algorithm over the standard deep set approach through experiments on a flow cytometry dataset with a limited number of labeled point clouds. △ Less

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.10295 [pdf, other]

On a Generalization of Wasserstein Distance and the Beckmann Problem to Connection Graphs

Authors: Sawyer Robertson, Dhruv Kohli, Gal Mishne, Alexander Cloninger

Abstract: The intersection of connection graphs and discrete optimal transport presents a novel paradigm for understanding complex graphs and node interactions. In this paper, we delve into this unexplored territory by focusing on the Beckmann problem within the context of connection graphs. Our study establishes feasibility conditions for the resulting convex optimization problem on connection graphs. Furt… ▽ More The intersection of connection graphs and discrete optimal transport presents a novel paradigm for understanding complex graphs and node interactions. In this paper, we delve into this unexplored territory by focusing on the Beckmann problem within the context of connection graphs. Our study establishes feasibility conditions for the resulting convex optimization problem on connection graphs. Furthermore, we establish strong duality for the conventional Beckmann problem, and extend our analysis to encompass strong duality and duality correspondence for a quadratically regularized variant. To put our findings into practice, we implement the regularized problem using gradient descent, enabling a practical approach to solving this complex problem. We showcase optimal flows and solutions, providing valuable insights into the real-world implications of our theoretical framework. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: 19 pages, 6 figures

MSC Class: 65K10; 05C21; 90C25; 68R10; 05C50

arXiv:2308.09690 [pdf, other]

Random Walks, Conductance, and Resistance for the Connection Graph Laplacian

Authors: Alexander Cloninger, Gal Mishne, Andreas Oslandsbotn, Sawyer Jack Robertson, Zhengchao Wan, Yusu Wang

Abstract: We investigate the concept of effective resistance in connection graphs, expanding its traditional application from undirected graphs. We propose a robust definition of effective resistance in connection graphs by focusing on the duality of Dirichlet-type and Poisson-type problems on connection graphs. Additionally, we delve into random walks, taking into account both node transitions and vector r… ▽ More We investigate the concept of effective resistance in connection graphs, expanding its traditional application from undirected graphs. We propose a robust definition of effective resistance in connection graphs by focusing on the duality of Dirichlet-type and Poisson-type problems on connection graphs. Additionally, we delve into random walks, taking into account both node transitions and vector rotations. This approach introduces novel concepts of effective conductance and resistance matrices for connection graphs, capturing mean rotation matrices corresponding to random walk transitions. Thereby, it provides new theoretical insights for network analysis and optimization. △ Less

Submitted 20 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.00142 [pdf, other]

Semi-Supervised Laplacian Learning on Stiefel Manifolds

Authors: Chester Holtz, Pengwen Chen, Alexander Cloninger, Chung-Kuan Cheng, Gal Mishne

Abstract: Motivated by the need to address the degeneracy of canonical Laplace learning algorithms in low label rates, we propose to reformulate graph-based semi-supervised learning as a nonconvex generalization of a \emph{Trust-Region Subproblem} (TRS). This reformulation is motivated by the well-posedness of Laplacian eigenvectors in the limit of infinite unlabeled data. To solve this problem, we first sh… ▽ More Motivated by the need to address the degeneracy of canonical Laplace learning algorithms in low label rates, we propose to reformulate graph-based semi-supervised learning as a nonconvex generalization of a \emph{Trust-Region Subproblem} (TRS). This reformulation is motivated by the well-posedness of Laplacian eigenvectors in the limit of infinite unlabeled data. To solve this problem, we first show that a first-order condition implies the solution of a manifold alignment problem and that solutions to the classical \emph{Orthogonal Procrustes} problem can be used to efficiently find good classifiers that are amenable to further refinement. Next, we address the criticality of selecting supervised samples at low-label rates. We characterize informative samples with a novel measure of centrality derived from the principal eigenvectors of a certain submatrix of the graph Laplacian. We demonstrate that our framework achieves lower classification error compared to recent state-of-the-art and classical semi-supervised learning methods at extremely low, medium, and high label rates. Our code is available on github\footnote{anonymized for submission}. △ Less

Submitted 31 July, 2023; originally announced August 2023.

arXiv:2306.15649 [pdf, ps, other]

Effective resistance in metric spaces

Authors: Robi Bhattacharjee, Alexander Cloninger, Yoav Freund, Andreas Oslandsbotn

Abstract: Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. One attractive application of ER is to point clouds, i.e. graphs whose vertices correspond to IID samples from a distribution over a metric space. Unfortunately, it was shown that the ER between any two points converges to a trivial qua… ▽ More Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. One attractive application of ER is to point clouds, i.e. graphs whose vertices correspond to IID samples from a distribution over a metric space. Unfortunately, it was shown that the ER between any two points converges to a trivial quantity that holds no information about the graph's structure as the size of the sample increases to infinity. In this study, we show that this trivial solution can be circumvented by considering a region-based ER between pairs of small regions rather than pairs of points and by scaling the edge weights appropriately with respect to the underlying density in each region. By kee** the regions fixed, we show analytically that the region-based ER converges to a non-trivial limit as the number of points increases to infinity. Namely the ER on a metric space. We support our theoretical findings with numerical experiments. △ Less

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2302.07373 [pdf, other]

Linearized Wasserstein dimensionality reduction with approximation guarantees

Authors: Alexander Cloninger, Keaton Hamm, Varun Khurana, Caroline Moosmüller

Abstract: We introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. The algorithm is motivated by the observation that many datasets are naturally interpreted as probability measures rather than points in $\mathbb{R}^n$, and that finding low-dimensional descriptions of such datasets requires manifold learning algorithms in the Wasserstein… ▽ More We introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. The algorithm is motivated by the observation that many datasets are naturally interpreted as probability measures rather than points in $\mathbb{R}^n$, and that finding low-dimensional descriptions of such datasets requires manifold learning algorithms in the Wasserstein space. Most available algorithms are based on computing the pairwise Wasserstein distance matrix, which can be computationally challenging for large datasets in high dimensions. Our algorithm leverages approximation schemes such as Sinkhorn distances and linearized optimal transport to speed-up computations, and in particular, avoids computing a pairwise distance matrix. We provide guarantees on the embedding quality under such approximations, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. Experiments demonstrate that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations. △ Less

Submitted 14 February, 2023; originally announced February 2023.

Comments: 38 pages, 10 figures. Submitted

arXiv:2210.01760 [pdf, other]

Evaluating Disentanglement in Generative Models Without Knowledge of Latent Factors

Authors: Chester Holtz, Gal Mishne, Alexander Cloninger

Abstract: Probabilistic generative models provide a flexible and systematic framework for learning the underlying geometry of data. However, model selection in this setting is challenging, particularly when selecting for ill-defined qualities such as disentanglement or interpretability. In this work, we address this gap by introducing a method for ranking generative models based on the training dynamics exh… ▽ More Probabilistic generative models provide a flexible and systematic framework for learning the underlying geometry of data. However, model selection in this setting is challenging, particularly when selecting for ill-defined qualities such as disentanglement or interpretability. In this work, we address this gap by introducing a method for ranking generative models based on the training dynamics exhibited during learning. Inspired by recent theoretical characterizations of disentanglement, our method does not require supervision of the underlying latent factors. We evaluate our approach by demonstrating the need for disentanglement metrics which do not require labels\textemdash the underlying generative factors. We additionally demonstrate that our approach correlates with baseline supervised methods for evaluating disentanglement. Finally, we show that our method can be used as an unsupervised indicator for downstream performance on reinforcement learning and fairness-classification problems. △ Less

Submitted 4 October, 2022; originally announced October 2022.

arXiv:2208.10570 [pdf, other]

Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders

Authors: Stefan C. Schonsheck, Scott Mahan, Timo Klock, Alexander Cloninger, Rongjie Lai

Abstract: Autoencoding is a popular method in representation learning. Conventional autoencoders employ symmetric encoding-decoding procedures and a simple Euclidean latent space to detect hidden low-dimensional structures in an unsupervised way. This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class lab… ▽ More Autoencoding is a popular method in representation learning. Conventional autoencoders employ symmetric encoding-decoding procedures and a simple Euclidean latent space to detect hidden low-dimensional structures in an unsupervised way. This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels. Besides enhancing the capability for handling data with complicated topological and geometric structures, these models can successfully differentiate nearby but disjoint manifolds and intersecting manifolds with only a small amount of supervision. Moreover, this model only requires a low complexity encoder, such as local linear projection. We discuss the theoretical approximation power of such networks that essentially depends on the intrinsic dimension of the data manifold and not the dimension of the observations. Our numerical experiments on synthetic and real-world data verify that the proposed model can effectively manage data with multi-class nearby but disjoint manifolds of different classes, overlap** manifolds, and manifolds with non-trivial topology. △ Less

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2203.00063 [pdf, other]

Structure from Voltage

Authors: Robi Bhattacharjee, Alex Cloninger, Yoav Freund, Andreas Oslandsbotn

Abstract: Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond… ▽ More Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by using scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points. △ Less

Submitted 28 June, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

arXiv:2109.14752 [pdf, other]

Kernel distance measures for time series, random fields and other structured data

Authors: Srinjoy Das, Hrushikesh Mhaskar, Alexander Cloninger

Abstract: This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is construct… ▽ More This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well known kernel-based Maximum Mean Discrepancy (MMD) distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering. △ Less

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2108.10411 [pdf, other]

StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

Authors: Andreas Oslandsbotn, Zeljko Kereta, Valeriya Naumova, Yoav Freund, Alexander Cloninger

Abstract: Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, existing implementations of KRR require that all the data is stored in the main memory, which severely limits the use of KRR in contexts where data size far exceeds the memory size. Such applications are increasingly common in data mining, bioinformatics, and control. A powerful paradigm for computin… ▽ More Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, existing implementations of KRR require that all the data is stored in the main memory, which severely limits the use of KRR in contexts where data size far exceeds the memory size. Such applications are increasingly common in data mining, bioinformatics, and control. A powerful paradigm for computing on data sets that are too large for memory is the streaming model of computation, where we process one data sample at a time, discarding each sample before moving on to the next one. In this paper, we propose StreaMRAK - a streaming version of KRR. StreaMRAK improves on existing KRR schemes by dividing the problem into several levels of resolution, which allows continual refinement to the predictions. The algorithm reduces the memory requirement by continuously and efficiently integrating new samples into the training model. With a novel sub-sampling scheme, StreaMRAK reduces memory and computational complexities by creating a sketch of the original data, where the sub-sampling density is adapted to the bandwidth of the kernel and the local dimensionality of the data. We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum. The results show that the proposed algorithm is fast and accurate. △ Less

Submitted 7 September, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

MSC Class: 68Q32; 65D15; 46E22; 68W27

arXiv:2106.02614 [pdf, other]

Sigma-Delta and Distributed Noise-Sha** Quantization Methods for Random Fourier Features

Authors: **jie Zhang, Harish Kannan, Alexander Cloninger, Rayan Saab

Abstract: We propose the use of low bit-depth Sigma-Delta and distributed noise-sha** methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension o… ▽ More We propose the use of low bit-depth Sigma-Delta and distributed noise-sha** methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context. △ Less

Submitted 12 April, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2101.11055 [pdf, other]

LDLE: Low Distortion Local Eigenmaps

Authors: Dhruv Kohli, Alexander Cloninger, Gal Mishne

Abstract: We present Low Distortion Local Eigenmaps (LDLE), a manifold learning technique which constructs a set of low distortion local views of a dataset in lower dimension and registers them to obtain a global embedding. The local views are constructed using the global eigenvectors of the graph Laplacian and are registered using Procrustes analysis. The choice of these eigenvectors may vary across the re… ▽ More We present Low Distortion Local Eigenmaps (LDLE), a manifold learning technique which constructs a set of low distortion local views of a dataset in lower dimension and registers them to obtain a global embedding. The local views are constructed using the global eigenvectors of the graph Laplacian and are registered using Procrustes analysis. The choice of these eigenvectors may vary across the regions. In contrast to existing techniques, LDLE can embed closed and non-orientable manifolds into their intrinsic dimension by tearing them apart. It also provides gluing instruction on the boundary of the torn embedding to help identify the topology of the original manifold. Our experimental results will show that LDLE largely preserved distances up to a constant scale while other techniques produced higher distortion. We also demonstrate that LDLE produces high quality embeddings even when the data is noisy or sparse. △ Less

Submitted 14 October, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

Comments: 66 pages, 32 figures, preprint. Rearranged figures

Journal ref: JMLR (2021) 22: 64

arXiv:2010.04227 [pdf, other]

A low discrepancy sequence on graphs

Authors: A. Cloninger, H. N. Mhaskar

Abstract: Many applications such as election forecasting, environmental monitoring, health policy, and graph based machine learning require taking expectation of functions defined on the vertices of a graph. We describe a construction of a sampling scheme analogous to the so called Leja points in complex potential theory that can be proved to give low discrepancy estimates for the approximation of the expec… ▽ More Many applications such as election forecasting, environmental monitoring, health policy, and graph based machine learning require taking expectation of functions defined on the vertices of a graph. We describe a construction of a sampling scheme analogous to the so called Leja points in complex potential theory that can be proved to give low discrepancy estimates for the approximation of the expected value by the impirical expected value based on these points. In contrast to classical potential theory where the kernel is fixed and the equilibrium distribution depends upon the kernel, we fix a probability distribution and construct a kernel (which represents the graph structure) for which the equilibrium distribution is the given probability distribution. Our estimates do not depend upon the size of the graph. △ Less

Submitted 7 June, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: Accepted for publication in Journal of Fourier Analysis and Applications

arXiv:2008.09165 [pdf, other]

Linear Optimal Transport Embedding: Provable Wasserstein classification for certain rigid transformations and perturbations

Authors: Caroline Moosmüller, Alexander Cloninger

Abstract: Discriminating between distributions is an important problem in a number of scientific fields. This motivated the introduction of Linear Optimal Transportation (LOT), which embeds the space of distributions into an $L^2$-space. The transform is defined by computing the optimal transport of each distribution to a fixed reference distribution, and has a number of benefits when it comes to speed of c… ▽ More Discriminating between distributions is an important problem in a number of scientific fields. This motivated the introduction of Linear Optimal Transportation (LOT), which embeds the space of distributions into an $L^2$-space. The transform is defined by computing the optimal transport of each distribution to a fixed reference distribution, and has a number of benefits when it comes to speed of computation and to determining classification boundaries. In this paper, we characterize a number of settings in which LOT embeds families of distributions into a space in which they are linearly separable. This is true in arbitrary dimension, and for families of distributions generated through perturbations of shifts and scalings of a fixed distribution.We also prove conditions under which the $L^2$ distance of the LOT embedding between two distributions in arbitrary dimension is nearly isometric to Wasserstein-2 distance between those distributions. This is of significant computational benefit, as one must only compute $N$ optimal transport maps to define the $N^2$ pairwise distances between $N$ distributions. We demonstrate the benefits of LOT on a number of distribution classification problems. △ Less

Submitted 25 May, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: 28 pages, 2 figures

MSC Class: 60D05; 68T10; 68T05

arXiv:2008.02545 [pdf, other]

A deep network construction that adapts to intrinsic dimensionality beyond the domain

Authors: Alexander Cloninger, Timo Klock

Abstract: We study the approximation of two-layer compositions $f(x) = g(φ(x))$ via deep networks with ReLU activation, where $φ$ is a geometrically intuitive, dimensionality reducing feature map. We focus on two intuitive and practically relevant choices for $φ$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets. We achieve near optimal approx… ▽ More We study the approximation of two-layer compositions $f(x) = g(φ(x))$ via deep networks with ReLU activation, where $φ$ is a geometrically intuitive, dimensionality reducing feature map. We focus on two intuitive and practically relevant choices for $φ$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets. We achieve near optimal approximation rates, which depend only on the complexity of the dimensionality reducing map $φ$ rather than the ambient dimension. Since $φ$ encapsulates all nonlinear features that are material to the function $f$, this suggests that deep nets are faithful to an intrinsic dimension governed by $f$ rather than the complexity of the domain of $f$. In particular, the prevalent assumption of approximating functions on low-dimensional manifolds can be significantly relaxed using functions of type $f(x) = g(φ(x))$ with $φ$ representing an orthogonal projection onto the same manifold. △ Less

Submitted 26 April, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

MSC Class: 41A25; 41A46; 41A63; 62G05

arXiv:2008.01245 [pdf, other]

Cautious Active Clustering

Authors: Alexander Cloninger, Hrushikesh Mhaskar

Abstract: We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditi… ▽ More We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class. Our technique involves the use of a highly localized kernel constructed from Hermite polynomials, in order to create a hierarchical estimate of the supports of the constituent probability measures. We do not need to make any assumptions on the nature of any of the probability measures nor know in advance the number of classes involved. We give theoretical guarantees measured by the $F$-score for our classification scheme. Examples include classification in hyper-spectral images and MNIST classification. △ Less

Submitted 7 December, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

arXiv:2007.11730 [pdf, other]

doi 10.1016/j.neunet.2021.01.007

Nonclosedness of Sets of Neural Networks in Sobolev Spaces

Authors: Scott Mahan, Emily King, Alex Cloninger

Abstract: We examine the closedness of sets of realized neural networks of a fixed architecture in Sobolev spaces. For an exactly $m$-times differentiable activation function $ρ$, we construct a sequence of neural networks $(Φ_n)_{n \in \mathbb{N}}$ whose realizations converge in order-$(m-1)$ Sobolev norm to a function that cannot be realized exactly by a neural network. Thus, sets of realized neural netwo… ▽ More We examine the closedness of sets of realized neural networks of a fixed architecture in Sobolev spaces. For an exactly $m$-times differentiable activation function $ρ$, we construct a sequence of neural networks $(Φ_n)_{n \in \mathbb{N}}$ whose realizations converge in order-$(m-1)$ Sobolev norm to a function that cannot be realized exactly by a neural network. Thus, sets of realized neural networks are not closed in order-$(m-1)$ Sobolev spaces $W^{m-1,p}$ for $p \in [1,\infty]$. We further show that these sets are not closed in $W^{m,p}$ under slightly stronger conditions on the $m$-th derivative of $ρ$. For a real analytic activation function, we show that sets of realized neural networks are not closed in $W^{k,p}$ for any $k \in \mathbb{N}$. The nonclosedness allows for approximation of non-network target functions with unbounded parameter growth. We partially characterize the rate of parameter growth for most activation functions by showing that a specific sequence of realized neural networks can approximate the activation function's derivative with weights increasing inversely proportional to the $L^p$ approximation error. Finally, we present experimental results showing that networks are capable of closely approximating non-network target functions with increasing parameters via training. △ Less

Submitted 27 January, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

arXiv:1910.12454 [pdf, other]

PT-MMD: A Novel Statistical Framework for the Evaluation of Generative Systems

Authors: Alexander Potapov, Ian Colbert, Ken Kreutz-Delgado, Alexander Cloninger, Srinjoy Das

Abstract: Stochastic-sampling-based Generative Neural Networks, such as Restricted Boltzmann Machines and Generative Adversarial Networks, are now used for applications such as denoising, image occlusion removal, pattern completion, and motion synthesis. In scenarios which involve performing such inference tasks with these models, it is critical to determine metrics that allow for model selection and/or mai… ▽ More Stochastic-sampling-based Generative Neural Networks, such as Restricted Boltzmann Machines and Generative Adversarial Networks, are now used for applications such as denoising, image occlusion removal, pattern completion, and motion synthesis. In scenarios which involve performing such inference tasks with these models, it is critical to determine metrics that allow for model selection and/or maintenance of requisite generative performance under pre-specified implementation constraints. In this paper, we propose a new metric for evaluating generative model performance based on $p$-values derived from the combined use of Maximum Mean Discrepancy (MMD) and permutation-based (PT-based) resampling, which we refer to as PT-MMD. We demonstrate the effectiveness of this metric for two cases: (1) Selection of bitwidth and activation function complexity to achieve minimum power-at-performance for Restricted Boltzmann Machines; (2) Quantitative comparison of images generated by two types of Generative Adversarial Networks (PGAN and WGAN) to facilitate model selection in order to maximize the fidelity of generated images. For these applications, our results are shown using Euclidean and Haar-based kernels for the PT-MMD two sample hypothesis test. This demonstrates the critical role of distance functions in comparing generated images against their corresponding ground truth counterparts as what would be perceived by human users. △ Less

Submitted 28 October, 2019; originally announced October 2019.

Comments: Will be presented at the Asilomar Conference on Signals, Systems, and Computers

arXiv:1909.11298 [pdf, other]

doi 10.1109/TIT.2022.3175691

Classification Logit Two-sample Testing by Neural Networks

Authors: Xiuyuan Cheng, Alexander Cloninger

Abstract: The recent success of generative adversarial networks and variational learning suggests training a classifier network may work well in addressing the classical two-sample problem. Network-based tests have the computational advantage that the algorithm scales to large samples. This paper proposes a two-sample statistic which is the difference of the logit function, provided by a trained classificat… ▽ More The recent success of generative adversarial networks and variational learning suggests training a classifier network may work well in addressing the classical two-sample problem. Network-based tests have the computational advantage that the algorithm scales to large samples. This paper proposes a two-sample statistic which is the difference of the logit function, provided by a trained classification neural network, evaluated on the testing set split of the two datasets. Theoretically, we prove the testing power to differentiate two sub-exponential densities given that the network is sufficiently parametrized. When the two densities lie on or near to low-dimensional manifolds embedded in possibly high-dimensional space, the needed network complexity is reduced to only scale with the intrinsic dimensionality. Both the approximation and estimation error analysis are based on a new result of near-manifold integral approximation. In experiments, the proposed method demonstrates better performance than previous network-based tests using classification accuracy as the two-sample statistic, and compares favorably to certain kernel maximum mean discrepancy tests on synthetic datasets and hand-written digit datasets. △ Less

Submitted 22 May, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

arXiv:1906.06033 [pdf, other]

Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks

Authors: Ahmed T. Elthakeb, Prannoy Pilligundla, Alex Cloninger, Hadi Esmaeilzadeh

Abstract: The deep layers of modern neural networks extract a rather rich set of features as an input propagates through the network. This paper sets out to harvest these rich intermediate representations for quantization with minimal accuracy loss while significantly reducing the memory footprint and compute intensity of the DNN. This paper utilizes knowledge distillation through teacher-student paradigm (… ▽ More The deep layers of modern neural networks extract a rather rich set of features as an input propagates through the network. This paper sets out to harvest these rich intermediate representations for quantization with minimal accuracy loss while significantly reducing the memory footprint and compute intensity of the DNN. This paper utilizes knowledge distillation through teacher-student paradigm (Hinton et al., 2015) in a novel setting that exploits the feature extraction capability of DNNs for higher-accuracy quantization. As such, our algorithm logically divides a pretrained full-precision DNN to multiple sections, each of which exposes intermediate features to train a team of students independently in the quantized domain. This divide and conquer strategy, in fact, makes the training of each student section possible in isolation while all these independently trained sections are later stitched together to form the equivalent fully quantized network. Our algorithm is a sectional approach towards knowledge distillation and is not treating the intermediate representation as a hint for pretraining before one knowledge distillation pass over the entire network (Romero et al., 2015). Experiments on various DNNs (AlexNet, LeNet, MobileNet, ResNet-18, ResNet-20, SVHN and VGG-11) show that, this approach -- called DCQ (Divide and Conquer Quantization) -- on average, improves the performance of a state-of-the-art quantized training technique, DoReFa-Net (Zhou et al., 2016) by 21.6% and 9.3% for binary and ternary quantization, respectively. Additionally, we show that incorporating DCQ to existing quantized training methods leads to improved accuracies as compared to previously reported by multiple state-of-the-art quantized training methods. △ Less

Submitted 2 March, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1906.01021 [pdf, other]

Coresets for Estimating Means and Mean Square Error with Limited Greedy Samples

Authors: Saeed Vahidian, Baharan Mirzasoleiman, Alexander Cloninger

Abstract: In a number of situations, collecting a function value for every data point may be prohibitively expensive, and random sampling ignores any structure in the underlying data. We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank-Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of ve… ▽ More In a number of situations, collecting a function value for every data point may be prohibitively expensive, and random sampling ignores any structure in the underlying data. We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank-Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of vertices that are deemed most important to sample. Our algorithm estimates the mean of the function by taking a weighted sum only at these vertices, and we provably bound the estimation error in terms of the location and weights of the selected vertices in the graph. In addition, we consider the case where nodes have different selection costs and provide bounds on the quality of the low-cost selected coresets. We demonstrate the benefits of our algorithm on the semi-supervised node classification of graph convolutional neural network, point clouds and structured graphs, as well as sensor placement where the cost of placing sensors depends on the location of the placement. We also elucidate that the empirical convergence of our proposed method is faster than random selection and various clustering methods while still respecting sensor placement cost. The paper concludes with validation of the developed algorithm on both synthetic and real datasets, demonstrating that it outperforms the current state of the art. △ Less

Submitted 22 June, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

arXiv:1905.12724 [pdf, other]

Variational Diffusion Autoencoders with Random Walk Sampling

Authors: Henry Li, Ofir Lindenbaum, Xiuyuan Cheng, Alexander Cloninger

Abstract: Variational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the sampling space. This is a construction that strives to define the data manifold. A major obstacle to VAEs and GANs, however, is choosing a suitable prior tha… ▽ More Variational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the sampling space. This is a construction that strives to define the data manifold. A major obstacle to VAEs and GANs, however, is choosing a suitable prior that matches the data topology. Well-known consequences of poorly picked priors are posterior and mode collapse. To our knowledge, no existing method sidesteps this user choice. Conversely, $\textit{diffusion maps}$ automatically infer the data topology and enjoy a rigorous connection to manifold learning, but do not scale easily or provide the inverse homeomorphism (i.e. decoder/generator). We propose a method that combines these approaches into a generative model that inherits the asymptotic guarantees of $\textit{diffusion maps}$ while preserving the scalability of deep models. We prove approximation theoretic results for the dimension dependence of our proposed method. Finally, we demonstrate the effectiveness of our method with various real and synthetic datasets. △ Less

Submitted 27 August, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: 24 pages, 9 figures, 1 table; accepted to ECCV 2020

arXiv:1901.02975 [pdf, other]

A witness function based construction of discriminative models using Hermite polynomials

Authors: H. N. Mhaskar, A. Cloninger, X. Cheng

Abstract: In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $μ$; the marginal distribution for the $\mathbf{x}_j$'s being $μ^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments… ▽ More In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $μ$; the marginal distribution for the $\mathbf{x}_j$'s being $μ^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multi-variate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a `witness function' in classification problems. Thus, if the value of this estimator at a point $\mathbf{x}$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. This fact is demonstrated in a number of real world data sets including MNIST, CIFAR10, Science News documents, and LaLonde data sets. △ Less

Submitted 9 January, 2019; originally announced January 2019.

Comments: 20 pages, 3.1 MB

arXiv:1812.04594 [pdf, other]

Bounding the Error From Reference Set Kernel Maximum Mean Discrepancy

Authors: Alexander Cloninger

Abstract: In this paper, we bound the error induced by using a weighted skeletonization of two data sets for computing a two sample test with kernel maximum mean discrepancy. The error is quantified in terms of the speed in which heat diffuses from those points to the rest of the data, as well as how at the weights on the reference points are, and gives a non-asymptotic, non-probabilistic bound. The result… ▽ More In this paper, we bound the error induced by using a weighted skeletonization of two data sets for computing a two sample test with kernel maximum mean discrepancy. The error is quantified in terms of the speed in which heat diffuses from those points to the rest of the data, as well as how at the weights on the reference points are, and gives a non-asymptotic, non-probabilistic bound. The result ties into the problem of the eigenvector triple product, which appears in a number of important problems. The error bound also suggests an optimization scheme for choosing the best set of reference points and weights. The method is tested on a several two sample test examples. △ Less

Submitted 11 December, 2018; originally announced December 2018.

arXiv:1804.09816 [pdf, other]

On the Dual Geometry of Laplacian Eigenfunctions

Authors: Alexander Cloninger, Stefan Steinerberger

Abstract: We discuss the geometry of Laplacian eigenfunctions $-Δφ= λφ$ on compact manifolds $(M,g)$ and combinatorial graphs $G=(V,E)$. The 'dual' geometry of Laplacian eigenfunctions is well understood on $\mathbb{T}^d$ (identified with $\mathbb{Z}^d$) and $\mathbb{R}^n$ (which is self-dual). The dual geometry is of tremendous role in various fields of pure and applied mathematics. The purpose of our pape… ▽ More We discuss the geometry of Laplacian eigenfunctions $-Δφ= λφ$ on compact manifolds $(M,g)$ and combinatorial graphs $G=(V,E)$. The 'dual' geometry of Laplacian eigenfunctions is well understood on $\mathbb{T}^d$ (identified with $\mathbb{Z}^d$) and $\mathbb{R}^n$ (which is self-dual). The dual geometry is of tremendous role in various fields of pure and applied mathematics. The purpose of our paper is to point out a notion of similarity between eigenfunctions that allows to reconstruct that geometry. Our measure of 'similarity' $ α(φ_λ, φ_μ)$ between eigenfunctions $φ_λ$ and $φ_μ$ is given by a global average of local correlations $$ α(φ_λ, φ_μ)^2 = \| φ_λ φ_μ \|_{L^2}^{-2}\int_{M}{ \left( \int_{M}{ p(t,x,y)( φ_λ(y) - φ_λ(x))( φ_μ(y) - φ_μ(x)) dy} \right)^2 dx},$$ where $p(t,x,y)$ is the classical heat kernel and $e^{-t λ} + e^{-t μ} = 1$. This notion recovers all classical notions of duality but is equally applicable to other (rough) geometries and graphs; many numerical examples in different continuous and discrete settings illustrate the result. △ Less

Submitted 25 April, 2018; originally announced April 2018.

MSC Class: 35J05; 35P05; 42C10; 65T60; 81Q50; 94A11

arXiv:1803.10840 [pdf, other]

Defending against Adversarial Images using Basis Functions Transformations

Authors: Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, Yuval Kluger

Abstract: We study the effectiveness of various approaches that defend against adversarial attacks on deep networks via manipulations based on basis function representations of images. Specifically, we experiment with low-pass filtering, PCA, JPEG compression, low resolution wavelet approximation, and soft-thresholding. We evaluate these defense techniques using three types of popular attacks in black, gray… ▽ More We study the effectiveness of various approaches that defend against adversarial attacks on deep networks via manipulations based on basis function representations of images. Specifically, we experiment with low-pass filtering, PCA, JPEG compression, low resolution wavelet approximation, and soft-thresholding. We evaluate these defense techniques using three types of popular attacks in black, gray and white-box settings. Our results show JPEG compression tends to outperform the other tested defenses in most of the settings considered, in addition to soft-thresholding, which performs well in specific cases, and yields a more mild decrease in accuracy on benign examples. In addition, we also mathematically derive a novel white-box attack in which the adversarial perturbation is composed only of terms corresponding a to pre-determined subset of the basis functions, of which a "low frequency attack" is a special case. △ Less

Submitted 16 April, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

Comments: added link to GitHub repository

arXiv:1709.05006 [pdf, other]

Two-sample Statistics Based on Anisotropic Kernels

Authors: Xiuyuan Cheng, Alexander Cloninger, Ronald R. Coifman

Abstract: The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kerne… ▽ More The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as $\|p-q\| \sqrt{n} \to \infty $, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions. △ Less

Submitted 30 August, 2018; v1 submitted 14 September, 2017; originally announced September 2017.

arXiv:1608.04418 [pdf, other]

A Note on Markov Normalized Magnetic Eigenmaps

Authors: Alexander Cloninger

Abstract: We note that building a magnetic Laplacian from the Markov transition matrix, rather than the graph adjacency matrix, yields several benefits for the magnetic eigenmaps algorithm. The two largest benefits are that the embedding becomes more stable as a function of the rotation parameter g, and the principal eigenvector of the magnetic Laplacian now converges to the page rank of the network as a fu… ▽ More We note that building a magnetic Laplacian from the Markov transition matrix, rather than the graph adjacency matrix, yields several benefits for the magnetic eigenmaps algorithm. The two largest benefits are that the embedding becomes more stable as a function of the rotation parameter g, and the principal eigenvector of the magnetic Laplacian now converges to the page rank of the network as a function of diffusion time. We show empirically that this normalization improves the phase and real/imaginary embeddings of the low-frequency eigenvectors of the magnetic Laplacian. △ Less

Submitted 3 November, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

arXiv:1606.00931 [pdf, other]

doi 10.1186/s12874-018-0482-1

DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network

Authors: Jared Katzman, Uri Shaham, Jonathan Bates, Alexander Cloninger, Tingting Jiang, Yuval Kluger

Abstract: Medical practitioners use survival models to explore and understand the relationships between patients' covariates (e.g. clinical and genetic features) and the effectiveness of various treatment options. Standard survival models like the linear Cox proportional hazards model require extensive feature engineering or prior medical knowledge to model treatment interaction at an individual level. Whil… ▽ More Medical practitioners use survival models to explore and understand the relationships between patients' covariates (e.g. clinical and genetic features) and the effectiveness of various treatment options. Standard survival models like the linear Cox proportional hazards model require extensive feature engineering or prior medical knowledge to model treatment interaction at an individual level. While nonlinear survival methods, such as neural networks and survival forests, can inherently model these high-level interaction terms, they have yet to be shown as effective treatment recommender systems. We introduce DeepSurv, a Cox proportional hazards deep neural network and state-of-the-art survival method for modeling interactions between a patient's covariates and treatment effectiveness in order to provide personalized treatment recommendations. We perform a number of experiments training DeepSurv on simulated and real survival data. We demonstrate that DeepSurv performs as well as or better than other state-of-the-art survival models and validate that DeepSurv successfully models increasingly complex relationships between a patient's covariates and their risk of failure. We then show how DeepSurv models the relationship between a patient's features and effectiveness of different treatment options to show how DeepSurv can be used to provide individual treatment recommendations. Finally, we train DeepSurv on real clinical studies to demonstrate how it's personalized treatment recommendations would increase the survival time of a set of patients. The predictive and modeling capabilities of DeepSurv will enable medical researchers to use deep neural networks as a tool in their exploration, understanding, and prediction of the effects of a patient's characteristics on their risk of failure. △ Less

Submitted 8 August, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

Comments: Presented at the International Conference of Machine Learning Computational Biology Workshop 2016

arXiv:1509.07385 [pdf, other]

doi 10.1016/j.acha.2016.04.003

Provable approximation properties for deep neural networks

Authors: Uri Shaham, Alexander Cloninger, Ronald R. Coifman

Abstract: We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $Γ\subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $Γ$, the complexity of $f$, in terms of its wavelet description, and only weakly on the… ▽ More We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $Γ\subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $Γ$, the complexity of $f$, in terms of its wavelet description, and only weakly on the ambient dimension $m$. Essentially, our network computes wavelet functions, which are computed from Rectified Linear Units (ReLU) △ Less

Submitted 28 March, 2016; v1 submitted 24 September, 2015; originally announced September 2015.

Comments: accepted for publication in Applied and Computational Harmonic Analysis

arXiv:1507.00220 [pdf, other]

Bigeometric Organization of Deep Nets

Authors: Alexander Cloninger, Ronald R. Coifman, Nicholas Downing, Harlan M. Krumholz

Abstract: In this paper, we build an organization of high-dimensional datasets that cannot be cleanly embedded into a low-dimensional representation due to missing entries and a subset of the features being irrelevant to modeling functions of interest. Our algorithm begins by defining coarse neighborhoods of the points and defining an expected empirical function value on these neighborhoods. We then generat… ▽ More In this paper, we build an organization of high-dimensional datasets that cannot be cleanly embedded into a low-dimensional representation due to missing entries and a subset of the features being irrelevant to modeling functions of interest. Our algorithm begins by defining coarse neighborhoods of the points and defining an expected empirical function value on these neighborhoods. We then generate new non-linear features with deep net representations tuned to model the approximate function, and re-organize the geometry of the points with respect to the new representation. Finally, the points are locally z-scored to create an intrinsic geometric organization which is independent of the parameters of the deep net, a geometry designed to assure smoothness with respect to the empirical function. We examine this approach on data from the Center for Medicare and Medicaid Services Hospital Quality Initiative, and generate an intrinsic low-dimensional organization of the hospitals that is smooth with respect to an expert driven function of quality. △ Less

Submitted 1 July, 2015; originally announced July 2015.

arXiv:1506.07840 [pdf, other]

Diffusion Nets

Authors: Gal Mishne, Uri Shaham, Alexander Cloninger, Israel Cohen

Abstract: Non-linear manifold learning enables high-dimensional data analysis, but requires out-of-sample-extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning to create an encoder, which maps a high-dimensional dataset and its low-dimensional embedding, and a decoder, which takes the embedded data back to the high-dimensional space. Sta… ▽ More Non-linear manifold learning enables high-dimensional data analysis, but requires out-of-sample-extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning to create an encoder, which maps a high-dimensional dataset and its low-dimensional embedding, and a decoder, which takes the embedded data back to the high-dimensional space. Stacking the encoder and decoder together constructs an autoencoder, which we term a diffusion net, that performs out-of-sample-extension as well as outlier detection. We introduce new neural net constraints for the encoder, which preserves the local geometry of the points, and we prove rates of convergence for the encoder. Also, our approach is efficient in both computational complexity and memory requirements, as opposed to previous methods that require storage of all training points in both the high-dimensional and the low-dimensional spaces to calculate the out-of-sample-extension and the pre-image. △ Less

Submitted 25 June, 2015; originally announced June 2015.

Comments: 24 pages, 12 figures

Showing 1–36 of 36 results for author: Cloninger, A