-
Support vector machines and Radon's theorem
Authors:
Henry Adams,
Elin Farnell,
Brittany Story
Abstract:
A support vector machine (SVM) is an algorithm that finds a hyperplane which optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We connect the possible configurations of support vectors to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the convex hulls of the positive and negative support vectors are projected onto a separating hyperplane, then the projections intersect if and only if the hyperplane is optimal. Further, with a particular type of general position, we show that (a) the projected convex hulls of the support vectors intersect in exactly one point, (b) the support vectors are stable under perturbation, (c) there are at most $n+1$ support vectors, and (d) every number of support vectors from 2 up to $n+1$ is possible. Finally, we perform computer simulations studying the expected number of support vectors, and their configurations, for randomly generated data. We observe that as the distance between classes of points increases for this type of randomly generated data, configurations with fewer support vectors become more likely.
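Radon's theorem states that any $n+2$ points in $\mathbb{R}^n$ can be split into two groups whose convex hulls intersect. The sketch below turns the standard proof, which uses an affine dependence among the points, into code. `radon_partition` is a hypothetical helper written for illustration. It covers only the Radon step, not the SVM or the authors' projection argument.

```python
from fractions import Fraction


def radon_partition(points):
    """Split n + 2 points in R^n into index lists I, J whose convex hulls meet.

    Returns (I, J, radon_point). Exact rational arithmetic avoids round-off.
    """
    m, n = len(points), len(points[0])
    if m != n + 2:
        raise ValueError("expected exactly n + 2 points in R^n")

    # Affine dependence: lam != 0 with sum(lam_i * x_i) = 0 and sum(lam_i) = 0.
    # n + 1 homogeneous equations in n + 2 unknowns, so a nonzero solution exists.
    rows = [[Fraction(p[k]) for p in points] for k in range(n)]
    rows.append([Fraction(1)] * m)

    # Gauss-Jordan elimination to reduced row echelon form.
    pivots, r = [], 0
    for c in range(m):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        scale = rows[r][c]
        rows[r] = [v / scale for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break

    # Set the first free variable to 1 (other free variables to 0), back-substitute.
    free = next(c for c in range(m) if c not in pivots)
    lam = [Fraction(0)] * m
    lam[free] = Fraction(1)
    for row_idx, c in enumerate(pivots):
        lam[c] = -rows[row_idx][free]

    I = [i for i in range(m) if lam[i] > 0]
    J = [i for i in range(m) if lam[i] <= 0]  # zero-weight points may go on either side
    total = sum(lam[i] for i in I)
    # Convex combination of the points in I; it also lies in conv(J).
    radon_point = [sum(lam[i] * points[i][k] for i in I) / total for k in range(n)]
    return I, J, radon_point
```

For the corners of a square, the partition pairs opposite corners and the Radon point is where the diagonals cross.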
Submitted 16 September, 2022; v1 submitted 1 November, 2020;
originally announced November 2020.
-
More chemical detection through less sampling: amplifying chemical signals in hyperspectral data cubes through compressive sensing
Authors:
Henry Kvinge,
Elin Farnell,
Julia R. Dupuis,
Michael Kirby,
Chris Peterson,
Elizabeth C. Schundler
Abstract:
Compressive sensing (CS) is a method of sampling which permits some classes of signals to be reconstructed with high accuracy even when they are under-sampled. In this paper we explore a phenomenon in which bandwise CS sampling of a hyperspectral data cube followed by reconstruction can actually amplify the chemical signals contained in the cube. Perhaps most surprisingly, chemical signal amplification generally seems to increase as the level of sampling decreases. In some examples, the chemical signal is significantly stronger in a data cube reconstructed from 10% CS sampling than it is in the raw, 100% sampled data cube. We explore this phenomenon in two real-world datasets: the Physical Sciences Inc. Fabry-Pérot interferometer sensor multispectral dataset and the Johns Hopkins Applied Physics Lab FTIR-based longwave infrared sensor hyperspectral dataset. Each of these datasets contains the release of a chemical simulant, such as glacial acetic acid, triethyl phosphate, or sulfur hexafluoride, and in all cases we use the adaptive coherence estimator (ACE) to detect a target signal in the hyperspectral data cube. We end the paper by suggesting some theoretical justifications for why chemical signals would be amplified in CS sampled and reconstructed hyperspectral data cubes and discuss some practical implications.
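The adaptive coherence estimator is commonly written as $\mathrm{ACE}(x) = \frac{(s^\top \Sigma^{-1} x)^2}{(s^\top \Sigma^{-1} s)(x^\top \Sigma^{-1} x)}$. Here $x$ is the background-mean-subtracted pixel spectrum, $s$ the target signature, and $\Sigma$ the background covariance. The minimal pure-Python sketch below assumes $\Sigma^{-1}$ and the background mean are already estimated. The helper names and the fixed threshold in `count_detections` are hypothetical and do not reproduce the authors' thresholding.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _matvec(M, v):
    return [_dot(row, v) for row in M]


def ace_score(x, s, sigma_inv, mu):
    """ACE statistic for pixel spectrum x and target signature s.

    sigma_inv: inverse background covariance (list of rows); mu: background mean.
    By Cauchy-Schwarz the result lies in [0, 1] when sigma_inv is positive definite.
    """
    xc = [a - b for a, b in zip(x, mu)]  # remove the background mean from the pixel
    wx = _matvec(sigma_inv, xc)          # Sigma^{-1} (x - mu)
    ws = _matvec(sigma_inv, s)           # Sigma^{-1} s
    numerator = _dot(s, wx) ** 2
    denominator = _dot(s, ws) * _dot(xc, wx)
    return numerator / denominator if denominator > 0 else 0.0


def count_detections(pixels, s, sigma_inv, mu, threshold):
    """Number of pixels whose ACE score exceeds a user-chosen threshold."""
    return sum(1 for x in pixels if ace_score(x, s, sigma_inv, mu) > threshold)
```

ACE measures only the angle between the whitened pixel and the target, not the magnitude. A pixel aligned with the target scores 1 regardless of its brightness.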
Submitted 27 June, 2019;
originally announced June 2019.
-
Total variation vs L1 regularization: a comparison of compressive sensing optimization methods for chemical detection
Authors:
Elin Farnell,
Henry Kvinge,
Julia R. Dupuis,
Michael Kirby,
Chris Peterson,
Elizabeth C. Schundler
Abstract:
One of the fundamental assumptions of compressive sensing (CS) is that a signal can be reconstructed from a small number of samples by solving an optimization problem with the appropriate regularization term. Two standard regularization terms are the L1 norm and the total variation (TV) norm. We present a comparison of CS reconstruction results based on these two approaches in the context of chemical detection, and we demonstrate that optimization based on the L1 norm outperforms optimization based on the TV norm. Our comparison is driven by CS sampling, reconstruction, and chemical detection in two real-world datasets: the Physical Sciences Inc. Fabry-Pérot interferometer sensor multispectral dataset and the Johns Hopkins Applied Physics Lab FTIR-based longwave infrared sensor hyperspectral dataset. Both datasets contain the release of a chemical simulant such as glacial acetic acid, triethyl phosphate, and sulfur hexafluoride. For chemical detection we use the adaptive coherence estimator (ACE) and bulk coherence, and we propose algorithmic ACE thresholds to define the presence or absence of a chemical of interest in both un-compressed data cubes and reconstructed data cubes. The un-compressed data cubes provide an approximate ground truth. We demonstrate that optimization based on either the L1 norm or TV norm results in successful chemical detection at a compression rate of 90%, but we show that L1 optimization is preferable. We present quantitative comparisons of chemical detection on reconstructions from the two methods, with an emphasis on the number of pixels with an ACE value above the threshold.
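The abstract does not specify the solvers used. The sketch below therefore only contrasts the two regularizers on toy signals and solves the L1-regularized least-squares problem $\min_x \tfrac12\|Ax-y\|_2^2 + \lambda\|x\|_1$ with ISTA (iterative soft thresholding). ISTA is a standard method, chosen here as an assumption. The TV term is the 1-D anisotropic version $\sum_i |x_{i+1}-x_i|$, and all function names are hypothetical.

```python
import math


def l1_norm(x):
    return sum(abs(v) for v in x)


def tv_norm(x):
    """1-D anisotropic total variation: sum of absolute neighbor differences."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied entrywise."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def ista(A, y, lam, step, iters=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft thresholding.

    Converges when 0 < step <= 1 / ||A||_2^2 (squared spectral norm of A).
    """
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    for _ in range(iters):
        r = [sum(a * b for a, b in zip(A[i], x)) - y[i] for i in range(rows)]  # A x - y
        grad = [sum(A[i][j] * r[i] for i in range(rows)) for j in range(cols)]  # A^T (A x - y)
        x = soft_threshold([xj - step * g for xj, g in zip(x, grad)], step * lam)
    return x
```

A single spike `[0, 0, 5, 0, 0, 0]` has L1 norm 5 but TV norm 10, while the plateau `[0, 0, 5, 5, 5, 0]` has L1 norm 15 but TV norm 10. This is the usual intuition that L1 favors sparse signals and TV favors piecewise-constant ones.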
Submitted 25 June, 2019;
originally announced June 2019.
-
A data-driven approach to sampling matrix selection for compressive sensing
Authors:
Elin Farnell,
Henry Kvinge,
John P. Dixon,
Julia R. Dupuis,
Michael Kirby,
Chris Peterson,
Elizabeth C. Schundler,
Christian W. Smith
Abstract:
Sampling is a fundamental aspect of any implementation of compressive sensing. Typically, the choice of sampling method is guided by the reconstruction basis. However, this approach can be problematic with respect to certain hardware constraints and is not responsive to domain-specific context. We propose a method for defining an order for a sampling basis that is optimal with respect to capturing variance in data, thus allowing for meaningful sensing at any desired level of compression. We focus on the Walsh-Hadamard sampling basis for its relevance to hardware constraints, but our approach applies to any sampling basis of interest. We illustrate the effectiveness of our method on the Physical Sciences Inc. Fabry-Pérot interferometer sensor multispectral dataset, the Johns Hopkins Applied Physics Lab FTIR-based longwave infrared sensor hyperspectral dataset, and a Colorado State University Swiss Ranger depth image dataset. The spectral datasets consist of simulant experiments, including releases of chemicals such as glacial acetic acid (GAA) and sulfur hexafluoride (SF6). We combine our sampling and reconstruction with the adaptive coherence estimator (ACE) and bulk coherence for chemical detection, and we incorporate an algorithmic threshold for ACE values to determine the presence or absence of a chemical. We compare results across sampling methods in this context and achieve successful chemical detection at a compression rate of 90%. For all three datasets, we compare our sampling approach to standard orderings of the sampling basis such as random, sequency, and an analog of sequency that we term 'frequency.' In one instance, the peak signal-to-noise ratio was improved by over 30% across a test set of depth images.
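One reading of "an order for a sampling basis that is optimal with respect to capturing variance" is to rank Walsh-Hadamard rows by the variance of their measurement coefficients over training data. The sketch below assumes that reading and compares it with sequency ordering, which sorts rows by their number of sign changes. The function names are hypothetical, and the authors' exact criterion may differ.

```python
import statistics


def sylvester_hadamard(n):
    """Walsh-Hadamard matrix of order n (a power of 2) in Sylvester (natural) order."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def sequency(row):
    """Number of sign changes along a +/-1 row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_order(H):
    return sorted(range(len(H)), key=lambda i: sequency(H[i]))


def variance_order(H, training):
    """Row indices sorted by decreasing variance of <row, x> over training vectors x."""
    variances = []
    for row in H:
        coeffs = [sum(h * x for h, x in zip(row, vec)) for vec in training]
        variances.append(statistics.pvariance(coeffs))
    return sorted(range(len(H)), key=lambda i: -variances[i])  # stable for ties
```

To sample at a 90% compression rate, one would keep the first `len(H) // 10` indices of whichever ordering is chosen and measure only those rows.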
Submitted 20 June, 2019;
originally announced June 2019.
-
Rare geometries: revealing rare categories via dimension-driven statistics
Authors:
Henry Kvinge,
Elin Farnell,
**gya Li,
Yujia Chen
Abstract:
In many situations, classes of data points of primary interest also happen to be those that are least numerous. A well-known example is detection of fraudulent transactions among the collection of all financial transactions, the vast majority of which are legitimate. These types of problems fall under the label of 'rare-category detection.' There are two challenging aspects of these problems. The first is a general lack of labeled examples of the rare class, and the second is the potential non-separability of the rare class from the majority (in terms of available features). Statistics related to the geometry of the rare class (such as its intrinsic dimension) can be significantly different from those for the majority class, reflecting the different dynamics driving variation in the different classes. In this paper we present a new supervised learning algorithm that uses a dimension-driven statistic, called the kappa-profile, to classify whether unlabeled points belong to a rare class. Our algorithm requires very few labeled examples and is invariant with respect to translation so that it performs equivalently on both separable and non-separable classes.
Submitted 28 May, 2019; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets
Authors:
Henry Kvinge,
Elin Farnell,
Michael Kirby,
Chris Peterson
Abstract:
Dimensionality-reduction methods are a fundamental tool in the analysis of large data sets. These algorithms work on the assumption that the "intrinsic dimension" of the data is generally much smaller than the ambient dimension in which it is collected. Alongside their usual purpose of mapping data into a smaller dimension with minimal information loss, dimensionality-reduction techniques implicitly or explicitly provide information about the dimension of the data set.
In this paper, we propose a new statistic that we call the $κ$-profile for analysis of large data sets. The $κ$-profile arises from a dimensionality-reduction optimization problem: namely that of finding a projection into $k$ dimensions that optimally preserves the secants between points in the data set. From this optimal projection we extract $κ$, the norm of the shortest projected secant from among the set of all normalized secants. This $κ$ can be computed for any $k$; thus the tuple of $κ$ values (indexed by dimension) becomes a $κ$-profile. Algorithms such as the Secant-Avoidance Projection algorithm and the Hierarchical Secant-Avoidance Projection algorithm provide a computationally feasible means of estimating the $κ$-profile for large data sets, and thus a method of understanding and monitoring their behavior. As we demonstrate in this paper, the $κ$-profile serves as a useful statistic in several representative settings: weather data, soundscape data, and dynamical systems data.
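The sketch below evaluates $κ$ for a given projection directly from its definition. It normalizes every secant, projects it onto an orthonormal basis of the target subspace, and takes the smallest projected norm. Finding the optimal projection is the job of SAP and hierarchical SAP. As a stand-in assumption, the hypothetical helper `coordinate_kappa_profile` searches only coordinate subspaces, so each entry is a lower bound on the true $κ$ for that $k$.

```python
import itertools
import math


def unit_secants(points):
    """All normalized secants (q - p) / |q - p| between distinct points."""
    out = []
    for p, q in itertools.combinations(points, 2):
        d = [b - a for a, b in zip(p, q)]
        norm = math.sqrt(sum(v * v for v in d))
        if norm > 0:
            out.append([v / norm for v in d])
    return out


def kappa(secants, basis):
    """Shortest projected unit secant; basis holds orthonormal rows spanning the subspace."""
    return min(
        math.sqrt(sum(sum(b * s for b, s in zip(row, sec)) ** 2 for row in basis))
        for sec in secants
    )


def coordinate_kappa_profile(points):
    """Crude kappa-profile: for each k, the best kappa over coordinate k-subspaces only."""
    secants = unit_secants(points)
    n = len(points[0])
    profile = []
    for k in range(1, n + 1):
        best = 0.0
        for axes in itertools.combinations(range(n), k):
            basis = [[1.0 if j == a else 0.0 for j in range(n)] for a in axes]
            best = max(best, kappa(secants, basis))
        profile.append(best)
    return profile
```

A value of $κ$ near 0 means some secant nearly collapses under the projection, and a value near 1 means all secant lengths are nearly preserved. For three non-collinear points in a plane inside $\mathbb{R}^3$, this crude profile is 0 at $k=1$ and (up to rounding) 1 at $k=2$ and $k=3$.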
Submitted 26 October, 2018;
originally announced October 2018.
-
Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets
Authors:
Henry Kvinge,
Elin Farnell,
Michael Kirby,
Chris Peterson
Abstract:
A fundamental question in many data analysis settings is the problem of discerning the "natural" dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks to determine a projection which best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the underlying manifold. That is, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with $T$ points has $O(T^2)$ secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. That is, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible.
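The $O(T^2)$ growth is easy to quantify, since $T$ points have $T(T-1)/2$ unordered secants. The sketch below computes the count and the memory footprint, assuming float64 storage. It also shows a generator that avoids storing the secants but still visits all of them. The helper names are hypothetical, and this is not the hierarchical algorithm proposed in the paper.

```python
import math


def secant_count(T):
    """Number of unordered secants (point pairs) in a data set of T points."""
    return T * (T - 1) // 2


def secant_storage_bytes(T, n, bytes_per_entry=8):
    """Memory needed to store every secant explicitly as a length-n float64 vector."""
    return secant_count(T) * n * bytes_per_entry


def iter_unit_secants(points):
    """Yield normalized secants one at a time instead of materializing all of them.

    This removes the O(T^2) storage cost, but the running time is still O(T^2).
    """
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = [b - a for a, b in zip(points[i], points[j])]
            norm = math.sqrt(sum(v * v for v in d))
            if norm > 0:
                yield [v / norm for v in d]
```

For example, $T = 10^6$ points in $\mathbb{R}^{100}$ give `secant_count(10**6)` = 499,999,500,000 secants. `secant_storage_bytes(10**6, 100)` = 399,999,600,000,000 bytes, roughly 400 TB.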
Submitted 5 August, 2018;
originally announced August 2018.
-
A fractal dimension for measures via persistent homology
Authors:
Henry Adams,
Manuchehr Aminian,
Elin Farnell,
Michael Kirby,
Chris Peterson,
Joshua Mirth,
Rachel Neville,
Patrick Shipman,
Clayton Shonkwiler
Abstract:
We use persistent homology in order to define a family of fractal dimensions, denoted $\mathrm{dim}_{\mathrm{PH}}^i(μ)$ for each homological dimension $i\ge 0$, assigned to a probability measure $μ$ on a metric space. The case of $0$-dimensional homology ($i=0$) relates to work by J. Michael Steele (1988) studying the total length of a minimal spanning tree on a random sampling of points. Indeed, if $μ$ is supported on a compact subset of Euclidean space $\mathbb{R}^m$ for $m\ge2$, then Steele's work implies that $\mathrm{dim}_{\mathrm{PH}}^0(μ)=m$ if the absolutely continuous part of $μ$ has positive mass, and otherwise $\mathrm{dim}_{\mathrm{PH}}^0(μ)<m$. Experiments suggest that similar results may be true for higher-dimensional homology $0<i<m$, though this is an open question. Our fractal dimension is defined by considering a limit, as the number of points $n$ goes to infinity, of the total sum of the $i$-dimensional persistent homology interval lengths for $n$ random points selected from $μ$ in an i.i.d. fashion. To some measures $μ$, we are able to assign a finer invariant: a curve measuring the limiting distribution of persistent homology interval lengths as the number of points goes to infinity. We prove this limiting curve exists in the case of $0$-dimensional homology when $μ$ is the uniform distribution over the unit interval, and conjecture that it exists when $μ$ is the rescaled probability measure for a compact set in Euclidean space with positive Lebesgue measure.
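Take a finite point cloud and the Vietoris-Rips filtration in which edges enter at their length. The finite 0-dimensional persistence intervals then have lengths equal to the edge lengths of a Euclidean minimum spanning tree, so their total $E_0(n)$ can be computed with Prim's algorithm. Suppose $E_0(n)$ grows like $n^{\beta}$. Steele's result corresponds to $\beta = (m-1)/m$, which gives a dimension of $1/(1-\beta)$. The sketch below fits $\beta$ by least squares on a log-log scale. The helper names are hypothetical, and this is a simplified finite-sample estimate rather than the limit in the paper's definition.

```python
import math


def mst_total_length(points):
    """Total edge length of a Euclidean minimum spanning tree (Prim's algorithm, O(N^2)).

    Equals the sum of the finite 0-dimensional Vietoris-Rips persistence
    interval lengths when edges enter the filtration at their length.
    """
    N = len(points)
    if N < 2:
        return 0.0
    in_tree = [False] * N
    best = [math.inf] * N  # cheapest known edge from the current tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(N):
        u = min((i for i in range(N) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(N):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def ph0_dimension_estimate(sizes, totals):
    """Least-squares slope beta of log(total) against log(N); returns 1 / (1 - beta)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in totals]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return 1.0 / (1.0 - beta)
```

Evenly spaced points on the unit interval have MST length 1 for every $N$, so $\beta = 0$ and the estimate is 1. For i.i.d. samples from a measure on $\mathbb{R}^2$ with positive absolutely continuous part, Steele's theorem suggests the estimate should approach 2 as the sample sizes grow.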
Submitted 30 January, 2019; v1 submitted 2 August, 2018;
originally announced August 2018.
-
A GPU-Oriented Algorithm Design for Secant-Based Dimensionality Reduction
Authors:
Henry Kvinge,
Elin Farnell,
Michael Kirby,
Chris Peterson
Abstract:
Dimensionality-reduction techniques are a fundamental tool for extracting useful information from high-dimensional data sets. Because secant sets encode manifold geometry, they are a useful tool for designing meaningful data-reduction algorithms. In one such approach, the goal is to construct a projection that maximally avoids secant directions and hence ensures that distinct data points are not mapped too close together in the reduced space. This type of algorithm is based on a mathematical framework inspired by the constructive proof of Whitney's embedding theorem from differential topology. Computing all (unit) secants for a set of points is inherently computationally expensive, which opens the door to exploiting GPU architectures to obtain fast versions of these algorithms. We present a polynomial-time data-reduction algorithm that produces a meaningful low-dimensional representation of a data set by iteratively constructing improved projections within the framework described above. Key to our algorithm design and implementation is the use of GPUs, which, among other things, minimize the computational time required for the calculation of all secant lines. One goal of this report is to share ideas with GPU experts and to discuss a class of mathematical algorithms that may be of interest to the broader GPU community.
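Every secant can be processed independently, so a common GPU pattern is to launch one thread per secant and recover the pair $(i, j)$ from a flat index $k$. The sketch below shows that index map, using exact integer arithmetic via `math.isqrt`. It also shows a chunked min-reduction of projected secant norms, with each chunk standing in for a GPU thread block. It is plain Python written for illustration; the names are hypothetical, and this is not the authors' GPU implementation.

```python
import math


def pair_from_index(k, T):
    """Map a flat index k in [0, T(T-1)/2) to the k-th pair (i, j), i < j, in
    lexicographic order, using O(1) integer arithmetic (no loop over rows)."""
    total = T * (T - 1) // 2
    if not 0 <= k < total:
        raise IndexError("k out of range")
    m = total - 1 - k                      # position counted from the last pair
    p = (1 + math.isqrt(1 + 8 * m)) // 2   # row i holds p = T - 1 - i pairs
    i = T - 1 - p
    j = T - 1 - (m - p * (p - 1) // 2)
    return i, j


def chunked_min_projected_norm(points, basis, chunk=1024):
    """Minimum norm of projected unit secants, reduced chunk by chunk.

    basis: orthonormal rows spanning the target subspace. Each chunk of flat
    indices is independent, mirroring how a GPU might assign one block per chunk.
    """
    T = len(points)
    total = T * (T - 1) // 2
    best = math.inf
    for start in range(0, total, chunk):
        local = math.inf  # per-chunk partial minimum
        for k in range(start, min(start + chunk, total)):
            i, j = pair_from_index(k, T)
            d = [b - a for a, b in zip(points[i], points[j])]
            norm = math.sqrt(sum(v * v for v in d))
            if norm == 0:
                continue  # duplicate points give no secant direction
            proj = math.sqrt(sum((sum(r * v for r, v in zip(row, d)) / norm) ** 2 for row in basis))
            local = min(local, proj)
        best = min(best, local)  # final reduction across chunks
    return best
```

The closed-form index map lets each thread locate its own secant without shared state. The two-level minimum (per chunk, then across chunks) mirrors a block-level reduction followed by a global one.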
Submitted 9 July, 2018;
originally announced July 2018.
-
Endmember Extraction on the Grassmannian
Authors:
Elin Farnell,
Henry Kvinge,
Michael Kirby,
Chris Peterson
Abstract:
Endmember extraction plays a prominent role in a variety of data analysis problems, as endmembers often correspond to the purest or most representative data for some feature. Identifying endmembers can then be useful for further identification and classification tasks. In settings with high-dimensional data, such as hyperspectral imagery, it can be useful to consider endmembers that are subspaces, as they are capable of capturing a wider range of variations of a signature. The endmember extraction problem in this setting thus translates to finding the vertices of the convex hull of a set of points on a Grassmannian. In the presence of noise, it can be less clear whether a point should be considered a vertex. In this paper, we propose an algorithm to extract endmembers on a Grassmannian, identify subspaces of interest that lie near the boundary of a convex hull, and demonstrate the use of the algorithm on a synthetic example and on the 220-spectral-band AVIRIS Indian Pines hyperspectral image.
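Points on a Grassmannian can be compared through the principal angles $\theta_i$ between subspaces. Let $U$ and $V$ be orthonormal bases of two $k$-dimensional subspaces. Then the chordal distance $\sqrt{\sum_i \sin^2\theta_i} = \sqrt{k - \|U^\top V\|_F^2}$ needs no SVD. It also equals $\|UU^\top - VV^\top\|_F/\sqrt{2}$, which embeds the Grassmannian into a Euclidean space of symmetric matrices where convex hulls make sense. The sketch below assumes this projection-matrix embedding as one standard choice. The abstract does not say which embedding or distance the authors use, and the helper names are hypothetical.

```python
import math


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis (list of vectors) for span(vectors)."""
    basis = []
    for v in vectors:
        w = [float(a) for a in v]
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def chordal_distance(U, V):
    """sqrt(k - ||U^T V||_F^2) for orthonormal bases U, V (stored as lists of basis vectors)."""
    k = len(U)
    frob2 = sum(sum(a * b for a, b in zip(u, v)) ** 2 for u in U for v in V)
    return math.sqrt(max(k - frob2, 0.0))  # clamp tiny negative round-off


def projection_matrix(U):
    """P = U U^T, the point of the Grassmannian in the space of symmetric matrices."""
    n = len(U[0])
    return [[sum(u[r] * u[c] for u in U) for c in range(n)] for r in range(n)]


def projection_distance(U, V):
    """||U U^T - V V^T||_F / sqrt(2); agrees with chordal_distance(U, V)."""
    P, Q = projection_matrix(U), projection_matrix(V)
    diff2 = sum((a - b) ** 2 for rp, rq in zip(P, Q) for a, b in zip(rp, rq))
    return math.sqrt(diff2 / 2)
```

Two different bases of the same subspace give the same projection matrix and distance 0. This is why the embedding identifies points of the Grassmannian rather than particular bases.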
Submitted 3 July, 2018;
originally announced July 2018.