-
Asymptotic symmetry and group invariance for randomization
Authors:
Adam B Kashlak
Abstract:
Symmetry is a cornerstone of much of mathematics, and many probability distributions possess symmetries characterized by their invariance to a collection of group actions. Thus, many mathematical and statistical methods rely on such symmetry holding and ostensibly fail if symmetry is broken. This work considers under what conditions a sequence of probability measures asymptotically gains such symmetry or invariance to a collection of group actions. Considering the many symmetries of the Gaussian distribution, this work effectively proposes a non-parametric type of central limit theorem. That is, a Lipschitz function of a high dimensional random vector will be asymptotically invariant to the actions of certain compact topological groups. Applications of this include a partial law of the iterated logarithm for uniformly random points in an $\ell_p^n$-ball and an asymptotic equivalence between classical parametric statistical tests and their randomization counterparts even when invariance assumptions are violated.
Submitted 20 October, 2023; v1 submitted 31 October, 2022;
originally announced November 2022.
-
Topological Hidden Markov Models
Authors:
Adam B Kashlak,
Prachi Loliencar,
Giseon Heo
Abstract:
The hidden Markov model (HMM) is a classic modeling tool with a wide swath of applications. Its inception considered observations restricted to a finite alphabet, but it was quickly extended to multivariate continuous distributions. In this article, we further extend the HMM from mixtures of normal distributions in $d$-dimensional Euclidean space to general Gaussian measure mixtures in locally convex topological spaces. The main innovation is the use of the Onsager-Machlup functional as a proxy for the probability density function in infinite dimensional spaces. This allows for choice of a Cameron-Martin space suitable for a given application. We demonstrate the versatility of this methodology by applying it to simulated diffusion processes such as Brownian and fractional Brownian sample paths as well as the Ornstein-Uhlenbeck process. Our methodology is applied to the identification of sleep states from overnight polysomnography time series data with the aim of diagnosing Obstructive Sleep Apnea in pediatric patients. It is also applied to a series of annual cumulative snowfall curves from 1940 to 1990 in the city of Edmonton, Alberta.
Submitted 26 May, 2022;
originally announced May 2022.
-
Local Statistics for Spatial Panel Models with Application to the US Electorate
Authors:
Jianfeng Wang,
Adam B Kashlak
Abstract:
The spatial panel regression model has shown great success in modelling econometric and other types of data that are observed both spatially and temporally with associated predictor variables. However, model checking via testing for spatial correlations in spatial-temporal residuals is still lacking. We propose a general methodology for fast permutation testing of local and global indicators of spatial association. This methodology extends past statistics for univariate spatial data that can be written as a gamma index for matrix similarity to the multivariate and panel data settings. This includes Moran's $I$ and Geary's $C$ among others. Spatial panel models are fit and our methodology is tested on county-wise electoral results for the five US presidential elections from 2000 to 2016 inclusive. County-wise exogenous predictor variables included in this analysis are voter population density, median income, and percentage of the population that is non-Hispanic white.
Submitted 20 October, 2021;
originally announced October 2021.
-
A reproducing kernel Hilbert space framework for functional data classification
Authors:
Peijun Sang,
Adam B Kashlak,
Linglong Kong
Abstract:
We encounter a bottleneck when we try to borrow the strength of classical classifiers to classify functional data. The major issue is that functional data are intrinsically infinite dimensional, so classical classifiers either cannot be applied directly or perform poorly due to the curse of dimensionality. To address this concern, we propose to project functional data onto one specific direction, and then a distance-weighted discrimination (DWD) classifier is built upon the projection score. The projection direction is identified through minimizing an empirical risk function that contains the particular loss function in a DWD classifier, over a reproducing kernel Hilbert space. Hence our proposed classifier can avoid overfitting and enjoy appealing properties of DWD classifiers. This framework is further extended to accommodate functional data classification problems where scalar covariates are involved. In contrast to previous work, we establish a non-asymptotic estimation error bound on the relative misclassification rate. In the finite sample case, we demonstrate that the proposed classifiers compare favorably with some commonly used functional classifiers in terms of prediction accuracy through simulation studies and a real-world application.
Submitted 7 March, 2021;
originally announced March 2021.
-
Computation-free Nonparametric testing for Local and Global Spatial Autocorrelation with application to the Canadian Electorate
Authors:
Adam B Kashlak,
Weicong Yuan
Abstract:
Measures of local and global spatial association are key tools for exploratory spatial data analysis. Many such measures exist including Moran's $I$, Geary's $C$, and the Getis-Ord $G$ and $G^*$ statistics. A parametric approach to testing for significance relies on strong assumptions, which are often not met by real world data. Alternatively, the most popular nonparametric approach, the permutation test, imposes a large computational burden especially for massive graphical networks. Hence, we propose a computation-free approach to nonparametric permutation testing for local and global measures of spatial autocorrelation stemming from generalizations of the Khintchine inequality from functional analysis and the theory of $L^p$ spaces. Our methodology is demonstrated on the results of the 2019 federal Canadian election in the province of Alberta. We recorded the percentage of the vote gained by the conservative candidate in each riding. This data is not normal, and the sample size is fixed at $n=34$ ridings making the parametric approach invalid. In contrast, running a classic permutation test for every riding, for multiple test statistics, with various neighbourhood structures, and multiple testing correction would require the simulation of millions of permutations. We are able to achieve similar statistical power on this dataset to the permutation test without the need for tedious simulation. We also consider data simulated across the entire electoral map of Canada.
Submitted 15 December, 2020;
originally announced December 2020.
-
Functional Response Designs via the Analytic Permutation Test
Authors:
Adam B Kashlak,
Sergii Myroshnychenko,
Susanna Spektor
Abstract:
A vast literature on experimental design extends from Fisher and Snedecor to the modern day. When data lies beyond the assumption of univariate normality, nonparametric methods including rank based statistics and permutation tests are enlisted. The permutation test is a versatile exact nonparametric significance test that requires drastically fewer assumptions than similar parametric tests. The main downfall of the permutation test is its high computational cost, which makes this approach laborious for complex data and sophisticated experimental designs and completely infeasible in any application requiring speedy results such as high throughput streaming data. We rectify this problem through application of concentration inequalities and thus propose a computation-free permutation test -- i.e., a permutation-less permutation test. This general framework is applied to multivariate, matrix-valued, and functional data. We improve these concentration bounds via a novel incomplete beta transform. We extend our theory from 2-sample to $k$-sample testing through the use of weakly dependent Rademacher chaoses and modified decoupling inequalities. We test this methodology on classic functional data sets including the Berkeley growth curves and the phoneme dataset. We further consider analysis of spoken vowel sound under two experimental designs: the Latin square and the randomized block design.
Submitted 10 November, 2021; v1 submitted 4 January, 2020;
originally announced January 2020.
-
Diagnosis of Pediatric Obstructive Sleep Apnea via Face Classification with Persistent Homology and Convolutional Neural Networks
Authors:
Milad Kiaee,
Adam B Kashlak,
Jisu Kim,
Giseon Heo
Abstract:
Obstructive sleep apnea is a serious condition causing a litany of health problems especially in the pediatric population. However, this chronic condition can be treated if diagnosis is possible. The gold standard for diagnosis is an overnight sleep study, which is often unobtainable by many potentially suffering from this condition. Hence, we attempt to develop a fast non-invasive diagnostic tool by training a classifier on 2D and 3D facial images of a patient to recognize facial features associated with obstructive sleep apnea. In this comparative study, we consider both persistent homology and geometric shape analysis from the field of computational topology as well as convolutional neural networks, a powerful method from deep learning whose success in image and specifically facial recognition has already been demonstrated by computer scientists.
Submitted 25 October, 2019;
originally announced November 2019.
-
Non-asymptotic error controlled sparse high dimensional precision matrix estimation
Authors:
Adam B Kashlak
Abstract:
Estimation of a high dimensional precision matrix is a critical problem to many areas of statistics including Gaussian graphical models and inference on high dimensional data. Working under the structural assumption of sparsity, we propose a novel methodology for estimating such matrices while controlling the false positive rate, percentage of matrix entries incorrectly chosen to be non-zero. We specifically focus on false positive rates tending towards zero with finite sample guarantees. This methodology is distribution free, but is particularly applicable to the problem of Gaussian network recovery. We also consider applications to constructing gene networks in genomics data.
Submitted 26 March, 2019;
originally announced March 2019.
-
Nonasymptotic estimation and support recovery for high dimensional sparse covariance matrices
Authors:
Adam B Kashlak,
Linglong Kong
Abstract:
We propose a general framework for nonasymptotic covariance matrix estimation making use of concentration inequality-based confidence sets. We specify this framework for the estimation of large sparse covariance matrices through incorporation of past thresholding estimators with key emphasis on support recovery. This technique goes beyond past results for thresholding estimators by allowing for a wide range of distributional assumptions beyond merely sub-Gaussian tails. This methodology can furthermore be adapted to a wide range of other estimators and settings. The usage of nonasymptotic dimension-free confidence sets yields good theoretical performance. Through extensive simulations, it is demonstrated to have superior performance when compared with other such methods. In the context of support recovery, we are able to specify a false positive rate and optimize to maximize the true recoveries.
Submitted 26 March, 2019; v1 submitted 7 May, 2017;
originally announced May 2017.
-
Improved Rademacher symmetrization through a Wasserstein based measure of asymmetry
Authors:
Adam B Kashlak
Abstract:
We propose an improved version of the ubiquitous symmetrization inequality making use of the Wasserstein distance between a measure and its reflection in order to quantify the symmetry of the given measure. An empirical bound on this asymmetric correction term is derived through a bootstrap procedure and shown to give tighter results in practical settings than the original uncorrected inequality. Lastly, a wide range of applications are detailed including testing for data symmetry, constructing nonasymptotic high dimensional confidence sets, bounding the variance of an empirical process, and improving constants in Nemirovski style inequalities for Banach space valued random variables.
Submitted 26 October, 2016;
originally announced October 2016.
-
Markov models for ocular fixation locations in the presence and absence of colour
Authors:
Adam B. Kashlak,
Eoin Devane,
Helge Dietert,
Henry Jackson
Abstract:
We propose to model the fixation locations of the human eye when observing a still image by a Markovian point process in $\mathbb{R}^2$. Our approach is data driven, using k-means clustering of the fixation locations to identify distinct salient regions of the image, which in turn correspond to the states of our Markov chain. Bayes factors are computed as a model selection criterion to determine the number of clusters. Furthermore, we demonstrate that the behaviour of the human eye differs from this model when colour information is removed from the given image.
Submitted 21 April, 2016;
originally announced April 2016.
-
Inference on covariance operators via concentration inequalities: k-sample tests, classification, and clustering via Rademacher complexities
Authors:
Adam B. Kashlak,
John A. D. Aston,
Richard Nickl
Abstract:
We propose a novel approach to the analysis of covariance operators making use of concentration inequalities. First, non-asymptotic confidence sets are constructed for such operators. Then, subsequent applications including a k-sample test for equality of covariance, a functional data classifier, and an expectation-maximization style clustering algorithm are derived and tested on both simulated and phoneme data.
Submitted 21 April, 2016;
originally announced April 2016.