-
EM Estimation of the B-spline Copula with Penalized Log-Likelihood Function
Authors:
Xiaoling Dou,
Satoshi Kuriki,
Gwo Dong Lin,
Donald Richards
Abstract:
The B-spline copula function is defined by a linear combination of elements of the normalized B-spline basis. We develop a modified EM algorithm to maximize the penalized log-likelihood function, wherein we use the smoothly clipped absolute deviation (SCAD) penalty function for the penalization term. We conduct simulation studies to demonstrate the stability of the proposed numerical procedure, show that penalization yields estimates with smaller mean-square errors when the true parameter matrix is sparse, and provide methods for determining tuning parameters and for model selection. As an example, we analyze a data set consisting of birth and death rates from 237 countries, available at the website "Our World in Data," and we estimate the marginal density and distribution functions of those rates together with all parameters of our B-spline copula model.
Submitted 12 February, 2024;
originally announced February 2024.
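To make the penalty named in the abstract above concrete, here is a minimal sketch of the SCAD penalty in its commonly cited piecewise form (Fan and Li, 2001). The function name `scad_penalty`, the argument names, and the default `a = 3.7` are illustrative assumptions; the abstract does not say how the authors parameterize the penalty or how it enters their modified EM algorithm.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Smoothly clipped absolute deviation (SCAD) penalty for one coefficient.

    Assumed textbook form: linear (lasso-like) near zero, quadratic in a
    transition zone, and constant for large |theta|, so that large
    coefficients are not shrunk further.
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Region 1: same as the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Region 2: quadratic blend that joins regions 1 and 3 continuously.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Region 3: flat, so the penalty no longer grows with |theta|.
    return lam * lam * (a + 1) / 2
```

In a penalized likelihood, a term of this kind would typically be summed over the coefficients and subtracted from the log-likelihood, with `lam` chosen as a tuning parameter.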
-
Robust Topological Inference in the Presence of Outliers
Authors:
Siddharth Vishwanath,
Bharath K. Sriperumbudur,
Kenji Fukumizu,
Satoshi Kuriki
Abstract:
The distance function to a compact set plays a crucial role in the paradigm of topological data analysis. In particular, the sublevel sets of the distance function are used in the computation of persistent homology -- a backbone of the topological data analysis pipeline. Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers. In this work, we develop a framework of statistical inference for persistent homology in the presence of outliers. Drawing inspiration from recent developments in robust statistics, we propose a $\textit{median-of-means}$ variant of the distance function ($\textsf{MoM Dist}$), and establish its statistical properties. In particular, we show that, even in the presence of outliers, the sublevel filtrations and weighted filtrations induced by $\textsf{MoM Dist}$ are both consistent estimators of the true underlying population counterpart, and their rates of convergence in the bottleneck metric are controlled by the fraction of outliers in the data. Finally, we demonstrate the advantages of the proposed methodology through simulations and applications.
Submitted 3 June, 2022;
originally announced June 2022.
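The median-of-means idea in the abstract above can be illustrated with a short sketch. It assumes the usual recipe: split the sample into disjoint blocks, compute the ordinary distance function to each block, and take the median of those block-wise distances. The names `dist_to_set` and `mom_dist`, the random block assignment, and the `seed` argument are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
import statistics


def dist_to_set(x, points):
    """Ordinary distance function: Euclidean distance from x to the nearest point."""
    return min(math.dist(x, p) for p in points)


def mom_dist(x, sample, n_blocks, seed=0):
    """Median-of-means style distance (illustrative sketch).

    The sample is shuffled, split into n_blocks disjoint blocks, and the
    median of the block-wise distances is returned. A few outliers can
    contaminate only a few blocks, so the median is not dragged to zero.
    """
    if not 1 <= n_blocks <= len(sample):
        raise ValueError("n_blocks must be between 1 and len(sample)")
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    # Block q takes every n_blocks-th point, so all blocks are non-empty.
    blocks = [shuffled[q::n_blocks] for q in range(n_blocks)]
    return statistics.median(dist_to_set(x, block) for block in blocks)
```

For example, with points on the unit circle plus a single outlier at the origin, the ordinary distance from the origin to the sample is 0, whereas the block-wise median with three blocks stays close to 1, because only one block contains the outlier.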
-
The volume-of-tube method for Gaussian random fields with inhomogeneous variance
Authors:
Satoshi Kuriki,
Akimichi Takemura,
Jonathan E. Taylor
Abstract:
The tube method or the volume-of-tube method approximates the tail probability of the maximum of a smooth Gaussian random field with zero mean and unit variance. This method evaluates the volume of a spherical tube about the index set, and then transforms it to the tail probability. In this study, we generalize the tube method to a case in which the variance is not constant. We provide the volume formula for a spherical tube with a non-constant radius in terms of curvature tensors, and the tail probability formula of the maximum of a Gaussian random field with inhomogeneous variance, as well as its Laplace approximation. In particular, the critical radius of the tube is generalized for evaluation of the asymptotic approximation error. As an example, we discuss the approximation of the largest eigenvalue distribution of the Wishart matrix with a non-identity matrix parameter. The Bonferroni method is the tube method when the index set is a finite set. We provide the formula for the asymptotic approximation error for the Bonferroni method when the variance is not constant.
Submitted 9 September, 2021; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Robust Persistence Diagrams using Reproducing Kernels
Authors:
Siddharth Vishwanath,
Kenji Fukumizu,
Satoshi Kuriki,
Bharath Sriperumbudur
Abstract:
Persistent homology has become an important tool for extracting geometric and topological features from data, whose multi-scale features are summarized in a persistence diagram. From a statistical perspective, however, persistence diagrams are very sensitive to perturbations in the input space. In this work, we develop a framework for constructing robust persistence diagrams from superlevel filtrations of robust density estimators constructed using reproducing kernels. Using an analogue of the influence function on the space of persistence diagrams, we establish the proposed framework to be less sensitive to outliers. The robust persistence diagrams are shown to be consistent estimators in bottleneck distance, with the convergence rate controlled by the smoothness of the kernel. This, in turn, allows us to construct uniform confidence bands in the space of persistence diagrams. Finally, we demonstrate the superiority of the proposed approach on benchmark datasets.
Submitted 3 June, 2022; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Existence and Uniqueness of the Kronecker Covariance MLE
Authors:
Mathias Drton,
Satoshi Kuriki,
Peter Hoff
Abstract:
In matrix-valued datasets the sampled matrices often exhibit correlations among both their rows and their columns. A useful and parsimonious model of such dependence is the matrix normal model, in which the covariances among the elements of a random matrix are parameterized in terms of the Kronecker product of two covariance matrices, one representing row covariances and one representing column covariance. An appealing feature of such a matrix normal model is that the Kronecker covariance structure allows for standard likelihood inference even when only a very small number of data matrices is available. For instance, in some cases a likelihood ratio test of dependence may be performed with a sample size of one. However, more generally the sample size required to ensure boundedness of the matrix normal likelihood or the existence of a unique maximizer depends in a complicated way on the matrix dimensions. This motivates the study of how large a sample size is needed to ensure that maximum likelihood estimators exist, and exist uniquely with probability one. Our main result gives precise sample size thresholds in the paradigm where the number of rows and the number of columns of the data matrices differ by at most a factor of two. Our proof uses invariance properties that allow us to consider data matrices in canonical form, as obtained from the Kronecker canonical form for matrix pencils.
Submitted 14 January, 2021; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Optimal experimental design that minimizes the width of simultaneous confidence bands
Authors:
Satoshi Kuriki,
Henry P. Wynn
Abstract:
We propose an optimal experimental design for a curvilinear regression model that minimizes the band-width of simultaneous confidence bands. Simultaneous confidence bands for curvilinear regression are constructed by evaluating the volume of a tube about a curve that is defined as a trajectory of a regression basis vector (Naiman, 1986). The proposed criterion is constructed based on the volume of a tube, and the corresponding optimal design that minimizes the volume of tube is referred to as the tube-volume optimal (TV-optimal) design. For Fourier and weighted polynomial regressions, the problem is formalized as one of minimization over the cone of Hankel positive definite matrices, and the criterion to minimize is expressed as an elliptic integral. We show that the Möbius group keeps our problem invariant, and hence, minimization can be conducted over cross-sections of orbits. We demonstrate that for the weighted polynomial regression and the Fourier regression with three bases, the tube-volume optimal design forms an orbit of the Möbius group containing D-optimal designs as representative elements.
Submitted 30 March, 2019; v1 submitted 13 April, 2017;
originally announced April 2017.
-
Use of spurious correlation for multiplicity adjustment
Authors:
Yoshiyuki Ninomiya,
Satoshi Kuriki,
Toshihiko Shiroishi,
Toyoyuki Takada
Abstract:
We consider one of the most basic multiple testing problems that compares expectations of multivariate data among several groups. As a test statistic, a conventional (approximate) $t$-statistic is considered, and we determine its rejection region using a common rejection limit. When there are unknown correlations among test statistics, the multiplicity adjusted $p$-values are dependent on the unknown correlations. They are usually replaced with their estimates that are always consistent under any hypothesis. In this paper, we propose the use of estimates, which are not necessarily consistent and are referred to as spurious correlations, in order to improve statistical power. Through simulation studies, we verify that the proposed method asymptotically controls the family-wise error rate and clearly provides higher statistical power than existing methods. In addition, the proposed and existing methods are applied to a real multiple testing problem that compares quantitative traits among groups of mice and the results are compared.
Submitted 18 December, 2016;
originally announced December 2016.
-
Recursive computation for evaluating the exact $p$-values of temporal and spatial scan statistics
Authors:
Satoshi Kuriki,
Kunihiko Takahashi,
Hisayuki Hara
Abstract:
Let $V$ be a finite set of indices, and let $B_i$, $i=1,\ldots,m$, be subsets of $V$ such that $V=\bigcup_{i=1}^{m}B_i$. Let $X_i$, $i\in V$, be independent random variables, and let $X_{B_i}=(X_j)_{j\in B_i}$. In this paper, we propose a recursive computation method to calculate the conditional expectation $E\bigl[\prod_{i=1}^m\chi_i(X_{B_i}) \,|\, N\bigr]$ with $N=\sum_{i\in V}X_i$ given, where $\chi_i$ is an arbitrary function. Our method is based on the recursive summation/integration technique using the Markov property in statistics. To extract the Markov property, we define an undirected graph whose cliques are $B_j$, and obtain its chordal extension, from which we present the expressions of the recursive formula. This methodology works for a class of distributions including the Poisson distribution (that is, the conditional distribution is the multinomial). This problem is motivated by the evaluation of the multiplicity-adjusted $p$-value of scan statistics in spatial epidemiology. As an illustration of the approach, we present real data analyses to detect temporal and spatial clustering.
Submitted 31 October, 2015;
originally announced November 2015.
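The remark in the abstract above that conditioning independent Poisson variables on their sum gives a multinomial distribution can be checked numerically. The sketch below is only an illustration of that standard fact, not the authors' recursive method; the function names and the example rates are assumptions made here.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def conditional_pmf(counts, rates):
    """P(X_1 = k_1, ..., X_d = k_d | N = sum of k_i) for independent Poisson X_i.

    Uses the fact that N, a sum of independent Poissons, is itself
    Poisson with rate sum(rates).
    """
    joint = math.prod(poisson_pmf(k, lam) for k, lam in zip(counts, rates))
    return joint / poisson_pmf(sum(counts), sum(rates))


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` with cell probabilities `probs`."""
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)  # stays an exact integer at every step
    return coef * math.prod(p**k for p, k in zip(probs, counts))
```

With rates `(1.0, 2.5, 0.5)` and counts `(2, 3, 1)`, `conditional_pmf(counts, rates)` agrees with `multinomial_pmf(counts, probs)` up to floating-point error when `probs` are the rates divided by their sum.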
-
EM algorithms for estimating the Bernstein copula
Authors:
Xiaoling Dou,
Satoshi Kuriki,
Gwo Dong Lin,
Donald Richards
Abstract:
A method that uses order statistics to construct multivariate distributions with fixed marginals and which utilizes a representation of the Bernstein copula in terms of a finite mixture distribution is proposed. Expectation-maximization (EM) algorithms to estimate the Bernstein copula are proposed, and a local convergence property is proved. Moreover, asymptotic properties of the proposed semiparametric estimators are provided. Illustrative examples are presented using three real data sets and a 3-dimensional simulated data set. These studies show that the Bernstein copula is able to represent various distributions flexibly and that the proposed EM algorithms work well for such data.
Submitted 15 January, 2014; v1 submitted 12 January, 2013;
originally announced January 2013.
-
Abstract tubes associated with perturbed polyhedra with applications to multidimensional normal probability computations
Authors:
Satoshi Kuriki,
Tetsuhisa Miwa,
Anthony J. Hayter
Abstract:
Let $K$ be a closed convex polyhedron defined by a finite number of linear inequalities. In this paper we refine the theory of abstract tubes (Naiman and Wynn, 1997) associated with $K$ when $K$ is perturbed. In particular, we focus on the perturbation that is lexicographic and in an outer direction. An algorithm for constructing the abstract tube by means of linear programming and its implementation are discussed. Using the abstract tube for perturbed $K$ combined with the recursive integration technique proposed by Miwa, Hayter and Kuriki (2003), we show that the multidimensional normal probability for a polyhedral region $K$ can be computed efficiently. In addition, abstract tubes and the distribution functions of studentized range statistics are exhibited as numerical examples.
Submitted 12 October, 2011;
originally announced October 2011.
-
Approximate tail probabilities of the maximum of a chi-square field on multi-dimensional lattice points and their applications to detection of loci interactions
Authors:
Satoshi Kuriki,
Yoshiaki Harushima,
Hironori Fujisawa,
Nori Kurata
Abstract:
Define a chi-square random field on a multi-dimensional lattice points index set with a direct-product covariance structure, and consider the distribution of the maximum of this random field. We provide two approximate formulas for the upper tail probability of the distribution based on nonlinear renewal theory and an integral-geometric approach called the volume-of-tube method. This study is motivated by the detection problem of the interactive loci pairs which play an important role in forming biological species. The joint distribution of scan statistics for detecting the pairs is regarded as the chi-square random field above, and hence the multiplicity-adjusted $p$-value can be calculated by using the proposed approximate formulas. By using these formulas, we examine the data of Mizuta, et al. (2010) who reported a new interactive loci pair of rice inter-subspecies.
Submitted 30 March, 2013; v1 submitted 22 December, 2010;
originally announced December 2010.