-
Equality between two general ridge estimators and equivalence of their residual sums of squares
Authors:
Hirai Mukasa,
Koji Tsukuda
Abstract:
General ridge estimators are typical linear estimators in a general linear model. This class includes some shrinkage estimators in addition to classical linear unbiased estimators such as the ordinary least squares estimator and the weighted least squares estimator. We derive necessary and sufficient conditions under which two general ridge estimators coincide. In particular, two noteworthy conditions are added to those from previous studies. The first condition is given as a seemingly column space relationship to the covariance matrix of the error term, and the second one is based on the biases of general ridge estimators. Another problem studied in this paper is to derive an equivalence condition such that equality between two residual sums of squares holds when general ridge estimators are considered.
Submitted 10 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Two step estimations via the Dantzig selector for models of stochastic processes with high-dimensional parameters
Authors:
Kou Fujimori,
Koji Tsukuda
Abstract:
We consider sparse estimation for stochastic processes with possibly infinite-dimensional nuisance parameters, using the Dantzig selector, which is a sparse estimation method similar to $Z$-estimation. When a consistent estimator for a nuisance parameter is obtained, it is possible to construct an asymptotically normal estimator for the parameter of interest under appropriate conditions. Motivated by this fact, we establish the asymptotic behavior of the Dantzig selector for models of ergodic stochastic processes with high-dimensional parameters of interest and possibly infinite-dimensional nuisance parameters. Applications to ergodic time series models including integer-valued autoregressive models and ergodic diffusion processes are presented.
Submitted 31 March, 2024;
originally announced April 2024.
-
Estimators for multivariate allometric regression model
Authors:
Koji Tsukuda,
Shun Matsuura
Abstract:
In a regression model with multiple response variables and multiple explanatory variables, if the difference of the mean vectors of the response variables for different values of explanatory variables is always in the direction of the first principal eigenvector of the covariance matrix of the response variables, then it is called a multivariate allometric regression model. This paper studies the estimation of the first principal eigenvector in the multivariate allometric regression model. A class of estimators that includes conventional estimators is proposed based on weighted sum-of-squares matrices of the regression sum-of-squares matrix and the residual sum-of-squares matrix. We establish an upper bound of the mean squared error of the estimators contained in this class, and the weight value minimizing the upper bound is derived. Sufficient conditions for the consistency of the estimators are discussed in weak identifiability regimes, under which the difference between the largest and second largest eigenvalues of the covariance matrix decays asymptotically, and in ``large $p$, large $n$'' regimes, where $p$ is the number of response variables and $n$ is the sample size. Several numerical results are also presented.
Submitted 26 May, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Spectral clustering algorithm for the allometric extension model
Authors:
Kohei Kawamoto,
Yuichi Goto,
Koji Tsukuda
Abstract:
The spectral clustering algorithm is often used as a binary clustering method for unclassified data by applying principal component analysis. To study theoretical properties of the algorithm, the assumption of conditional homoscedasticity is often imposed in existing studies. However, this assumption is restrictive and often unrealistic in practice. Therefore, in this paper, we consider the allometric extension model, that is, the directions of the first eigenvectors of two covariance matrices and the direction of the difference of two mean vectors coincide, and we provide a non-asymptotic bound of the error probability of the spectral clustering algorithm for the allometric extension model. As a byproduct of the result, we obtain the consistency of the clustering method in high-dimensional settings.
Submitted 6 October, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Evaluating moments of length of Pitman partition
Authors:
Koji Tsukuda
Abstract:
The Pitman sampling formula has been intensively studied as a distribution of random partitions. One of the objects of interest is the length $K (= K_{n,θ,α})$ of a random partition that follows the Pitman sampling formula, where $n\in\mathbb{N}$, $α\in(0,\infty)$ and $θ> -α$ are parameters. This paper presents asymptotic evaluations of its $r$-th moment $\mathsf{E}[K^r]$ ($r=1,2,\ldots$) under two asymptotic regimes. In particular, the goals of this study are to provide a finer approximate evaluation of $\mathsf{E}[K^r]$ as $n\to\infty$ than has previously been developed and to provide an approximate evaluation of $\mathsf{E}[K^r]$ as the parameters $n$ and $θ$ simultaneously tend to infinity with $θ/n \to 0$. The results presented in this paper will provide a more accurate understanding of the asymptotic behavior of $K$.
Submitted 30 March, 2022; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Limit theorem associated with Wishart matrices with application to hypothesis testing for common principal components
Authors:
Koji Tsukuda,
Shun Matsuura
Abstract:
This study derives a new property of the Wishart distribution when the degrees of freedom and the size of the matrix parameter of the distribution grow simultaneously. In particular, the asymptotic normality of the product of four independent Wishart matrices is shown under a high-dimensional asymptotic regime. As an application of the result, a statistical test procedure for the common principal components hypothesis is proposed. For this problem, the proposed test statistic is asymptotically normal under the null hypothesis. In addition, the proposed test statistic diverges to positive infinity in probability under the alternative hypothesis.
Submitted 18 July, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
-
Error bounds for the normal approximation to the length of a Ewens partition
Authors:
Koji Tsukuda
Abstract:
Let $K(=K_{n,θ})$ be a positive integer-valued random variable whose distribution is given by ${\rm P}(K = x) = \bar{s}(n,x) θ^x/(θ)_n$ $(x=1,\ldots,n)$, where $θ$ is a positive number, $n$ is a positive integer, $(θ)_n=θ(θ+1)\cdots(θ+n-1)$ and $\bar{s}(n,x)$ is the coefficient of $θ^x$ in $(θ)_n$ for $x=1,\ldots,n$. This formula describes the distribution of the length of a Ewens partition, which is a standard model of random partitions. As $n$ tends to infinity, $K$ asymptotically follows a normal distribution. Moreover, as $n$ and $θ$ simultaneously tend to infinity, if $n^2/θ\to\infty$, $K$ also asymptotically follows a normal distribution. In this paper, error bounds for the normal approximation are provided. The result shows that the decay rate of the error depends on the asymptotic regime.
Submitted 1 May, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
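As an illustrative aside to the abstract above, the distribution ${\rm P}(K = x) = \bar{s}(n,x) θ^x/(θ)_n$ can be evaluated exactly for small $n$ by expanding $(θ)_n$ as a polynomial in $θ$. The following is a minimal Python sketch, not code from the paper: the function names `rising_factorial_coeffs` and `ewens_length_pmf` are hypothetical, and the use of exact rational arithmetic is an assumption made for clarity rather than speed.

```python
from fractions import Fraction


def rising_factorial_coeffs(n):
    """Return [c_0, ..., c_n], where c_x is the coefficient of theta^x
    in (theta)_n = theta (theta + 1) ... (theta + n - 1)."""
    coeffs = [1]  # the empty product is the constant polynomial 1
    for j in range(n):
        # Multiply the current polynomial by (theta + j).
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += j * c      # contribution of the constant term j
            new[k + 1] += c      # contribution of the theta term
        coeffs = new
    return coeffs


def ewens_length_pmf(n, theta):
    """Return {x: P(K = x)} for x = 1, ..., n under the formula quoted
    in the abstract, using exact rational arithmetic (illustrative only)."""
    theta = Fraction(theta)
    coeffs = rising_factorial_coeffs(n)
    # Evaluating the polynomial at theta gives the normalizer (theta)_n.
    norm = sum(c * theta**x for x, c in enumerate(coeffs))
    return {x: coeffs[x] * theta**x / norm for x in range(1, n + 1)}
```

For example, `ewens_length_pmf(3, 1)` assigns probabilities 1/3, 1/2, and 1/6 to $x = 1, 2, 3$, since $(θ)_3 = θ^3 + 3θ^2 + 2θ$.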
-
Weak convergences of marked empirical processes in a Hilbert space and their applications
Authors:
Koji Tsukuda,
Yoichi Nishiyama
Abstract:
In this paper, weak convergences of marked empirical processes in $L^2(\mathbb{R},ν)$ and their applications to statistical goodness-of-fit tests are provided, where $L^2(\mathbb{R},ν)$ is the set of equivalence classes of the square integrable functions on $\mathbb{R}$ with respect to a finite Borel measure $ν$. The results obtained in our framework of weak convergences are, in the topological sense, weaker than those in the Skorokhod topology on a space of càdlàg functions or the uniform topology on a space of bounded functions, which have been well studied in previous works. However, our results have the following merits: (1) avoiding conditions which do not suit our purpose; (2) treating a weight function which makes it possible to propose Anderson--Darling type test statistics for goodness-of-fit tests. Indeed, the applications presented in this paper are novel.
Submitted 14 January, 2019; v1 submitted 17 August, 2018;
originally announced August 2018.
-
A reversal phenomenon in estimation based on multiple samples from the Poisson--Dirichlet distribution
Authors:
Koji Tsukuda,
Shuhei Mano
Abstract:
Consider two forms of sampling from a population: (i) drawing $s$ samples of $n$ elements with replacement and (ii) drawing a single sample of $ns$ elements. In this paper, under the setting where the descending order population frequency follows the Poisson--Dirichlet distribution with parameter $θ$, we report that the magnitude relation of the Fisher information, which sample partitions converted from samples (i) and (ii) possess, can change depending on the parameters, $n$, $s$, and $θ$. Roughly speaking, if $θ$ is small relative to $n$ and $s$, the Fisher information of (i) is larger than that of (ii); on the contrary, if $θ$ is large relative to $n$ and $s$, the Fisher information of (ii) is larger than that of (i). The result represents one aspect of random distributions.
Submitted 2 February, 2018;
originally announced February 2018.
-
Taste or Addiction?: Using Play Logs to Infer Song Selection Motivation
Authors:
Kosetsu Tsukuda,
Masataka Goto
Abstract:
Online music services are increasing in popularity. They enable us to analyze people's music listening behavior based on play logs. Although it is known that people listen to music based on topic (e.g., rock or jazz), we assume that when a user is addicted to an artist, s/he chooses the artist's songs regardless of topic. Based on this assumption, in this paper, we propose a probabilistic model to analyze people's music listening behavior. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling music listening behavior by taking into account the influence of addiction to artists. Second, by using real-world datasets of play logs, we showed the effectiveness of our proposed model. Third, we carried out qualitative experiments and showed that taking addiction into account enables us to analyze music listening behavior from a new viewpoint in terms of how people listen to music according to the time of day, how an artist's songs are listened to by people, etc. We also discuss the possibility of applying the analysis results to applications such as artist similarity computation and song recommendation.
Submitted 26 May, 2017;
originally announced May 2017.
-
Covariance structure associated with an equality between two general ridge estimators
Authors:
Koji Tsukuda,
Hiroshi Kurata
Abstract:
In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. The condition is given as a structure of the dispersion matrix of the error term. Since the class of estimators considered here contains linear unbiased estimators such as the ordinary least squares estimator and the best linear unbiased estimator, our result can be viewed as a generalization of the well-known theorems on the equality between these two estimators, which have been fully studied in the literature. Two related problems are also considered: equality between two residual sums of squares, and classification of dispersion matrices by a perturbation approach.
Submitted 20 December, 2017; v1 submitted 8 May, 2017;
originally announced May 2017.
-
On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size
Authors:
Koji Tsukuda
Abstract:
The Ewens sampling formula was first introduced in the context of population genetics by Warren John Ewens in 1972, and has appeared in many other scientific fields. There are abundant approximation results associated with the Ewens sampling formula, especially when one of the parameters, the sample size $n$ or the mutation parameter $θ$, which denotes the scaled mutation rate, tends to infinity while the other is fixed. By contrast, the case where $θ$ grows with $n$ has been considered in a relatively small number of works, although this asymptotic setup is also natural. In this paper, when $θ$ grows with $n$, we advance the study of the asymptotic properties of the total number of alleles and of the counts of components in the allelic partition under the Ewens sampling formula, from the viewpoint of Poisson approximations.
△ Less
Submitted 22 April, 2017;
originally announced April 2017.