Search | arXiv e-print repository

arXiv:2405.20023 [pdf, ps, other]

Equality between two general ridge estimators and equivalence of their residual sums of squares

Abstract: General ridge estimators are typical linear estimators in a general linear model. This class includes some shrinkage estimators in addition to classical linear unbiased estimators such as the ordinary least squares estimator and the weighted least squares estimator. We derive necessary and sufficient conditions under which two general ridge estimators coincide. In particular, two noteworthy conditions are added to those from previous studies. The first condition is given as a seemingly column space relationship to the covariance matrix of the error term, and the second one is based on the biases of general ridge estimators. Another problem studied in this paper is to derive an equivalence condition such that equality between two residual sums of squares holds when general ridge estimators are considered.

Submitted 10 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 19 pages

MSC Class: 62J05; 62F10; 62J07

arXiv:2404.00888 [pdf, ps, other]

Two step estimations via the Dantzig selector for models of stochastic processes with high-dimensional parameters

Authors: Kou Fujimori, Koji Tsukuda

Abstract: We consider the sparse estimation for stochastic processes with possibly infinite-dimensional nuisance parameters, by using the Dantzig selector which is a sparse estimation method similar to $Z$-estimation. When a consistent estimator for a nuisance parameter is obtained, it is possible to construct an asymptotically normal estimator for the parameter of interest under appropriate conditions. Motivated by this fact, we establish the asymptotic behavior of the Dantzig selector for models of ergodic stochastic processes with high-dimensional parameters of interest and possibly infinite-dimensional nuisance parameters. Applications to ergodic time series models including integer-valued autoregressive models and ergodic diffusion processes are presented.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 42 pages

arXiv:2402.11219 [pdf, ps, other]

Estimators for multivariate allometric regression model

Authors: Koji Tsukuda, Shun Matsuura

Abstract: In a regression model with multiple response variables and multiple explanatory variables, if the difference of the mean vectors of the response variables for different values of explanatory variables is always in the direction of the first principal eigenvector of the covariance matrix of the response variables, then it is called a multivariate allometric regression model. This paper studies the estimation of the first principal eigenvector in the multivariate allometric regression model. A class of estimators that includes conventional estimators is proposed based on weighted sum-of-squares matrices of regression sum-of-squares matrix and residual sum-of-squares matrix. We establish an upper bound of the mean squared error of the estimators contained in this class, and the weight value minimizing the upper bound is derived. Sufficient conditions for the consistency of the estimators are discussed in weak identifiability regimes under which the difference of the largest and second largest eigenvalues of the covariance matrix decays asymptotically and in ``large $p$, large $n$" regimes, where $p$ is the number of response variables and $n$ is the sample size. Several numerical results are also presented.

Submitted 26 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 20 pages

MSC Class: 62J05; 62H12; 62H25

arXiv:2309.06264 [pdf, ps, other]

Spectral clustering algorithm for the allometric extension model

Authors: Kohei Kawamoto, Yuichi Goto, Koji Tsukuda

Abstract: The spectral clustering algorithm is often used as a binary clustering method for unclassified data by applying the principal component analysis. To study theoretical properties of the algorithm, the assumption of conditional homoscedasticity is often supposed in existing studies. However, this assumption is restrictive and often unrealistic in practice. Therefore, in this paper, we consider the allometric extension model, that is, the directions of the first eigenvectors of two covariance matrices and the direction of the difference of two mean vectors coincide, and we provide a non-asymptotic bound of the error probability of the spectral clustering algorithm for the allometric extension model. As a byproduct of the result, we obtain the consistency of the clustering method in high-dimensional settings.

Submitted 6 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 20 pages

MSC Class: 62H25; 62H30

arXiv:2008.12472 [pdf, ps, other]

Evaluating moments of length of Pitman partition

Authors: Koji Tsukuda

Abstract: The Pitman sampling formula has been intensively studied as a distribution of random partitions. One of the objects of interest is the length $K (= K_{n,θ,α})$ of a random partition that follows the Pitman sampling formula, where $n\in\mathbb{N}$, $α\in(0,\infty)$ and $θ> -α$ are parameters. This paper presents asymptotic evaluations of its $r$-th moment $\mathsf{E}[K^r]$ ($r=1,2,\ldots$) under two asymptotic regimes. In particular, the goals of this study are to provide a finer approximate evaluation of $\mathsf{E}[K^r]$ as $n\to\infty$ than has previously been developed and to provide an approximate evaluation of $\mathsf{E}[K^r]$ as the parameters $n$ and $θ$ simultaneously tend to infinity with $θ/n \to 0$. The results presented in this paper will provide a more accurate understanding of the asymptotic behavior of $K$.

Submitted 30 March, 2022; v1 submitted 28 August, 2020; originally announced August 2020.

Comments: 10 pages. Added a reference and corrected errors

MSC Class: 60C05; 60F05; 62E20

arXiv:2005.01316 [pdf, ps, other]

doi 10.1016/j.jmva.2021.104822

Limit theorem associated with Wishart matrices with application to hypothesis testing for common principal components

Authors: Koji Tsukuda, Shun Matsuura

Abstract: This study derives a new property of the Wishart distribution when the degree-of-freedom and the size of the matrix parameter of the distribution grow simultaneously. Particularly, the asymptotic normality of the product of four independent Wishart matrices is shown under a high dimensional asymptotic regime. As an application of the result, a statistical test procedure for the common principal components hypothesis is proposed. For this problem, the proposed test statistic is asymptotically normal under the null hypothesis. In addition, the proposed test statistic diverges to positive infinity in probability under the alternative hypothesis.

Submitted 18 July, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: 27 pages. Corrected some errors. Improved presentations

MSC Class: 60F05; 62F05 (Primary) 62H25(Secondary)

Journal ref: Journal of Multivariate Analysis 186 (2021) 104822

arXiv:1904.01729 [pdf, other]

doi 10.1007/978-981-15-9663-6_3

Error bounds for the normal approximation to the length of a Ewens partition

Authors: Koji Tsukuda

Abstract: Let $K(=K_{n,θ})$ be a positive integer-valued random variable whose distribution is given by ${\rm P}(K = x) = \bar{s}(n,x) θ^x/(θ)_n$ $(x=1,\ldots,n)$, where $θ$ is a positive number, $n$ is a positive integer, $(θ)_n=θ(θ+1)\cdots(θ+n-1)$ and $\bar{s}(n,x)$ is the coefficient of $θ^x$ in $(θ)_n$ for $x=1,\ldots,n$. This formula describes the distribution of the length of a Ewens partition, which is a standard model of random partitions. As $n$ tends to infinity, $K$ asymptotically follows a normal distribution. Moreover, as $n$ and $θ$ simultaneously tend to infinity, if $n^2/θ\to\infty$, $K$ also asymptotically follows a normal distribution. In this paper, error bounds for the normal approximation are provided. The result shows that the decay rate of the error changes due to asymptotic regimes.

Submitted 1 May, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: 18 pages
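
Illustration (not part of the arXiv record): the distribution stated in the abstract above, ${\rm P}(K = x) = \bar{s}(n,x) θ^x/(θ)_n$, can be evaluated exactly for small $n$. The following is a minimal sketch only; the function names `rising_factorial_coeffs` and `ewens_length_pmf` are hypothetical and do not come from the paper. It expands $(θ)_n$ as a polynomial to read off $\bar{s}(n,x)$ and uses exact rational arithmetic.

```python
from fractions import Fraction


def rising_factorial_coeffs(n):
    """Return c with c[x] = coefficient of theta^x in (theta)_n.

    (theta)_n = theta (theta + 1) ... (theta + n - 1), so c[x] plays the
    role of s-bar(n, x) in the abstract. Hypothetical helper name.
    """
    coeffs = [1]  # the constant polynomial 1
    for j in range(n):
        # Multiply the current polynomial by (theta + j).
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * j      # contribution of the constant term j
            new[k + 1] += c      # contribution of the theta term
        coeffs = new
    return coeffs


def ewens_length_pmf(n, theta):
    """Return {x: P(K = x)} for x = 1..n as exact fractions."""
    theta = Fraction(theta)
    coeffs = rising_factorial_coeffs(n)
    norm = Fraction(1)
    for j in range(n):
        norm *= theta + j        # (theta)_n evaluated at the given theta
    return {x: coeffs[x] * theta**x / norm for x in range(1, n + 1)}


if __name__ == "__main__":
    pmf = ewens_length_pmf(3, 1)
    for x, p in pmf.items():
        print(x, p)
```

Exact fractions are used here so that the probabilities sum to one without rounding error; for large $n$ the coefficients grow quickly and a floating-point or log-scale variant would be more practical.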

arXiv:1808.05925 [pdf, other]

doi 10.1214/20-EJS1761

Weak convergences of marked empirical processes in a Hilbert space and their applications

Authors: Koji Tsukuda, Yoichi Nishiyama

Abstract: In this paper, weak convergences of marked empirical processes in $L^2(\mathbb{R},ν)$ and their applications to statistical goodness-of-fit tests are provided, where $L^2(\mathbb{R},ν)$ is the set of equivalence classes of the square integrable functions on $\mathbb{R}$ with respect to a finite Borel measure $ν$. The results obtained in our framework of weak convergences are, in the topological sense, weaker than those in the Skorokhod topology on a space of càdlàg functions or the uniform topology on a space of bounded functions, which have been well studied in previous works. However, our results have the following merits: (1) avoiding conditions which do not suit our purpose; (2) treating a weight function which makes it possible to propose Anderson--Darling type test statistics for goodness-of-fit tests. Indeed, the applications presented in this paper are novel.

Submitted 14 January, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

Comments: 22 pages

Journal ref: Electronic Journal of Statistics 14 (2020) 3914-3938

arXiv:1802.00578 [pdf, other]

A reversal phenomenon in estimation based on multiple samples from the Poisson--Dirichlet distribution

Authors: Koji Tsukuda, Shuhei Mano

Abstract: Consider two forms of sampling from a population: (i) drawing $s$ samples of $n$ elements with replacement and (ii) drawing a single sample of $ns$ elements. In this paper, under the setting where the descending order population frequency follows the Poisson--Dirichlet distribution with parameter $θ$, we report that the magnitude relation of the Fisher information, which sample partitions converted from samples (i) and (ii) possess, can change depending on the parameters, $n$, $s$, and $θ$. Roughly speaking, if $θ$ is small relative to $n$ and $s$, the Fisher information of (i) is larger than that of (ii); on the contrary, if $θ$ is large relative to $n$ and $s$, the Fisher information of (ii) is larger than that of (i). The result represents one aspect of random distributions.

Submitted 2 February, 2018; originally announced February 2018.

Comments: 20 pages

arXiv:1705.09439 [pdf, ps, other]

Taste or Addiction?: Using Play Logs to Infer Song Selection Motivation

Authors: Kosetsu Tsukuda, Masataka Goto

Abstract: Online music services are increasing in popularity. They enable us to analyze people's music listening behavior based on play logs. Although it is known that people listen to music based on topic (e.g., rock or jazz), we assume that when a user is addicted to an artist, s/he chooses the artist's songs regardless of topic. Based on this assumption, in this paper, we propose a probabilistic model to analyze people's music listening behavior. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling music listening behavior by taking into account the influence of addiction to artists. Second, by using real-world datasets of play logs, we showed the effectiveness of our proposed model. Third, we carried out qualitative experiments and showed that taking addiction into account enables us to analyze music listening behavior from a new viewpoint in terms of how people listen to music according to the time of day, how an artist's songs are listened to by people, etc. We also discuss the possibility of applying the analysis results to applications such as artist similarity computation and song recommendation.

Submitted 26 May, 2017; originally announced May 2017.

Comments: Accepted by The 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2017)

arXiv:1705.02761 [pdf, other]

doi 10.1007/s00362-017-0975-8

Covariance structure associated with an equality between two general ridge estimators

Authors: Koji Tsukuda, Hiroshi Kurata

Abstract: In a general linear model, this paper derives a necessary and sufficient condition under which two general ridge estimators coincide with each other. The condition is given as a structure of the dispersion matrix of the error term. Since the class of estimators considered here contains linear unbiased estimators such as the ordinary least squares estimator and the best linear unbiased estimator, our result can be viewed as a generalization of the well-known theorems on the equality between these two estimators, which have been fully studied in the literature. Two related problems are also considered: equality between two residual sums of squares, and classification of dispersion matrices by a perturbation approach.

Submitted 20 December, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: 16 pages. This is a pre-print of an article published in Statistical Papers. The final authenticated version is available online at: https://doi.org/10.1007/s00362-017-0975-8

Journal ref: Statistical Papers 61 (2020) 1069-1084

arXiv:1704.06768 [pdf, other]

doi 10.1214/18-AAP1433

On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size

Authors: Koji Tsukuda

Abstract: The Ewens sampling formula was first introduced in the context of population genetics by Warren John Ewens in 1972, and has appeared in many other scientific fields. There are abundant approximation results associated with the Ewens sampling formula, especially when one of the parameters, the sample size $n$ or the mutation parameter $θ$ which denotes the scaled mutation rate, tends to infinity while the other is fixed. By contrast, the case that $θ$ grows with $n$ has been considered in a relatively small number of works, although this asymptotic setup is also natural. In this paper, when $θ$ grows with $n$, we advance the study concerning the asymptotic properties of the total number of alleles and of the counts of components in the allelic partition assuming the Ewens sampling formula from the viewpoint of Poisson approximations.

Submitted 22 April, 2017; originally announced April 2017.

Comments: 38 pages

Journal ref: The Annals of Applied Probability 29 (2019) 1188-1232

Showing 1–12 of 12 results for author: Tsukuda, K