The decomposite $T^2$-test when the dimension is large
Chia-Hsuan Tsai and Ming-Tien Tsai
Institute of Statistical Science, Academia Sinica, Taipei.
Abstract: In this paper, we discuss tests for the mean vector of high-dimensional data when the dimension $p$ is a function of the sample size $n$. One of the tests, called the decomposite $T^2$-test, is constructed for the high-dimensional testing problem based on the estimation work of Ledoit and Wolf (2018), which provides an optimal orthogonally equivariant estimator of the inverse of the population covariance matrix under the Stein loss function. The asymptotic distribution function of the test statistic is investigated under a sequence of local alternatives. The asymptotic relative efficiency is used to assess whether a test is optimal and to compare the powers of tests. As an application, the decomposite $T^2$-test is used to test the significance of the effect of a monthly unlimited transport policy on public transportation, with data taken from the Taipei Metro System.
Key words: Asymptotic local power function, asymptotic relative efficiency, decomposite $T^2$-test, high-dimensional covariance matrix, orthogonally equivariant estimator, Stieltjes transform.
1 Introduction
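Let $X_1, \ldots, X_n$ be independent random vectors having a $p$-dimensional multinormal distribution with mean vector $\mu$ and unknown positive definite covariance matrix $\Sigma$. In this paper, we are interested in testing the hypothesis
(1.1) $H_0:\ \mu = 0 \quad \text{versus} \quad H_1:\ \mu \neq 0$
when both the dimension $p$ and the sample size $n$ are large. Let
(1.2) $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)^{\top}.$
Hotelling's $T^2$-test statistic is given by
(1.3) $T^2 = n\,\bar X^{\top} S^{-1} \bar X.$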
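A minimal NumPy sketch of the classical statistic in (1.3), assuming $H_0: \mu = 0$ and the unbiased divisor $n-1$ for $S$ (the normalization is our assumption). The function name `hotelling_t2` is ours. The sketch refuses the case $n \le p$, where $S$ has rank at most $n-1 < p$ and cannot be inverted; this is exactly the regime that motivates the alternatives discussed below.

```python
import numpy as np

def hotelling_t2(X):
    """Hotelling's T^2 = n * xbar' S^{-1} xbar for H0: mu = 0 (sketch).

    Assumes S is the unbiased sample covariance (divisor n - 1).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n <= p:
        # rank(S) <= n - 1 < p, so S is singular and T^2 is undefined
        raise ValueError("S is singular when n <= p")
    xbar = X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))  # divisor n - 1
    # Solve S v = xbar instead of forming S^{-1} explicitly
    return float(n * xbar @ np.linalg.solve(S, xbar))

rng = np.random.default_rng(0)
print(hotelling_t2(rng.normal(size=(50, 3))))
```

For $p = 1$ the statistic reduces to the squared one-sample $t$ statistic $n\bar x^2/s^2$, which is a convenient check.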
When the dimension $p$ is fixed, the well-known Hotelling's $T^2$-test enjoys many optimal properties (Anderson [1]). However, when the dimension is large, the sample covariance matrix $S$ may not be a consistent estimator of the population covariance matrix $\Sigma$ when $p/n \to c > 0$. This makes it difficult to estimate the precision matrix $\Sigma^{-1}$ and to make further use of it. Dempster [8], [9] and Bai and Saranadasa [2] observed this phenomenon; Dempster [8], [9] proposed a non-exact test for the hypothesis testing problem (1.1) with the dimension larger than the sample size. Three decades later, Bai and Saranadasa [2] proposed a new test that ignores the information in $S$ by replacing it with the identity matrix, which simplifies the problem. They showed that their test has the same asymptotic power as Dempster's test under some assumptions on the dimension, the mean vector $\mu$ and the population covariance matrix $\Sigma$. Along this line, Chen and Qin [6] further modified the test statistic of Bai and Saranadasa [2]. Srivastava and Du [23] and Srivastava [24] used partial information of $S$, namely its diagonal elements, to construct new test statistics. Later, Park and Ayyala [21] modified the Srivastava-type test by incorporating some information on correlations. Feng et al. [10] assumed that the covariance matrix has a block diagonal structure in order to construct a composite Hotelling's-type test statistic. On the other hand, Chen et al. [5] used $(S + \lambda I_p)^{-1}$ in place of $S^{-1}$ in (1.3), where $\lambda > 0$. They used the notion of ridge regression, which is closely related to the rotation-equivariance property after matrix decomposition. Then, using shrinkage estimation, they constructed the regularized Hotelling's $T^2$ statistic and studied its asymptotic distribution. All the tests mentioned above can be viewed as various versions of the regularized Hotelling's $T^2$-test. Most of them are studied under the setup where both the dimension and the sample size are large so that .
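A quick simulation illustrates why $S$ is unreliable when $p/n$ does not vanish. Assuming $\Sigma = I_p$ and a known zero mean (so that $S = X^{\top}X/n$), every population eigenvalue equals 1, yet the sample eigenvalues spread over roughly the Marčenko-Pastur interval $[(1-\sqrt c)^2, (1+\sqrt c)^2]$ with $c = p/n$. The function name and the sizes are illustrative choices.

```python
import numpy as np

def sample_eigen_range(p, n, seed=0):
    """Smallest and largest eigenvalue of S = X'X/n when Sigma = I_p and mu = 0."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))   # every population eigenvalue equals 1
    S = X.T @ X / n                   # mean known to be zero, so no centering
    eig = np.linalg.eigvalsh(S)       # ascending order
    return eig[0], eig[-1]

p, n = 400, 800
c = p / n
lo, hi = sample_eigen_range(p, n)
mp_lo, mp_hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(f"sample range [{lo:.3f}, {hi:.3f}], Marchenko-Pastur edges [{mp_lo:.3f}, {mp_hi:.3f}]")
```

With $c = 1/2$ the sample eigenvalues range from about 0.09 to about 2.9, although all population eigenvalues are 1.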
In this paper, we concentrate on the situation that . Unlike the approaches in the literature, our approach tries to reveal more information about the correlations through the eigenvalues. Stein [26] proposed an orthogonally equivariant estimator of the covariance matrix, and Ledoit and Wolf [14] proposed another orthogonally equivariant estimator of the inverse covariance matrix. Ledoit and Wolf [16] claimed that their estimator is asymptotically optimal in the sense of minimizing the Stein loss. The rest of the paper is organized as follows. Orthogonally equivariant estimators of the covariance matrix in the large-dimensional setting and some basic notation from random matrix theory are introduced in Section 2. The decomposite $T^2$-test statistic is presented in Section 3, where an asymptotically equivalent statistic and its asymptotic local power are also investigated. Power comparisons based on the asymptotic relative efficiency are discussed in Section 4. In Section 5, a real example is analyzed via bootstrap tests based on the decomposite $T^2$-test statistic and on Hotelling's $T^2$-test statistic. The conclusion is given in Section 6.
2 The orthogonally equivariant estimators
The class of orthogonally equivariant estimators of the covariance matrix consists of all estimators having the same eigenvectors as the sample covariance matrix. Consider the sample spectral decomposition, i.e., , where is a diagonal matrix with eigenvalues , and is the corresponding orthogonal matrix with being the eigenvector with respect to . Similarly, for the spectral decomposition of the population covariance matrix, we have , where is a diagonal matrix with eigenvalues , and is the corresponding orthogonal matrix. With respect to the Stein loss function, Stein [26], [27] considered the orthogonally equivariant nonlinear shrinkage estimator, which is of the form
(2.1) | ||||
However, some of the might be negative, and they need not be monotone.
To mitigate these problems, Stein recommended an isotonizing algorithm to
adjust his estimators in (2.1). Stein's isotonized estimator has been regarded as a gold standard,
and a large strand of literature on orthogonally equivariant estimation of the
population covariance matrix followed.
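Stein's isotonizing algorithm itself is not reproduced here. As a loosely related illustration only, the sketch below is a generic pool-adjacent-violators (PAVA) routine that forces a sequence of adjusted eigenvalues to be non-increasing (matching sample eigenvalues sorted in decreasing order) by replacing adjacent violators with their weighted average. It does not address negative values, and the function name is hypothetical.

```python
import numpy as np

def isotonize_nonincreasing(values, weights=None):
    """Weighted least-squares fit of a non-increasing sequence via pool-adjacent-violators."""
    y = np.asarray(values, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    blocks = []  # each block: [weighted mean, total weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge while the newest block exceeds its predecessor (a violation of monotonicity)
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, l2 = blocks.pop()
            m1, w1, l1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, l1 + l2])
    return np.concatenate([np.full(length, mean) for mean, _, length in blocks])

print(isotonize_nonincreasing([5, 3, 4, 1]))
```

For example, the input 5, 3, 4, 1 becomes 5, 3.5, 3.5, 1: the violating pair (3, 4) is pooled into its average, and the total is preserved.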
Following Ledoit and Péché [12], we make the following assumptions:
A1. Let , where is a
vector of independent and identically distributed random variables .
Each has mean zero, unit variance and a 12th absolute central moment
bounded by a constant.
A2. The large-dimensional asymptotic framework considered in this paper is $n, p \to \infty$
with $p/n \to c$, where $c$ is a fixed constant.
A3. The population covariance matrix is a positive definite matrix.
Furthermore, , where
is the norm of a matrix.
A4. Let .
The empirical spectral distribution of
defined by ,
converges as to a
probability distribution function at every point of continuity of .
The support of , ,
is included in a compact set with .
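Let $F_n$ be the sample spectral distribution and $F$ its limiting distribution. We also assume that
there exists a nonrandom real function defined on the support of $F$ that is
continuously differentiable on that support.
The Stieltjes transform of a distribution function $F$ is defined by
(2.2) $m_F(z) = \int \frac{1}{\lambda - z}\, dF(\lambda), \qquad z \in \mathbb{C}^{+},$
where $\mathbb{C}^{+}$ is the half-plane of complex numbers with a strictly positive imaginary part. The empirical version, based on the eigenvalues $\lambda_1, \ldots, \lambda_p$ of the sample covariance matrix $S$, is
(2.3) $m_{F_n}(z) = \frac{1}{p}\sum_{i=1}^{p}\frac{1}{\lambda_i - z} = \frac{1}{p}\operatorname{tr}\big[(S - zI_p)^{-1}\big], \qquad z \in \mathbb{C}^{+}.$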
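The empirical Stieltjes transform can be computed either from the sample eigenvalues or from the trace of the resolvent $(S - zI_p)^{-1}$. The sketch below checks that the two forms agree and that the imaginary part is positive for $z \in \mathbb{C}^{+}$. The helper names are ours, and $S$ here is the unbiased sample covariance returned by `np.cov`.

```python
import numpy as np

def stieltjes_from_eigenvalues(S, z):
    """(1/p) * sum_i 1 / (lambda_i - z), with lambda_i the eigenvalues of S."""
    lam = np.linalg.eigvalsh(S)
    return np.mean(1.0 / (lam - z))

def stieltjes_from_resolvent(S, z):
    """(1/p) * tr[(S - z I)^{-1}], the resolvent (trace) form of the same quantity."""
    p = S.shape[0]
    return np.trace(np.linalg.inv(S - z * np.eye(p))) / p

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
S = np.cov(X, rowvar=False)
z = 1.0 + 0.5j
print(stieltjes_from_eigenvalues(S, z), stieltjes_from_resolvent(S, z))
```

The eigenvalue form is cheaper when the transform is needed at many points $z$, since the decomposition is computed once.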
Choi and Silverstein [7] showed that, for every real $x \neq 0$, the limit
(2.4) $\breve m_F(x) = \lim_{z \in \mathbb{C}^{+} \to x} m_F(z)$
exists, where $m_F$ is the Stieltjes transform of $F$. Subsequently, the well-known Marčenko-Pastur equation (Choi and Silverstein [7]) can be expressed in the form
(2.5) $m_F(z) = \int \frac{1}{\tau\,[1 - c - c\,z\,m_F(z)] - z}\, dH(\tau), \qquad z \in \mathbb{C}^{+},$
where $H$ denotes the limit of the population spectral distribution and $c$ is the limit of $p/n$. Based on the Marčenko-Pastur equation, meaningful information about the population spectral distribution can be retrieved under the large-dimensional asymptotic framework. Ledoit and Péché [12] extended these results to more general situations, including the case of the precision matrix $\Sigma^{-1}$: in addition to estimating the population covariance matrix $\Sigma$, they also estimate its inverse $\Sigma^{-1}$. Consider $\Theta^g_n(z) = p^{-1}\operatorname{tr}\big[(S - zI_p)^{-1} g(\Sigma)\big]$, where $g$ is a real function applied to the eigenvalues of a matrix and satisfying the conditions of Ledoit and Péché [12], page 236. Ledoit and Péché [12] proved that $\Theta^g_n(z)$ converges almost surely to $\Theta^g(z)$ under conditions A1-A4, where
(2.6) $\Theta^g(z) = \int \frac{g(\tau)}{\tau\,[1 - c - c\,z\,m_F(z)] - z}\, dH(\tau).$
Note that if $g \equiv 1$, then (2.6) reduces to (2.5). Ledoit and Wolf [14] suggested using the oracle estimators for , with
(2.7) |
Note that is the quantile, i.e., with , , where denotes the largest integer not exceeding . Also note that is nonrandom and estimable, . Letting , Ledoit and Péché [12] showed that is approximated by . Slightly different from Stein's estimator in (2.1), Ledoit and Wolf [14] proposed an estimator of which is of the form
(2.8) | ||||
where is the estimator of , obtained via the multivariate quantized eigenvalues sampling transform (QuEST) function.
Ledoit and Wolf [14] showed that a.s. in the rescaled Frobenius norm
and concluded that is a consistent estimator of .
Ledoit and Wolf [16] concluded that
with probability one, and further established the asymptotic optimality
of their shrinkage estimator (2.8) under the Stein loss function.
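A short calculation shows what "optimal within the orthogonally equivariant class under Stein loss" means in finite samples. With the (unnormalized) Stein loss $L(\Sigma, \hat\Sigma) = \operatorname{tr}(\Sigma^{-1}\hat\Sigma) - \log\det(\Sigma^{-1}\hat\Sigma) - p$ and $\hat\Sigma = U\operatorname{diag}(d)U^{\top}$, the loss equals $\sum_i [d_i\, u_i^{\top}\Sigma^{-1}u_i - \log d_i] + \log\det\Sigma - p$, which is minimized at $d_i = 1/(u_i^{\top}\Sigma^{-1}u_i)$. This choice uses the unknown $\Sigma$, so it is only an oracle benchmark; the data-driven estimator (2.8), not reproduced here, can be viewed as approximating this kind of oracle. The simulation design and function names are our assumptions.

```python
import numpy as np

def stein_loss(Sigma, Sigma_hat):
    """tr(Sigma^{-1} Sigma_hat) - log det(Sigma^{-1} Sigma_hat) - p (unnormalized)."""
    p = Sigma.shape[0]
    A = np.linalg.solve(Sigma, Sigma_hat)
    _, logdet = np.linalg.slogdet(A)   # det(A) > 0 when both matrices are positive definite
    return float(np.trace(A) - logdet - p)

def oracle_equivariant(S, Sigma):
    """Keep the eigenvectors u_i of S and set d_i = 1 / (u_i' Sigma^{-1} u_i).

    Uses the unknown Sigma, so it is an oracle benchmark, not a usable estimator.
    """
    _, U = np.linalg.eigh(S)
    a = np.einsum("ij,ij->j", U, np.linalg.solve(Sigma, U))  # a_i = u_i' Sigma^{-1} u_i
    return (U / a) @ U.T                                      # U diag(1/a) U'

rng = np.random.default_rng(0)
p, n = 50, 100
Sigma = np.diag(np.linspace(1.0, 5.0, p))
X = rng.standard_normal((n, p)) * np.sqrt(np.diag(Sigma))
S = X.T @ X / n
print(stein_loss(Sigma, S), stein_loss(Sigma, oracle_equivariant(S, Sigma)))
```

Because $S$ itself belongs to the class (with $d_i = \lambda_i$), the oracle loss can never exceed the loss of $S$.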
Ledoit and Wolf [16] pointed out that the estimators in (2.1) and (2.8)
have a similar form in terms of the Cauchy principal value. The only difference between (2.1) and
(2.8) is that the former uses the empirical distribution of the sample
eigenvalues to estimate the Stieltjes transform of the distribution function ,
while the latter uses a smoothed version
instead. They also commented that Stein's estimator in (2.1) has theoretical limitations and claimed,
on the evidence of Monte Carlo simulations, that their estimator performs better than Stein's estimator.
3 Main results
3.1 The decomposite test statistic
For problem (1.1), the tests proposed in the literature are essentially constructed by either ignoring the sample covariance matrix or using only partial information from it. The approach we adopt is to reveal the information in the eigenvalues with the help of random matrix theory. Orthogonally equivariant estimators of the covariance matrix generally enjoy some optimal properties, and the optimal member of this class is the most desirable. Ledoit and Wolf [14] claimed that is an asymptotically optimal estimator of under the Stein loss. Hence, for the hypothesis testing problem (1.1), we may consider the test statistic
(3.1) |
We may also note that can be replaced by , the inverse of the matrix defined in (2.1). Let , and take in (2.8) as (a) , (b) , (c) and (d) , ; then (3.1) reduces to (a) the statistic of Bai and Saranadasa [2], (b) that of Li et al. [17], (c) Hotelling's $T^2$-test statistic (1.3) and (d) the regularized Hotelling's test statistic of Bai et al. [3], respectively. First, based on Theorem 5 of Dempster [9], are approximated by the quantity . Johnstone and Paul [11] proved that a.s. under the rescaled Frobenius norm. Let and , where and with . Since is Wishart distributed, when $p$ is fixed, is the maximum likelihood estimator (MLE) of (Muirhead [19]). By the general theory of estimation, the MLE is consistent, i.e., it converges to the true value with probability one as the sample size becomes large, under regularity conditions that are satisfied by the Wishart density. Hence, when the dimension is fixed, converges to with probability one. Note that when $p$ is fixed and the sample size is large, reduces to the sample covariance matrix . Then converges to with probability one. Namely, when the dimension is fixed and the sample size is large, the decomposite $T^2$-test reduces to Hotelling's $T^2$-test statistic. Whether this optimality persists when $p$ is large, however, remains wide open. To overcome this difficulty, in this paper we also restrict the estimator of the covariance matrix to the class of orthogonally equivariant estimators, by substituting the sample eigenvectors for the corresponding population eigenvectors.
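To make the reduction to special cases concrete, here is a sketch assuming that (3.1) has the quadratic form $n\bar X^{\top}\hat\Omega\bar X$, where $\hat\Omega$ shares the eigenvectors of $S$ and has eigenvalues obtained by mapping each sample eigenvalue through a function. The `shrink` argument is a hypothetical stand-in for the Ledoit-Wolf nonlinear shrinkage in (2.8); the three maps used below (constant 1, $1/\lambda$, $1/(\lambda + 0.5)$) are illustrative choices giving an identity-type statistic, Hotelling's $T^2$, and a ridge-regularized statistic, not the paper's exact cases (a)-(d).

```python
import numpy as np

def equivariant_quadratic_stat(X, shrink):
    """n * xbar' U diag(shrink(lam)) U' xbar, where S = U diag(lam) U'.

    `shrink` maps sample eigenvalues to eigenvalues of a precision estimate that
    keeps the sample eigenvectors (a hypothetical stand-in for (2.8)).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    lam, U = np.linalg.eigh(S)
    proj = U.T @ xbar                      # xbar expressed in the sample eigenbasis
    return float(n * np.sum(shrink(lam) * proj ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8)) + 0.1
identity_like = equivariant_quadratic_stat(X, lambda lam: np.ones_like(lam))  # n * ||xbar||^2
hotelling = equivariant_quadratic_stat(X, lambda lam: 1.0 / lam)               # n * xbar' S^{-1} xbar
ridge = equivariant_quadratic_stat(X, lambda lam: 1.0 / (lam + 0.5))           # n * xbar' (S + 0.5 I)^{-1} xbar
print(identity_like, hotelling, ridge)
```

Only the eigenvalue map changes between the three statistics; the eigenvectors are always those of $S$, which is the restriction to the orthogonally equivariant class described above.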
3.2 The asymptotically equivalent statistic of
The decomposite $T^2$-test statistic in (3.1) involves nonlinear functions of
the sample eigenvalues, which complicates the derivation of its distribution function.
Using random matrix theory, Pan and Zhou [20] derived the limiting distribution function of
Hotelling's $T^2$-test statistic when .
Meanwhile, Chen et al. [5] used the Stieltjes transform to derive the asymptotic power function,
under , of the regularized Hotelling's $T^2$-test statistic, which involves a
linear function of the sample eigenvalues. Li et al. [17] extended the result for the one-sample
regularized Hotelling's $T^2$-test of Chen et al. [5] to the two-sample problem
under a class of local alternatives.
The asymptotic power functions of both the one-sample regularized Hotelling's $T^2$-test of
Chen et al. [5] (Theorem 1 and Proposition 2) and the two-sample regularized Hotelling's $T^2$-test of
Li et al. [17] are functions of the Stieltjes transform , defined in (2.4),
and its derivative.
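Note that for real $x$, the boundary value $\breve m_F(x) = \lim_{z \in \mathbb{C}^{+} \to x} m_F(z)$ has real part $\pi\mathcal{H}_f(x)$, where $\mathcal{H}_f(x) = \pi^{-1}\,\mathrm{PV}\!\int f(t)/(t - x)\,dt$ is the Hilbert transform of the limiting spectral density $f = F'$, and imaginary part $\pi f(x)$.
For example, suppose the empirical spectral distribution of the sample eigenvalues
converges weakly in probability to the Marčenko-Pastur law with density
$f(x) = \frac{\sqrt{(b - x)(x - a)}}{2\pi c\,\sigma^2 x}, \quad a \le x \le b \ (0 < c \le 1),$
where $a = \sigma^2(1 - \sqrt{c})^2$ and $b = \sigma^2(1 + \sqrt{c})^2$.
Then, with and in equation (13) of Chen et al. [5], the Stieltjes transform of becomes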
(3.2) | ||||
On the other hand, if the Stieltjes transform is known, the limiting spectral density can be recovered from it
through the inversion formula $f(x) = \pi^{-1}\lim_{\varepsilon \downarrow 0}\operatorname{Im} m_F(x + i\varepsilon)$.
Thus the asymptotic power functions of the regularized Hotelling's $T^2$-tests for the one-sample problem
by Chen et al. [5] and for the two-sample problem by Li et al. [17] are both complex-valued functions, which seems to contradict statistical
common sense for real-valued test statistics.
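A numerical illustration of the Stieltjes transform and its inversion, under the assumptions $\Sigma = I_p$ (unit variance, all population eigenvalues equal to 1) and $0 < c < 1$. In that case the Marčenko-Pastur equation $m = 1/(1 - c - czm - z)$ becomes the quadratic $czm^2 + (z - 1 + c)m + 1 = 0$. The sketch solves it, keeps the root with positive imaginary part (a pragmatic selection rule, not a general branch-selection theorem), recovers the density via $\pi^{-1}\operatorname{Im} m(x + i\varepsilon)$, and compares with the closed-form density. Function names are ours.

```python
import numpy as np

def mp_stieltjes(z, c):
    """Stieltjes transform of the Marchenko-Pastur law (Sigma = I, ratio c).

    Solves c*z*m**2 + (z - 1 + c)*m + 1 = 0, i.e. m = 1/(1 - c - c*z*m - z),
    and keeps the root with positive imaginary part.
    """
    roots = np.roots([c * z, z - 1 + c, 1.0])
    return max(roots, key=lambda m: m.imag)

def mp_density(x, c):
    """Closed-form Marchenko-Pastur density for Sigma = I and 0 < c <= 1."""
    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    if not a < x < b:
        return 0.0
    return np.sqrt((b - x) * (x - a)) / (2 * np.pi * c * x)

def density_by_inversion(x, c, eps=1e-8):
    """Stieltjes inversion: f(x) ~ Im m(x + i*eps) / pi for small eps."""
    return mp_stieltjes(x + 1j * eps, c).imag / np.pi

c = 0.5
for x in (0.5, 1.0, 2.0):
    print(x, density_by_inversion(x, c), mp_density(x, c))
```

Evaluated on the real axis, the transform is genuinely complex: its imaginary part carries the density, which is the point made in the paragraph above.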
Under assumptions A1-A4, Marčenko and Pastur [18] proved that converges to a.s.
It is well known that converges to a.s., . Both Chen et al. [5]
and Li et al. [17] concluded that this convergence holds even when , and directly used as a consistent estimator of to prove Proposition 1 of Chen et al. [5] when . However, when we may note that
(3.3) | ||||
Hence does not converge to when .
Thus, Proposition 1 of Chen et al. [5] is not correct and needs to be re-investigated.
To overcome the difficulties mentioned above, we instead seek a statistic, called
and defined in (3.6), that is asymptotically equivalent in distribution to ,
and whose asymptotic local distribution and
asymptotic local power function can be derived.
Since we assume that the data come from a multinormal distribution, the sample mean vector
is independent of the sample covariance matrix ;
namely, is independent of and .
Under assumptions A1-A4, Ledoit and Wolf [14] showed that
converges to
a.s. as , . They further proved ([14], Proposition 4.3) that a.s. as
(i.e., a.s. as ),
where is the Frobenius norm defined as ,
with tr(A) denoting the trace of a matrix A.
Since converges to
a.s. as , namely, where ,
we may, without loss of generality and for simplicity, write
.
That is to say,
,
as . Note that
as ,
where ,
is also a positive definite matrix under local
alternative (3.10). Decompose as ,
where with
. Letting , we have the following three situations: (i) when ; (ii) when ;
and (iii) is a nonrandom but unknown constant when .
In case (i), we would have ,
which contradicts statistical common sense. In case (iii), when ,
the situation is the same as the fixed-dimensional case. Hence, without loss of generality,
we consider only the case in detail.
Note that ,
thus .
As such,
a.s. as .
Hence for the high dimensional case, we may obtain that
in probability as .
This result implies that
in distribution as . Namely,
Furthermore, under normalization the two statistics and have the same asymptotic distribution function as . Thus we may obtain that
Hence we may have the following conclusion:
(3.4) |
in distribution as . When , from (3.2) we may note that and then from (2.7) we have that , i.e., . Thus, by equation (3.4) we may obtain that in distribution as , where is non-central chi-square distributed with degrees of freedom and non-centrality . Therefore, we have the following theorem.
Theorem 1.
Under assumptions A1-A4, when , then is asymptotically equivalent to in distribution as .
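Theorem 1 identifies a non-central chi-square limit. As a sanity check of that target law only (this does not test the decomposite statistic itself), the oracle statistic $n\bar X^{\top}\Sigma^{-1}\bar X$ computed with the true $\Sigma$ is exactly $\chi^2_p(\delta)$ with $\delta = n\mu^{\top}\Sigma^{-1}\mu$ under normality, so its Monte Carlo mean and variance should be close to $p + \delta$ and $2(p + 2\delta)$. All parameter values below are arbitrary assumptions.

```python
import numpy as np

def simulate_oracle_t2(p=20, n=50, reps=4000, seed=4):
    """Draw n * xbar' Sigma^{-1} xbar with the TRUE Sigma (oracle, for illustration only)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((p, p))
    Sigma = A @ A.T / p + np.eye(p)               # an arbitrary positive definite Sigma
    L = np.linalg.cholesky(Sigma)
    mu = np.full(p, 0.05)
    delta = n * mu @ np.linalg.solve(Sigma, mu)   # non-centrality n mu' Sigma^{-1} mu
    stats = np.empty(reps)
    for r in range(reps):
        X = mu + rng.standard_normal((n, p)) @ L.T   # rows ~ N_p(mu, Sigma)
        xbar = X.mean(axis=0)
        stats[r] = n * xbar @ np.linalg.solve(Sigma, xbar)
    return stats, delta

stats, delta = simulate_oracle_t2()
p = 20
print(stats.mean(), p + delta)            # mean of chi^2_p(delta)
print(stats.var(), 2 * (p + 2 * delta))   # variance of chi^2_p(delta)
```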
For the hypothesis testing problem (1.1), when ,
Theorem 1 indicates that the decomposite $T^2$-test may be
asymptotically optimal when the dimension is large, whereas Hotelling's $T^2$-test is not.
However, the situation may be different for general . If were a consistent estimator of , then
would converge to with probability one.
Although we adopt Ledoit and Wolf's optimal estimator (2.8)
of the population precision matrix, is generally not equal to
unless (i.e., ).
The orthogonal matrix may not be a consistent estimator of when the dimension
is large (see Bai et al. [4] and references therein). Hence we work under a restricted model,
namely, the Wishart distribution setup when .
Note that , the group of orthogonal matrices,
which is a compact group.
Hence, there exists a subsequence such that
converges to a.s. as .
For the case , when , we obtain that as ,
and hence
. Thus, by (3.4), when
, we have that
in distribution as .
In the general case, for simplicity we may investigate the limiting distribution function of
under the assumption that converges to a.s. in the weak topology as
, .
By equation (3.4) we then have
(3.5) |
in distribution as .
Furthermore, since is a single orbit, the theory of compact groups implies that there exists some
with . Therefore
(3.6) |
in distribution as . Note that in general. This is what makes the high-dimensional situation differ from the fixed-dimensional one. Both Stein [26] and Ledoit and Wolf [14] directly considered the case , namely, when converges to a.s. as .
Theorem 2.
Assume that converges to a.s. in the weak topology as , . Then under assumptions A1-A4, is asymptotically equivalent to ( defined in (3.6)) in distribution as .
3.3 The asymptotic distribution of
Let , then . Let , and decompose it as , where . Let , then , where =. And
(3.7) |
which is a mixture of non-central chi-square distributions. By Corollary 1.3.5 of Muirhead [19], after some straightforward algebra, we have
(3.8) |
and
(3.9) |
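The moment formulas (3.8) and (3.9) are not reproduced here. Assuming that (3.7) is a weighted sum $Q = \sum_i w_i (Z_i + a_i)^2$ of independent terms with $Z_i \sim N(0,1)$, each term is a scaled non-central $\chi^2_1$ with mean $w_i(1 + a_i^2)$ and variance $2w_i^2(1 + 2a_i^2)$, because $E(Z + a)^4 = a^4 + 6a^2 + 3$. The sketch compares these formulas with simulation; the weights and shifts are arbitrary.

```python
import numpy as np

def weighted_ncx2_moments(w, a):
    """Mean and variance of Q = sum_i w_i (Z_i + a_i)^2 with Z_i iid N(0, 1)."""
    w, a = np.asarray(w, dtype=float), np.asarray(a, dtype=float)
    mean = np.sum(w * (1 + a ** 2))
    var = 2 * np.sum(w ** 2 * (1 + 2 * a ** 2))
    return mean, var

def simulate_weighted_ncx2(w, a, reps=200_000, seed=0):
    """Monte Carlo draws of Q for comparison with the closed-form moments."""
    w, a = np.asarray(w, dtype=float), np.asarray(a, dtype=float)
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((reps, len(w)))
    return ((Z + a) ** 2 * w).sum(axis=1)

w, a = [0.5, 1.0, 2.0], [0.0, 1.0, -0.5]
q = simulate_weighted_ncx2(w, a)
print(weighted_ncx2_moments(w, a), (q.mean(), q.var()))
```

With these inputs the closed-form mean and variance are 5.0 and 18.5.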
Generally, the power of any reasonable test goes to one as the sample size grows (Chen et al. [5], Theorem 1), so it is hard to compare tests as the sample size goes to infinity. Instead, we use the local power to compare the tests. We extend the concept of local power from the fixed-dimensional situation to the large-dimensional one; unlike the fixed-dimensional case, the dimension is incorporated into the local alternatives. We study the asymptotic distribution of under the sequence of local alternatives
(3.10) |
where is a fixed $p$-dimensional vector, which amounts to assuming that
when $p$ is large.
This local alternative is equivalent to the one
considered in Feng et al. [10].
Let , then = ,
where =.
Note that and .
Thus, and
as .
Theorem 3.
Under the assumptions of Theorem 2 and the sequence of local alternatives defined in (3.10), the asymptotic power function of test statistic in (3.1) is
(3.11) |
where denotes the standard normal distribution, and being a positive constant.
Proof.
Under the null hypothesis, we may note that and . Thus by Theorem 2 as we have
(3.12) | ||||
And hence, under the sequence of local alternatives we then have
(3.13) | ||||
∎
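The exact expression (3.11) is not reproduced here. As a generic illustration only, assume a one-sided test that rejects when an asymptotically $N(\Delta, 1)$ standardized statistic exceeds $z_{1-\alpha}$; its local power is $\Phi(\Delta - z_{1-\alpha})$, which equals $\alpha$ at $\Delta = 0$ and increases with $\Delta$. The sketch uses the standard library's `statistics.NormalDist`; the function name is ours.

```python
from statistics import NormalDist

def normal_approx_power(shift, alpha=0.05):
    """Power of a one-sided level-alpha test whose statistic is approximately N(shift, 1).

    Rejects when the standardized statistic exceeds z_{1-alpha}; power = Phi(shift - z_{1-alpha}).
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(shift - z)

for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, round(normal_approx_power(shift), 4))
```

Under this form, comparing two tests at the same level reduces to comparing their shifts $\Delta$, which is the idea behind the ARE comparisons in Section 4.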
In general, applying Theorem 3 requires consistent estimators of . When , then , which can be consistently estimated by . Hence we have the following.
Corollary 1.
Under the assumptions of Theorem 2 and under , when , we have that
Thus, when the quantity is completely data-driven.
Let , write =diag.
The weight is the ratio of th eigenvalues of two covariance matrices
and , i.e.,
.
Then we may note that
and hence .
When , (i.e.,),
then =,
and . Hence we have the following.
Corollary 2.
For the hypothesis testing problem (1.1), under the assumptions of Theorem 2, if (i.e., a.s., as ), then the asymptotically local power of is
The statistic is asymptotically non-central chi-square distributed when , i.e., .
Corollary 3.
For the hypothesis testing problem (1.1), under the assumptions of Theorem 2, if , then the proposed -test is asymptotically optimal.
Remark 1 On the one hand, if , then the asymptotic power of equals the significance level . On the other hand, as , the diagonal matrix equals , i.e., . If (i.e., ), the proposed test has the asymptotically optimal power property. Moreover, the asymptotic distribution of the optimal test statistic is a non-central chi-square distribution. As a result, the key to obtaining an asymptotically optimal Hotelling's type test is to use a consistent estimator of . Note that under the assumption that , is the MLE of . Hence a.s. when is fixed and (i.e., ). Thus Hotelling's $T^2$-test statistic in (1.3) converges to in probability as . However, this may not be true when , due to the inconsistency of the sample eigenvalues. By Theorem 3 and Remark 1, we have the following.
Corollary 4.
For the hypothesis testing problem (1.1), the Hotelling’s -test is asymptotically optimal when . However, it is not asymptotically optimal when .
Remark 2 Generally , by Corollary 3, the decomposite test statistic , which is based on the optimal orthogonally equivariant estimator of the precision matrix , is not asymptotically optimal for the hypothesis testing problem (1.1). Namely, none of the regularized Hotelling's type tests is asymptotically optimal, owing to the inconsistency of the sample eigenvalues. To obtain an asymptotically optimal test for (1.1) without structural assumptions on the covariance matrix, further work on estimating the eigenvalues and eigenvectors of the population covariance matrix is necessary. Finding a consistent estimator of when is difficult and remains wide open in the literature.

Remark 3 Usually, in the fixed-dimensional case, no restriction on the unknown nuisance parameter is needed to establish the asymptotic normality of the test statistics. However, in the large-dimensional case, asymptotic normality holds either under some restrictions on the unknown nuisance parameter or when the proposed test is optimal when . As such, the numerical powers of tests in the large-dimensional situation are not comparable, because those numerical power functions can only be computed over restricted parameter spaces of where asymptotic normality holds. Those restricted spaces of over spaces and Feng et al. [10] are generally hard to characterize analytically. Each test may require a different restricted parameter space to ensure the asymptotic normality of its test statistic, and there is no clear way to compare the power functions of these tests beyond the restricted spaces. To overcome this difficulty, we provide a testing procedure under local alternatives in which the dimension p is also taken into consideration. This generalizes the fixed-dimensional situation to the large-dimensional case.
Our proposed test statistic does not encounter this difficulty: as discussed in Corollary 3, an optimal consistent estimator of leads the corresponding test to be optimal. Thus, to compare tests for the hypothesis testing problem (1.1), it is essential to compare the estimators of . Random matrix theory plays an important role in obtaining reasonable estimators of the population covariance matrix. We explain this point more clearly through comparisons with existing tests in Section 4.
4 The comparison of tests
4.1 The asymptotic relative efficiency
A standard method to compare asymptotic power functions is the asymptotic relative efficiency (ARE) (Pitman [22]), which is essentially defined via large deviation asymptotics. It is well known that Sanov's theorem and its generalizations reduce the problem of large deviations to the minimization of the Kullback-Leibler divergence over the corresponding set of distributions. For two test statistics that are asymptotically normal, i.e., distributed with noncentralities and , respectively, the ARE of the two tests is equivalent to . Whenever the ARE of test relative to test is larger than one, the procedure based on is considered to have larger asymptotic power than the competing test based on . The test has better asymptotic power than test if the eigenmatrix of is larger than . Following the arguments in Case 1 below, we can easily see that the tests proposed by Dempster [8], [9], Bai and Saranadasa [2], Srivastava and Du [23], Srivastava [24], Chen and Qin [6], Chen et al. [5], Park and Ayyala [21] and Feng et al. [10] are not optimal for the hypothesis testing problem (1.1) when the dimension is large. These results can be classified into the following three categories.

Case 1. Compare the ARE of tests constructed without using information on correlations. Let . Then the eigenmatrix of is larger than , and we may conclude that the tests proposed by Dempster [8], [9] and Bai and Saranadasa [2] are not optimal. By similar arguments, taking , where with , we may also conclude that tests using the information in the diagonal elements of , such as those of Srivastava and Du [23], Srivastava [24], Chen and Qin [6] and Park and Ayyala [21], are not optimal either.

Case 2. Compare the tests constructed by using some correlations in the estimation of the covariance matrix. Feng et al. [10] followed Bai and Saranadasa's model assumptions and improved the works of Chen and Qin [6] and Park and Ayyala [21] by taking correlations into consideration. They divided the variables into several small groups so that each covariance block is invertible, and then summed the corresponding Hotelling's $T^2$-test statistics; the result is called the composite test. The asymptotic power function of the composite test is of the form
(4.1) |
where and ; for details, see Feng et al. [10] (p. 1423). To prevent the asymptotic power from always being one as , some further conditions are needed. Note that under their assumption (C3), , equation (4.1) can be further reduced to . We may see that the asymptotic power function of the composite test becomes if holds. Note, however, that will generally not be equal to . Feng et al. [10] essentially made assumptions on the covariance matrix so that its estimator has a block diagonal form; hence, information may be lost in general. Theorem 3 tells us that the composite test of Feng et al. [10] is not optimal unless , i.e., , which does not happen in their setup. Again, as in Case 1, we may conclude that there is still room to develop more robust and powerful tests.

Case 3. Compare the tests constructed by adopting a ridge regression type covariance estimator. Chen et al. [5] imposed some regularization on the sample covariance matrix and proposed a regularized Hotelling's $T^2$ statistic (RHT)
(4.2) |
where . Note that the RHT statistic , which has a form similar to that of the decomposite $T^2$-test statistic. Note that is linear and needs to be estimated. This is related to Stein-type shrinkage estimators. Their estimators of the population eigenvalues may not be optimal. Ledoit and Wolf [13] studied the best linear estimator of the form . Ledoit and Wolf [14] further claimed that nonlinear estimators are better than the best linear estimators . There remains room to improve the estimators of the eigenvalues. Ledoit and Péché [12] used random matrix theory to claim that their nonlinear shrinkage estimator of the eigenvalues of the precision matrix is optimal. As noted above, the ARE is based on the Kullback-Leibler divergence, and the Stein loss function is proportional to the Kullback-Leibler divergence under the multivariate normal setup. As such, the optimal orthogonally equivariant estimator corresponds to the test with optimal power. Among the class of orthogonally equivariant estimators, the decomposite test statistic extracts the optimal information from the eigenvalues of the precision matrix. Ledoit and Wolf [16] expected their estimator in (2.8) to be close to the inverse of the population covariance matrix (the precision matrix), while its inverse is also close to the population covariance matrix. The decomposite $T^2$-test differs from all the tests mentioned above, and we may expect it to perform better than both the RHT proposed by Chen et al. [5] and the composite test proposed by Feng et al. [10]. Note that the sample eigenvalues are not independent. One of our main goals is to extract more information about the population eigenvalues with the help of random matrix theory.
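The noncentrality comparisons above can be made concrete for quadratic-form statistics $n\bar X^{\top}A\bar X$ with $X_i \sim N_p(\mu, \Sigma)$. Under $H_0$ such a statistic has mean $\operatorname{tr}(A\Sigma)$ and variance $2\operatorname{tr}\{(A\Sigma)^2\}$, and the alternative adds $n\mu^{\top}A\mu$ to the mean, so its standardized shift is $n\mu^{\top}A\mu/\sqrt{2\operatorname{tr}\{(A\Sigma)^2\}}$ (ignoring the change in variance under the alternative, the usual local-alternative simplification). The ratio of two such shifts is one simple proxy in the spirit of the ARE above; it is our illustration, not the paper's Case 1-3 computations. The choices of $\Sigma$, $\mu$ and the labels "identity" and "inverse diagonal" (loosely matching Bai-Saranadasa-type and Srivastava-type weightings) are assumptions.

```python
import numpy as np

def standardized_shift(mu, Sigma, A, n):
    """Standardized mean shift of n * xbar' A xbar when X_i ~ N_p(mu, Sigma).

    H0 mean tr(A Sigma) and variance 2 tr((A Sigma)^2); the alternative adds n mu' A mu.
    The variance change under the alternative is ignored (local-alternative simplification).
    """
    AS = A @ Sigma
    return float(n * mu @ A @ mu / np.sqrt(2.0 * np.trace(AS @ AS)))

p, n, rho = 30, 60, 0.4
d = np.linspace(1.0, 3.0, p)                              # unequal variances
R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))         # equicorrelation matrix
Sigma = np.sqrt(d)[:, None] * R * np.sqrt(d)[None, :]
mu = np.linspace(-0.1, 0.2, p)
oracle = standardized_shift(mu, Sigma, np.linalg.inv(Sigma), n)
for name, A in [("identity", np.eye(p)),
                ("inverse diagonal", np.diag(1.0 / np.diag(Sigma))),
                ("true precision", np.linalg.inv(Sigma))]:
    s = standardized_shift(mu, Sigma, A, n)
    print(f"{name}: shift = {s:.3f}, ratio to true-precision choice = {s / oracle:.3f}")
```

With $A = \Sigma^{-1}$ the shift simplifies to $n\mu^{\top}\Sigma^{-1}\mu/\sqrt{2p}$, and multiplying $A$ by a constant leaves the shift unchanged.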
4.2 Numerical power comparisons
By Corollary 2, it is easy to see that the asymptotic local power function of the composite $T^2$-test of Feng et al. [10] has a form similar to that of the proposed decomposite $T^2$-test. Define the quantity as the ARE of the decomposite $T^2$-test with respect to the composite $T^2$-test. If the value of ARE is larger than 1, then the decomposite $T^2$-test has greater power than the composite $T^2$-test. We conduct simulation studies of the powers and AREs of the decomposite $T^2$-test and the composite $T^2$-test under the intraclass correlation model, namely, , where ; . Without loss of generality, we take , the significance level and with . When , we take , while when , we take in Table 1.
| | Decomposite $T^2$-test | Composite $T^2$-test | ARE | Decomposite $T^2$-test | Composite $T^2$-test | ARE |
|---|---|---|---|---|---|---|
| -0.2 | | | | | | |
| 0.2 | | | | | | |
| -0.5 | | | | | | |
| 0.5 | | | | | | |
| -0.8 | | | | | | |
| 0.8 | | | | | | |
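A sketch of the intraclass (equicorrelation) covariance, assuming the standard form $\Sigma = \sigma^2[(1-\rho)I_p + \rho\,\mathbf{1}\mathbf{1}^{\top}]$; the exact parametrization used for Table 1 is not shown in the text, and $\sigma^2$ and the function names are our assumptions. Its eigenvalues are $\sigma^2(1 + (p-1)\rho)$ once and $\sigma^2(1-\rho)$ with multiplicity $p-1$, so under this form $\Sigma$ is positive definite only for $-1/(p-1) < \rho < 1$. Whether the negative values of $\rho$ in Table 1 are admissible therefore depends on $p$ and on how the model is parametrized.

```python
import numpy as np

def intraclass_cov(p, rho, sigma2=1.0):
    """Equicorrelation matrix sigma2 * [(1 - rho) I_p + rho 1 1']."""
    return sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))

def intraclass_eigenvalues(p, rho, sigma2=1.0):
    """sigma2 * (1 + (p - 1) rho) once and sigma2 * (1 - rho) with multiplicity p - 1."""
    return sigma2 * np.array([1.0 + (p - 1) * rho] + [1.0 - rho] * (p - 1))

def is_positive_definite(p, rho):
    """Positive definite iff -1/(p - 1) < rho < 1 (for p >= 2)."""
    return -1.0 / (p - 1) < rho < 1.0

for rho in (-0.8, -0.5, -0.2, 0.2, 0.5, 0.8):
    print(rho, is_positive_definite(50, rho))
```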
5 Real data analysis
More than two decades have passed since the founding of the Taipei Rapid Transit Corporation (TRTC) in 1994.
Entering the 2.0 era, the Metro system is complete, and it is time for further expansion.
A multi-point transfer model relieves congestion and disperses the burden on existing transfer stations,
thereby providing the public with faster and better transportation.
To test whether public transportation ridership has grown significantly in recent years,
especially among commuters who mainly take the Taipei Metro System, we use data gathered from 1 July 2015 to 30 April
2020, including the exit ridership of 108 stations. Since the distribution of is unknown,
we use the bootstrap method to conduct the one-sample test with significance level .
5.1 The bootstrap procedure for calculating is as follows:
1. Calculate the column mean vector and the sample covariance matrix of the data set before resampling. Decompose into its sample eigenvalues and the corresponding eigenvectors .
2. Calculate , provided by Ledoit and Wolf [14], using their numerical implementation, the QuEST function of Ledoit and Wolf [15].
3. Then is obtained as .
4. Repeatedly resample days from the original data set with replacement, recording each resampled data set.
5. Calculate the sample covariance matrix and the corresponding for each resampled data set.
After building a sampling distribution by computing on 1000 data sets simulated under the null hypothesis, we compare the test statistic computed before resampling with this sampling distribution. The empirical p-value is the proportion of the sampling distribution that is at least as extreme as the observed test statistic.
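A minimal sketch of the procedure above, under two stated assumptions. First, the null distribution is generated by resampling days from data re-centred at $\mu_0$, a common way to impose $H_0$; the text does not specify how the null data are simulated. Second, the hypothetical `ridge_precision` stands in for the Ledoit-Wolf QuEST-based precision estimate, which is not reproduced here. The p-value is the proportion of bootstrap statistics at least as large as the observed one.

```python
import numpy as np

def ridge_precision(S, r=0.1):
    """Hypothetical placeholder for the Ledoit-Wolf precision estimate (QuEST not reproduced)."""
    return np.linalg.inv(S + r * np.eye(S.shape[0]))

def quad_stat(X, mu0, precision_fn):
    """n * (xbar - mu0)' Omega_hat (xbar - mu0) with Omega_hat = precision_fn(S)."""
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    S = np.atleast_2d(np.cov(X, rowvar=False))
    return float(n * d @ precision_fn(S) @ d)

def bootstrap_pvalue(X, mu0, precision_fn=ridge_precision, B=1000, seed=0):
    """Resample rows (days) with replacement from data shifted so that H0 holds."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    observed = quad_stat(X, mu0, precision_fn)
    X0 = X - X.mean(axis=0) + mu0          # impose H0: the mean equals mu0
    boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample of days
        boot[b] = quad_stat(X0[idx], mu0, precision_fn)
    return observed, float(np.mean(boot >= observed))

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 5)) + 1.0         # mean clearly different from mu0 = 0
print(bootstrap_pvalue(X, np.zeros(5), B=200))
```

Swapping `precision_fn` for `lambda S: np.linalg.inv(S)` gives a bootstrap version of Hotelling's $T^2$ (when $n > p$), which mirrors the comparison carried out in this section.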
We want to test whether there is a difference in mean ridership among stations under the following two cases. Let be the exit ridership mean vector of the 108 stations, calculated from the second half of 2015 (July to December) as a benchmark for the mean test. The exit ridership mean vector of the 108 stations in the period under test, which is the parameter of interest, is denoted by .
5.2 The effect of Monthly Unlimited Transport Policy
In 2018, the Mayors of Taipei and New Taipei City announced a new unlimited public transportation card, called the “All Pass Ticket”, priced at NTD 1,280 a month. Released on April 16, 2018, it is a periodical commuter ticket valid for both buses and the Taipei Metro, as well as for the first 30 minutes of a YouBike ride. Commuters across Taipei and New Taipei City stand to benefit from the policy: paying NTD 1,280 for 30 days of unlimited rides works out to an average cost of about NTD 42 per day. Taipei Mayor Ko Wen-je said that, as always, people are encouraged to use public transportation to help combat traffic congestion. New Taipei City Mayor Eric Chu said he hoped the new pass would help boost daily ridership in Taipei's public transportation system (Central News Agency, March 12, 2018). The hypothesis testing problem of interest is:
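$$H_0:\ \mu = \mu_0 \quad \text{versus} \quad H_1:\ \mu \neq \mu_0, \tag{5.1}$$
where μ is the exit ridership mean vector of the 108 stations during the policy period and μ₀ is the benchmark mean vector defined above.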
Here we check the effectiveness of the “All Pass Ticket” policy with a bootstrap resampling procedure based on the days from 2017 to 2019, computing both the decomposite T²-test statistic and Hotelling’s T²-test statistic. For the decomposite T²-test, the empirical p-value is determined by how many of the 1000 resampled statistics fall below the observed value of the test statistic for the real data set; whether the significance level is 0.01 or 0.05, this empirical p-value is below the level, and the same conclusion is reached by comparing the observed value with the corresponding quantile of the bootstrap sampling distribution. Meanwhile, Hotelling’s T²-test yields an empirical p-value of 0. It therefore appears that the mean exit ridership of the stations differs significantly during the period of the monthly unlimited public transport card policy.
For this real data set, H₀ in (5.1) is rejected by both the decomposite T²-test, with empirical p-value 0.04, and Hotelling’s T²-test, with empirical p-value 0, when the level of significance is either 0.01 or 0.05. Note that, however small the significance level, H₀ is strongly rejected by Hotelling’s T²-test, whose empirical p-value is 0. This indicates that, under the bootstrap procedure, the decomposite T²-test has an advantage over Hotelling’s T²-test in the analysis of this real data set.
6 Conclusion and Future Study
It is generally hard to compare tests on the basis of a single index, because there are many nuisance parameters when the dimension is large. Other statistical aspects also need to be taken into account when comparing tests.
The decomposite T² statistic defined in (3.1) is constructed from the optimal estimators of the eigenvalues of the precision matrix given by Ledoit and Wolf [16]. Because little work in the literature has used these results from a data-analysis point of view, we adopt a permutation test based on good test statistics, which is easy to perform and may be robust in practice. Based on the discussion above, it seems reasonable to use the decomposite statistic in the bootstrap procedure for analyzing large-dimensional data sets.
The rotation-equivariance property is appropriate in the general situation where one has no prior information about the orientation of the eigenvectors of the population covariance matrix. However, without consistent estimators of the population eigenvalue matrix, it remains difficult for the test statistic to perform precisely, even under the null hypothesis. Existing tests in the literature that incorporate information about the covariance matrix, such as that of Feng et al. [10], face the same difficulty.
One of the main goals of this work is to obtain more information about the population eigenvalues with the help of random matrix theory. The joint density function of the dependent sample eigenvalues is well known for the Wishart ensemble, and, when the population covariance matrix is a multiple of the identity and the ratio of the dimension to the sample size converges to a positive constant, their limiting empirical distribution is given by the Marčenko-Pastur law [18]. Hence the statistical significance of the correlations in a large system can be assessed by comparing the empirical eigenvalue spectrum of the sample covariance matrix with the Marčenko-Pastur distribution. This is one of the main advantages of this approach in the pursuit of consistent estimators of the population eigenvalues and eigenvectors.
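A small simulation illustrates the Marčenko-Pastur comparison. It assumes pure noise with population covariance I (so σ² = 1) and a ratio c = p/n ≤ 1, in which case the sample eigenvalues should concentrate on [σ²(1 − √c)², σ²(1 + √c)²]; sample eigenvalues well outside this band would point to genuine correlation structure. The helper `mp_support` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np


def mp_support(c, sigma2=1.0):
    """Edges of the Marcenko-Pastur law for aspect ratio c = p/n, 0 < c <= 1."""
    root = np.sqrt(c)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


rng = np.random.default_rng(1)
n, p = 2000, 500
X = rng.standard_normal((n, p))                  # pure noise: Sigma = I (assumed)
S = X.T @ X / n                                  # sample covariance, known zero mean
evals = np.linalg.eigvalsh(S)
lower, upper = mp_support(p / n)
tol = 0.05                                       # slack for finite-sample edge effects
inside = np.mean((evals > lower - tol) & (evals < upper + tol))
print(f"MP support [{lower:.3f}, {upper:.3f}], fraction of eigenvalues inside: {inside:.3f}")
```

Even though the population eigenvalues are all equal to 1, the sample eigenvalues spread over the whole band, which is why the sample spectrum cannot be read off directly as an estimate of the population spectrum when the dimension is large.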
If the matrix is equal to the identity matrix, then our proposed decomposite T²-test is optimal for the hypothesis testing problem (1.1). In this ideal situation, by Corollary 3, the data analysis can be based on the normalized test statistic and the usual normal theory. However, this study indicates that neither Stein’s estimator (2.1) nor the Ledoit and Wolf estimator (2.8) is a consistent estimator of the population covariance matrix. For applications of principal component analysis, we remark that finding consistent estimators of the population eigenvalues and eigenvectors in a large-dimensional system is still an open problem. At this stage, it may be too optimistic to expect the population covariance matrix to be fully recovered without any a priori knowledge of its structure. Hence, we leave this difficult but important problem for future study.
Acknowledgments
The author would like to thank Professors Z.R. Chen and H.N. Hong from National Chiao Tung University, and Professor S.Y. Huang from Academia Sinica for their helpful discussions.
References
-
1.
T.W. Anderson (2003), An Introduction to Multivariate Statistical Analysis. 3rd edition. Wiley, New York.
-
2.
Bai, Z.D. and Saranadasa, H. (1996), Effect of High Dimension: by an Example of a Two Sample Problem. Statistica Sinica, Vol. 6, No.2, 311–329.
-
3.
Bai, Z.D. and Miao, B.Q. and Yao, J.F. (2003), Convergence rates of spectral distributions of large sample covariance matrices. SIAM J. Matrix Anal. Appl., Vol.25, 105–127.
-
4.
Bai, Z.D. and Miao, B.Q. and Pan, G.M. (2007), On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab., Vol.35, 1532–1572.
-
5.
Chen, L.S. and Paul, D. and Prentice, R.L. and Wang, P. (2011), A Regularized Hotelling’s T²-Test for Pathway Analysis in Proteomic Studies. J. Am. Stat. Assoc., Vol.106, No.496, 1345–1360.
-
6.
Chen, S.X. and Qin, Y.L. (2010), A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist., Vol.38(2), 808–835.
-
7.
Choi, S.I. and Silverstein, J.W. (1995), Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivariate Anal., Vol.54(2), 295–309.
-
8.
Dempster, A.P. (1958), A high dimensional two sample significance test. Ann. Math. Statist., Vol.29(4), 995–1010.
-
9.
Dempster, A.P. (1960), A significance test for the separation of two highly multivariate small samples. Biometrics, Vol.16(1), 41–50.
-
10.
Feng, L. and Zou, C. and Wang, Z. and Zhu, L. (2017), Composite T² test for high-dimensional data. Statistica Sinica, Vol.27(3), 1419–1436.
-
11.
Johnstone, I.M. and Paul, D. (2018), PCA in High Dimensions: An Orientation. Proc. IEEE, Vol.106(8), 1277–1292.
-
12.
Ledoit, O. and Péché, S. (2011), Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Relat. Fields, Vol.151, 233–264.
-
13.
Ledoit, O. and Wolf, M. (2004), A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal., Vol.88, 365–411.
-
14.
Ledoit, O. and Wolf, M. (2012), Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist., Vol.40(2), 1024–1060.
-
15.
Ledoit, O. and Wolf, M. (2017), Numerical implementation of the QuEST function. Computational Statistics and Data Analysis, Vol.115, 199–223.
-
16.
Ledoit, O. and Wolf, M. (2018), Optimal estimation of a large-dimensional covariance matrix under Stein’s loss. Bernoulli, Vol.24(4B), 3791–3832.
-
17.
Li, H. and Aue, A. and Paul, D. and Peng, J. and Wang, P. (2020), An adaptable generalization of Hotelling’s T² test in high dimension. Ann. Statist., Vol.48(3), 1815–1847.
-
18.
Marčenko, V.A. and Pastur, L.A. (1967), Distribution of eigenvalues for some sets of random matrices. Sb. Math., Vol.1, 457–483.
-
19.
Muirhead, R.J. (1982), Aspects of Multivariate Statistical Theory. Wiley, New York.
-
20.
Pan, G.M. and Zhou, W. (2011), Central limit theorem for Hotelling’s T² statistic under large dimension. Ann. Appl. Probab., Vol.21, 1860–1910.
-
21.
Park, J. and Ayyala, D.N. (2013), A test for the mean vector in large dimension and small samples. J. Statist. Plan. Infer., Vol.143(5), 929–943.
-
22.
Pitman, E.J.G. (1948), Lecture Notes on Nonparametric Statistical Inference: Lectures Given for the University of North Carolina. University of North Carolina.
-
23.
Srivastava, M.S. and Du, M. (2008), A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal., Vol.99(3), 386–402.
-
24.
Srivastava, M.S. (2009), A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivariate Anal., Vol.100(3), 518–532.
-
25.
Silverstein, J.W. (1995), Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivariate Anal., Vol.55(2), 331–339.
-
26.
Stein, C. (1975), Estimation of a covariance matrix. Rietz lecture, 39th Annual Meeting IMS.
-
27.
Stein, C. (1986), Lectures on the theory of estimation of many parameters. J. Math. Sci, Vol.34, 1373–1403.
Institute of Statistical Science, Academia Sinica, Taipei.