1 Introduction

Let ${\bf X}_{i},i=1,\ldots,n$, be $n$ i.i.d. random vectors having a $p$-dimensional multinormal distribution with mean vector $\mu$ and unknown positive definite covariance matrix $\Sigma$. In this paper, we are interested in testing the hypothesis

\displaystyle H_{0}:{\mbox{\boldmath$\mu$}}={\bf 0}~{}\mbox{versus}~{}H_{1}:{\mbox{\boldmath$\mu$}}\neq{\bf 0},

(1.1)

when both dimension $p$ and sample size $n$ are large. Let

\displaystyle{\overline{\bf{X}}}=\frac{1}{n}\sum_{i=1}^{n}{\bf X}_{i}~{}\mbox{and}~{}{\bf S}=\frac{1}{n-1}\sum_{i=1}^{n}({\bf X}_{i}-{\overline{\bf{X}}})({\bf X}_{i}-{\overline{\bf{X}}})^{\top}.

(1.2)

The Hotelling’s $T^{2}$ -test statistic is given by

\displaystyle T^{2}=n{\overline{\bf{X}}}^{\top}{\bf S}^{-1}{\overline{\bf{X}}}.

(1.3)
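As a minimal sketch of (1.2) and (1.3), the following Python snippet computes the sample mean, the sample covariance matrix and Hotelling’s $T^{2}$ for an $n\times p$ data matrix. The function name `hotelling_t2`, the seed and the simulated data are illustrative assumptions and are not part of the paper.

```python
import numpy as np

def hotelling_t2(X):
    """Return (xbar, S, T2) for an n-by-p data matrix X, following (1.2)-(1.3)."""
    n, p = X.shape
    if p >= n:
        # S has rank at most n - 1, so it cannot be inverted when p >= n.
        raise ValueError("S is singular when p >= n; T^2 in (1.3) is undefined")
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, ddof=1)   # divisor n - 1, as in (1.2)
    # Solve S w = xbar instead of forming S^{-1} explicitly.
    t2 = n * xbar @ np.linalg.solve(S, xbar)
    return xbar, S, t2

rng = np.random.default_rng(0)            # seed chosen only for reproducibility
X = rng.standard_normal((50, 5))          # illustrative data generated under H0
xbar, S, t2 = hotelling_t2(X)
```

The guard on $p\geq n$ reflects the fact that ${\bf S}$ is not invertible in that regime, which is the situation the regularized tests discussed below are designed for.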

When the dimension $p~{}(<n)$ is fixed, the well-known Hotelling’s $T^{2}$-test enjoys many optimal properties (Anderson [1]). However, when the dimension $p$ becomes large, the sample covariance matrix ${\bf S}$ may not be a consistent estimator of the population covariance matrix $\Sigma$ when $p\geq n$. This makes it hard to estimate the precision matrix and to make further use of it. Dempster [8], [9] first observed this phenomenon and proposed a non-exact test for the hypothesis testing problem (1.1) with the dimension larger than the sample size. Three decades later, Bai and Saranadasa [2] proposed a new test, which ignores the information in ${\bf S}$ by replacing it with the identity matrix to simplify the problem. Their results showed that the test has the same asymptotic power as Dempster’s test under some assumptions on the dimension, the mean vector $\mu$ and the population covariance matrix $\Sigma$. Along this line, Chen and Qin [6] further modified the test statistic of Bai and Saranadasa [2]. Srivastava and Du [23] and Srivastava [24] used partial information of ${\bf S}$, namely the diagonal elements, to construct a new test statistic. Later, Park and Ayyala [21] modified the Srivastava-type test by incorporating some information on correlations. Feng et al. [10] assumed that the matrix ${\bf S}$ has a kind of block diagonal structure to construct a composite Hotelling’s $T^{2}$-type test statistic. On the other hand, Chen et al. [5] used the quantity ${\bf S}+\lambda{\bf I}$ to replace ${\bf S}$ in (1.3), where $\lambda>0$. They used the notion of ridge regression, which is closely related to the rotation-equivariant property after matrix decomposition. Then, using the method of shrinkage estimation, they constructed the regularized Hotelling’s test statistic and studied its asymptotic distribution.
All the tests mentioned above can be viewed as various versions of the regularized Hotelling’s $T^{2}$-test. Most of the situations considered are under the setup that both the dimension $p$ and the sample size $n$ are large so that $\lim_{n\to\infty}p/n=c$, $c\in(0,\infty)$. In this paper, we concentrate on the situation $c\in(0,1)$. Unlike the approaches existing in the literature, our approach tries to reveal more information about the correlations in terms of eigenvalues. Stein [26] proposed an orthogonally equivariant estimator of the covariance matrix, and Ledoit and Wolf [14] proposed another orthogonally equivariant estimator of the inverse covariance matrix. Ledoit and Wolf [16] claimed that their estimator is asymptotically optimal in the sense of minimizing the Stein loss. The rest of the paper is organized as follows. The notion of orthogonally equivariant estimators of the covariance matrix for the large dimensional situation and some basic notation of random matrix theory are introduced in Section 2. The decomposite $T^{2}$-test statistic is presented in Section 3, where the asymptotically equivalent statistic $T^{2}_{0}$ and its asymptotic local power are also investigated. Power comparisons based on the asymptotic relative efficiency are discussed in Section 4. A real example is analyzed in Section 5 via bootstrap tests based on the decomposite $T^{2}$-test statistic and the Hotelling’s $T^{2}$-test statistic, respectively. The conclusion is given in Section 6.

2 The orthogonally equivariant estimators

The class of orthogonally equivariant estimators of the covariance matrix consists of all estimators having the same eigenvectors as the sample covariance matrix. Consider the sample spectral decomposition ${\bf S}={\bf U}{\mbox{\boldmath$\Lambda$}}{\bf U}^{\top}$, where $\Lambda$ is a diagonal matrix with eigenvalues $\lambda_{i,p},i=1,\ldots,p$, and ${\bf U}=({\bf u}_{1},\ldots,{\bf u}_{p})$ is the corresponding orthogonal matrix, with ${\bf u}_{i}$ being the eigenvector associated with $\lambda_{i,p},i=1,\ldots,p$. Similarly, for the spectral decomposition of the population covariance matrix, we have ${\mbox{\boldmath$\Sigma$}}={\bf V}{\mbox{\boldmath$\Gamma$}}{\bf V}^{\top}$, where $\Gamma$ is a diagonal matrix with eigenvalues $\gamma_{i,p},i=1,\ldots,p$, and ${\bf V}$ is the corresponding orthogonal matrix. With respect to the Stein loss function, Stein [26], [27] considered the orthogonally equivariant nonlinear shrinkage estimator of the form

\displaystyle\widehat{\mbox{\boldmath$\Sigma$}}_{S}={\bf U}\widehat{\mbox{\boldmath$\Phi$}}({\mbox{\boldmath$\Lambda$}}){\bf U}^{\top},~{}\mbox{where}~{}\widehat{\mbox{\boldmath$\Phi$}}({\mbox{\boldmath$\Lambda$}})=\mbox{diag}(\widehat{\phi}_{1,p}({\mbox{\boldmath$\Lambda$}}),\ldots,\widehat{\phi}_{p,p}({\mbox{\boldmath$\Lambda$}}))~{}\mbox{with}

\displaystyle\widehat{\phi}_{i,p}({\mbox{\boldmath$\Lambda$}})=n\lambda_{i,p}\left(n-p+1-2\lambda_{i,p}\sum_{j\neq i}\dfrac{1}{\lambda_{j,p}-\lambda_{i,p}}\right)^{-1},~{}i=1,\ldots,p.

(2.1)
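The raw shrinkage formula in (2.1) can be sketched in a few lines of Python. This is only an illustration of the un-isotonized formula: the function name `stein_raw_eigenvalues` is hypothetical, the sample eigenvalues are assumed to be distinct, and Stein’s isotonizing step is not implemented.

```python
import numpy as np

def stein_raw_eigenvalues(lam, n):
    """Raw (non-isotonized) Stein shrinkage of sample eigenvalues, as in (2.1).

    Assumes all entries of lam are distinct.
    """
    lam = np.asarray(lam, dtype=float)
    p = lam.size
    diff = lam[None, :] - lam[:, None]   # diff[i, j] = lam_j - lam_i
    np.fill_diagonal(diff, np.inf)       # removes the j = i term, since 1/inf = 0
    s = (1.0 / diff).sum(axis=1)         # sum over j != i of 1 / (lam_j - lam_i)
    return n * lam / (n - p + 1 - 2.0 * lam * s)

# Two nearly equal eigenvalues: the first shrunk value comes out negative,
# which illustrates why an isotonizing correction is needed.
phi = stein_raw_eigenvalues([1.0, 1.01], n=10)
```

With well-separated eigenvalues the output is positive, whereas closely spaced eigenvalues make the sum in the denominator large and can flip its sign.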

However, some of the terms $1/(\lambda_{j,p}-\lambda_{i,p})$ are negative, and the resulting $\widehat{\phi}_{i,p}({\mbox{\boldmath$\Lambda$}})$ might be negative and non-monotone in $i$. To mitigate these problems, Stein recommended an isotonizing algorithm to adjust the estimators in (2.1). Stein’s isotonized estimator has been considered a gold standard, and it generated a large strand of literature on orthogonally equivariant estimation of the population covariance matrix. Following Ledoit and Péché [12], we make the following assumptions:

A1. Let ${\bf X}_{i}=\mbox{\boldmath$\Sigma$}^{1/2}{\bf y}_{i},i=1,\ldots,n$, where ${\bf y}_{i}$ is a $p\times 1$ vector of independent and identically distributed random variables $y_{ij}$. Each $y_{ij}$ has mean $0$, unit variance and 12th absolute central moment bounded by a constant.

A2. The large dimensional asymptotic framework lets $(n,p)\to\infty$ such that $p/n\to c$ for a fixed $c$; in this paper, $0\leq c<1$.

A3. The population covariance matrix $\Sigma$ is positive definite. Furthermore, $\|\mbox{\boldmath$\Sigma$}\|=O(p^{1/2})$, where $\|\cdot\|$ is the $L_{2}$ norm of a matrix.

A4. Let $0<\gamma_{1,p}\leq\cdots\leq\gamma_{p,p}$. The empirical spectral distribution of $\Sigma$, defined by $H_{n}(\gamma)=\frac{1}{p}\sum_{i=1}^{p}1_{[\gamma_{i,p},\infty)}(\gamma)$, converges as $p\to\infty$ to a probability distribution function $H(\gamma)$ at every point of continuity of $H$. The support of $H$, $\mbox{supp}(H)$, is included in a compact set $[h_{1},h_{2}]$ with $0<h_{1}\leq h_{2}<\infty$. Let $F_{n}(\lambda)=\frac{1}{p}\sum_{i=1}^{p}1_{[\lambda_{i,p},\infty)}(\lambda)$ be the sample spectral distribution and $F$ be its limiting distribution. We also assume that there exists a nonrandom real function $\phi(x)$ that is defined and continuously differentiable on the support of $F$.

The Stieltjes transform of the distribution function $F$ is defined by

\displaystyle m_{F}(z)=\int_{-\infty}^{\infty}\dfrac{1}{\lambda-z}\,dF(\lambda),~{}\forall z\in{\mathcal{C}}^{+},

(2.2)

where ${\mathcal{C}}^{+}$ is the half-plane of complex numbers with a strictly positive imaginary part. The empirical version is

\displaystyle m_{F_{n}}(z)=\dfrac{1}{p}\sum_{i=1}^{p}\dfrac{1}{\lambda_{i,p}-z}.

(2.3)
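The empirical version (2.3) is simply an average over the sample eigenvalues. A minimal sketch, assuming the eigenvalues are already available as an array (the function name `empirical_stieltjes` and the toy eigenvalues are illustrative only):

```python
import numpy as np

def empirical_stieltjes(lam, z):
    """Empirical Stieltjes transform m_{F_n}(z) in (2.3) for complex z."""
    lam = np.asarray(lam, dtype=float)
    return np.mean(1.0 / (lam - z))

# Toy example with two eigenvalues and a point z in the upper half-plane.
m = empirical_stieltjes([1.0, 3.0], 2.0 + 1.0j)
```

For $z$ with a strictly positive imaginary part, every summand has a positive imaginary part, so the average does as well.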

Choi and Silverstein [7] showed that

\displaystyle\lim_{z\in{\mathcal{C}}^{+}\to x\in\mathcal{R}-\left\{0\right\}}m_{F}(z)=\check{m}_{F}(x)

(2.4)

exists. Subsequently, the well-known Marčenko-Pastur equation (Choi and Silverstein [7]) can be expressed in the form

\displaystyle m_{F}(z)=\int_{-\infty}^{\infty}\frac{1}{\gamma[1-c-czm_{F}(z)]-z}dH(\gamma),~{}\forall z\in{\mathcal{C}}^{+},

(2.5)

where $H$ denotes the limiting population spectral distribution. Through the Marčenko-Pastur equation, meaningful information about the population spectral distribution can be retrieved under the large dimensional asymptotic framework. Ledoit and Péché [12] extended the results to more general situations, including the case of the precision matrix ${\mbox{\boldmath$\Sigma$}^{-1}}$. In addition to estimating the population covariance matrix $\Sigma$, they also estimated its inverse $\mbox{\boldmath$\Sigma$}^{-1}$. Consider $\Theta^{g}_{n}(z)=p^{-1}\mbox{tr}[({\bf S}-z{\bf I})^{-1}g(\mbox{\boldmath$\Sigma$})]$, where $g(\cdot)$ is a scalar function acting on the eigenvalues of a matrix such that $g(\mbox{\boldmath$\Sigma$})={\bf V}\mbox{diag}(g(\gamma_{1,p}),\ldots,g(\gamma_{p,p})){\bf V}^{\top}$ (Ledoit and Péché [12], page 236). Ledoit and Péché [12] proved that $\Theta^{g}_{n}(z)$ converges to $\Theta^{g}(z)$ almost surely under conditions A1-A4, where

\displaystyle\Theta^{g}(z)=\int_{-\infty}^{\infty}\frac{1}{\gamma[1-c-czm_{F}(z)]-z}g(\gamma)dH(\gamma),~{}\forall z\in{\mathcal{C}}^{+}.

(2.6)

Note that if $g(\mbox{\boldmath$\Sigma$})={\bf I}$, then equation (2.6) reduces to equation (2.5). Ledoit and Wolf [14] suggested using the oracle estimator ${\bf P}^{or}={\bf U}{\bf A}^{or}{\bf U}^{\top}$ for ${\mbox{\boldmath$\Sigma$}}^{-1}$, where ${\bf A}^{or}=\mbox{diag}({a}^{or}_{1},\ldots,{a}^{or}_{p})$, with

\displaystyle a^{or}_{i}=\lambda^{-1}_{i}\left(1-c-2{c}\lambda_{i}\mbox{Re}\left[\check{m}_{F}(\lambda_{i})\right]\right),~{}c\in[0,1),~{}\forall i=1,\ldots,p.

(2.7)

Note that $\lambda_{i}$ is the quantile, i.e., $F^{-1}(\alpha)=\lambda_{i}$ with $[p\alpha]=i$, $0<\alpha<1$, $i=1,\ldots,p$, where $[x]$ denotes the largest integer not exceeding $x$. Also note that $a^{or}_{i}$ is a nonrandom and estimable quantity, $\forall i=1,\ldots,p$. Let ${\bf u}^{\top}_{i}{\mbox{\boldmath$\Sigma$}}^{-1}{\bf u}_{i}=a^{*}_{i}$. Ledoit and Péché [12] showed that $a^{*}_{i}$ is approximated by $a^{or}_{i},i=1,\ldots,p$. Slightly different from Stein’s estimator in (2.1), Ledoit and Wolf [14] proposed an estimator of $\Sigma$ of the form

\displaystyle\widehat{\mbox{\boldmath$\Sigma$}}_{LW}={\bf U}\widehat{\mbox{\boldmath$\Phi$}}^{*}({\mbox{\boldmath$\Lambda$}}){\bf U}^{\top},~{}\mbox{where}~{}\widehat{\mbox{\boldmath$\Phi$}}^{*}({\mbox{\boldmath$\Lambda$}})=\mbox{diag}(\widehat{\phi}^{*}_{1,p}({\mbox{\boldmath$\Lambda$}}),\ldots,\widehat{\phi}^{*}_{p,p}({\mbox{\boldmath$\Lambda$}}))

\displaystyle\mbox{with}~{}\widehat{\phi}^{*}_{i,p}({\mbox{\boldmath$\Lambda$}})=\frac{\lambda_{i,p}}{1-\frac{p}{n}-2\frac{p}{n}\lambda_{i,p}\mbox{Re}[\check{m}^{\widehat{\mbox{\boldmath$\tau$}}_{n}}_{n,p}({\lambda_{i,p}})]},~{}\forall i=1,\ldots,p,

(2.8)

where $\check{m}^{\widehat{\mbox{\boldmath$\tau$}}_{n}}_{n,p}(\cdot)$ is the estimator of $\check{m}_{F}(\cdot)$, as well as a multivariate quantized eigenvalues sample function. Ledoit and Wolf [14] showed that $||\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}-{\bf P}^{or}||/\sqrt{p}\to 0$ a.s. in the rescaled Frobenius norm, and concluded that $\mbox{Re}[\check{m}^{\widehat{\mbox{\boldmath$\tau$}}_{n}}_{n,p}({\lambda_{i,p}})]$ is a consistent estimator of $\mbox{Re}[\check{m}_{F}(\lambda_{i})],\forall~{}i=1,\ldots,p$. Ledoit and Wolf [16] concluded that $0<\widehat{\phi}^{*}_{1,p}({\mbox{\boldmath$\Lambda$}})<\widehat{\phi}^{*}_{2,p}({\mbox{\boldmath$\Lambda$}})<\ldots<\widehat{\phi}^{*}_{p,p}({\mbox{\boldmath$\Lambda$}})$ with probability one, and further asserted the asymptotic optimality of their shrinkage estimator (2.8) under the Stein loss function.
Ledoit and Wolf [16] pointed out that both the estimators in (2.1) and (2.8) have a similar form in terms of Cauchy principal value. The only difference between (2.1) and (2.8) is that the former uses the empirical distribution $F_{n}$ of sample eigenvalues to estimate the Stieltjes transform of distribution function $m_{F}(z)$ , while the latter one uses a smoothed version $F^{\widehat{\mbox{\boldmath$\tau$}}_{n}}_{n,p}$ instead. They also commented that Stein’s estimator in (2.1) has theoretical limitations and claimed that their estimator performs better compared to Stein’s estimator, by the evidence of Monte-Carlo simulations.

3 Main results

3.1 The decomposite test statistic $T^{2}_{N}$

For the problem (1.1), the tests proposed in the literature are essentially constructed by ignoring, or using only part of, the information in the sample covariance matrix. The approach we adopt is to reveal the information in the eigenvalues with the help of random matrix theory. Orthogonally equivariant estimators of the covariance matrix generally enjoy some optimal properties, and the optimal one within this class is the most desirable. Ledoit and Wolf [16] claimed that $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ is an asymptotically optimal estimator of ${\mbox{\boldmath$\Sigma$}}^{-1}$ under Stein loss. Hence, for the hypothesis testing problem (1.1) we may consider the test statistic

\displaystyle T^{2}_{N}=n{\overline{\bf{X}}}^{\top}\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}{\overline{\bf{X}}}.

(3.1)

We may also note that $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ can be replaced by $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{S}$, the inverse of the matrix defined in (2.1). Let ${\bf S}={\bf U}{\mbox{\boldmath$\Lambda$}}{\bf U}^{\top}=(s_{ij})$. If we take $\widehat{\mbox{\boldmath$\Phi$}}^{*}(\mbox{\boldmath$\Lambda$})$ in (2.8) as (a) $\widehat{\mbox{\boldmath$\Phi$}}_{0}(\mbox{\boldmath$\Lambda$})={\bf I}$, (b) $\widehat{\mbox{\boldmath$\Phi$}}_{1}(\mbox{\boldmath$\Lambda$})=\mbox{diag}(s_{11},\ldots,s_{pp})$ with ${\bf U}={\bf I}$, (c) $\widehat{\mbox{\boldmath$\Phi$}}_{2}(\mbox{\boldmath$\Lambda$})=\mbox{diag}(\lambda_{1,p},\ldots,\lambda_{p,p})$ and (d) $\widehat{\mbox{\boldmath$\Phi$}}_{3}(\mbox{\boldmath$\Lambda$})=\mbox{diag}(\lambda_{1,p}+\lambda,\ldots,\lambda_{p,p}+\lambda)$, $\lambda>0$, then (3.1) reduces to the statistics of (a) Bai and Saranadasa [2], (b) Srivastava and Du [23], (c) the Hotelling’s $T^{2}$-test (1.3) and (d) the regularized Hotelling’s test of Chen et al. [5], respectively. First, we may note that, based on the results of Theorem 5 of Ledoit and Péché [12], ${\bf u}^{\top}_{i}{\mbox{\boldmath$\Sigma$}}^{-1}{\bf u}_{i}=a^{*}_{i}$ are approximated by the quantities $a^{or}_{i},i=1,\ldots,p$. Ledoit and Wolf [14] proved that $\|\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}-{\bf P}^{or}\|/\sqrt{p}\to 0$ a.s. under the rescaled Frobenius norm. Let ${\mbox{\boldmath$\Sigma$}}={\bf V}{\mbox{\boldmath$\Gamma$}}{\bf V}^{\top}$ and ${\mbox{\boldmath$\Sigma$}}_{1}={\bf V}{\mbox{\boldmath$\Gamma$}}^{*}{\bf V}^{\top}$, where ${\mbox{\boldmath$\Gamma$}}=\mbox{diag}(\gamma_{1,p},\ldots,\gamma_{p,p})$ and ${\mbox{\boldmath$\Gamma$}}^{*}=\mbox{diag}(\gamma^{*}_{1,p},\ldots,\gamma^{*}_{p,p})$ with $\gamma^{*}_{i,p}=1/a^{or}_{i},~{}i=1,\ldots,p$. Since ${\bf A}=(n-1){\bf S}$ is Wishart distributed, when $p$ is fixed we may note that ${\bf U}$ is the maximum likelihood estimator (MLE) of ${\bf V}$ (Muirhead [19]).
By the general theory of estimation, the maximum likelihood estimator is consistent: it tends to the true value with probability one as the sample size becomes large, under some regularity conditions which are satisfied by the Wishart density. When the dimension $p$ is fixed, we may therefore conclude that ${\bf U}$ converges to ${\bf V}$ with probability one. Note that when $p$ is fixed and the sample size $n$ is large, $\widehat{\mbox{\boldmath$\Sigma$}}_{LW}$ reduces to the sample covariance matrix ${\bf S}$. Then $\widehat{\mbox{\boldmath$\Sigma$}}_{LW}$ converges to $\Sigma$ with probability one. Namely, when the dimension $p$ is fixed while the sample size $n$ is large, the decomposite $T^{2}_{N}$-test statistic reduces to Hotelling’s $T^{2}$-test statistic. Nevertheless, this optimal property remains wide open in the large $p$ situation. To overcome the difficulties, in this paper we also restrict the estimator of the covariance matrix to the class of orthogonally equivariant estimators by imposing the sample eigenvector ${\bf u}_{i}$ on the corresponding population eigenvector ${\bf v}_{i}$.
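The way different choices of the diagonal matrix lead to different quadratic forms can be sketched as follows. This is a minimal illustration, not the paper’s implementation: the function names, the ridge value `0.5` and the simulated data are assumptions, and case (b), which sets ${\bf U}={\bf I}$, is not covered because it does not use the sample eigenvectors.

```python
import numpy as np

def equivariant_t2(X, shrink):
    """n * xbar' U diag(shrink(lam))^{-1} U' xbar, where S = U diag(lam) U'."""
    n = X.shape[0]
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, ddof=1)
    lam, U = np.linalg.eigh(S)      # ascending eigenvalues, orthonormal columns
    y = U.T @ xbar                  # coordinates of xbar in the sample eigenbasis
    return n * np.sum(y**2 / shrink(lam))

def identity_shrink(lam):           # case (a): Phi = I
    return np.ones_like(lam)

def hotelling_shrink(lam):          # case (c): Phi = Lambda
    return lam

def ridge_shrink(lam, ridge=0.5):   # case (d): Phi = Lambda + ridge * I (illustrative value)
    return lam + ridge

rng = np.random.default_rng(1)      # seed chosen only for reproducibility
X = rng.standard_normal((40, 6))
stats = {f.__name__: equivariant_t2(X, f)
         for f in (identity_shrink, hotelling_shrink, ridge_shrink)}
```

Passing a nonlinear shrinkage function for `shrink`, such as an estimate of $\widehat{\phi}^{*}_{i,p}$ in (2.8), would give the quadratic form in (3.1); that estimator is not implemented here.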

3.2 The asymptotically equivalent statistic of $T^{2}_{N}$

The decomposite $T^{2}_{N}$-test statistic in (3.1) involves nonlinear functions of the sample eigenvalues, which complicates the task of deriving its distribution function. By virtue of random matrix theory, Pan and Zhou [20] derived the limiting distribution function of Hotelling’s $T^{2}$-test statistic when ${\mbox{\boldmath$\Sigma$}}={\bf I}$. Meanwhile, Chen et al. [5] used the Stieltjes transform to derive the asymptotic distribution, under $\mbox{H}_{0}$, of the regularized Hotelling’s $T^{2}$-test statistic, which involves a linear function of the sample eigenvalues. Li et al. [17] extended the result for the one-sample regularized Hotelling’s $T^{2}$-test of Chen et al. [5] to the two-sample problem under a class of local alternatives.
Both the asymptotic power function for the one-sample regularized Hotelling’s $T^{2}$-test of Chen et al. [5] (Theorem 1 and Proposition 2) and that for the two-sample regularized Hotelling’s $T^{2}$-test of Li et al. [17] are functions of the Stieltjes transform $\check{m}_{F}(x)$, defined in (2.4), and its derivative.
Note that $\check{m}_{F}(x)$, for $x\in\mathcal{R}-\{0\}$, consists of the real part $\mbox{Re}[\check{m}_{F}(x)]=\int_{-\infty}^{\infty}\dfrac{1}{t-x}\,dF(t)$ (Hilbert transform) and the imaginary part $\mbox{Im}[\check{m}_{F}(x)]=\pi f(x)$, where $f(x)=dF(x)/dx$. For example, when $\mbox{\boldmath$\Sigma$}={\bf I}$ the empirical density function of the eigenvalues converges weakly in probability to the Marčenko-Pastur law $f(x)=\dfrac{1}{2\pi cx}\sqrt{(\xi_{+}-x)(x-\xi_{-})}\;,~{}x\in(\xi_{-},\;\xi_{+})$, where $\xi_{-}=(1-\sqrt{c})^{2}$ and $\xi_{+}=(1+\sqrt{c})^{2}$. Then, with $-\lambda=x$ and $\gamma=c$ in equation (13) of Chen et al. [5], the Stieltjes transform of $F(x)$ becomes

\displaystyle\begin{aligned}\check{m}_{F}(x)&=\dfrac{(1-c-x)+\sqrt{(1-c-x)^{2}-4cx}}{2cx}\\
&=\dfrac{(1-c-x)+i\sqrt{-[(x-1-c)^{2}-4c]}}{2cx}\\
&=\dfrac{(1-c-x)+i\sqrt{(\xi_{+}-x)(x-\xi_{-})}}{2cx},~{}x\in(\xi_{-},\;\xi_{+}).\end{aligned}

(3.2)

On the other hand, if we know the Stieltjes transform, we can deduce from it the limiting spectral density function $f(x)=\dfrac{1}{\pi}\mbox{Im}[\check{m}_{F}(x)]$.
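The relations between the Marčenko-Pastur density and the boundary value of its Stieltjes transform for $\mbox{\boldmath$\Sigma$}={\bf I}$ can be checked numerically. The sketch below assumes $c\in(0,1)$ and a point $x$ strictly inside $(\xi_{-},\xi_{+})$; the function names and the values $c=0.5$, $x=1$ are illustrative only.

```python
import numpy as np

def mp_edges(c):
    """Support edges xi_- and xi_+ of the Marchenko-Pastur law."""
    return (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

def mp_density(x, c):
    """Marchenko-Pastur density f(x) for x inside (xi_-, xi_+)."""
    lo, hi = mp_edges(c)
    return np.sqrt((hi - x) * (x - lo)) / (2 * np.pi * c * x)

def mp_stieltjes_boundary(x, c):
    """Boundary value of the Stieltjes transform for x inside (xi_-, xi_+)."""
    lo, hi = mp_edges(c)
    return ((1 - c - x) + 1j * np.sqrt((hi - x) * (x - lo))) / (2 * c * x)

c, x = 0.5, 1.0
m = mp_stieltjes_boundary(x, c)
density_from_m = m.imag / np.pi                 # f(x) recovered from Im[m]
fixed_point = 1 / (1 - c - c * x * m - x)       # right side of (2.5) with H a point mass at 1
```

Here `density_from_m` agrees with `mp_density(x, c)`, and `fixed_point` agrees with `m`, which is the Marčenko-Pastur equation (2.5) evaluated with all population eigenvalues equal to one.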
Since $\check{m}_{F}(x)$ is complex-valued on the support of $F$, the two asymptotic power functions of the regularized Hotelling’s $T^{2}$-tests, for the one-sample problem by Chen et al. [5] and for the two-sample problem by Li et al. [17], are complex-valued functions, which seems to be against statistical common sense for real test statistics.
Under assumptions A1-A4, Marčenko and Pastur [18] proved that $F_{n}$ converges to $F$ a.s. It is well known that $m_{F_{n}}(z)$ converges to $m_{F}(z)$ a.s., $z\in\mathcal{C}^{+}$. Both Chen et al. [5] and Li et al. [17] concluded that this convergence holds even when $x\in\mathcal{R}-\left\{0\right\}$ and directly used $m_{F_{n}}(x)$ as a consistent estimator of $\check{m}_{F}(x)$ to prove Proposition 1 of Chen et al. [5]. However, when $x\in\mathcal{R}-\left\{0\right\}$ we may note that

\displaystyle\begin{aligned}m_{F_{n}}(x)&=\dfrac{1}{p}\sum_{i=1}^{p}\dfrac{1}{\lambda_{i,p}-x}=\int_{-\infty}^{\infty}\dfrac{1}{t-x}\,dF_{n}(t)\\
&\to\int_{-\infty}^{\infty}\dfrac{1}{t-x}\,dF(t)=\mbox{Re}[\check{m}_{F}(x)].\end{aligned}

(3.3)

Hence $m_{F_{n}}(x)$ does not converge to $\check{m}_{F}(x)$ when $x\in\mathcal{R}-\left\{0\right\}$. Thus, Proposition 1 of Chen et al. [5] is not correct and needs to be re-investigated.
To overcome the difficulties mentioned above, we instead seek a statistic that is asymptotically equivalent in distribution to $T^{2}_{N}$, called $T^{2}_{0}$ and defined in (3.6), whose asymptotic local distribution and asymptotic local power function can be obtained.
We assume that the data come from the multinormal distribution; thus the sample mean vector $\overline{\bf{X}}$ is independent of the sample covariance matrix ${\bf S}$, and hence $\overline{\bf{X}}$ is independent of ${\bf U}$ and $\Lambda$.
Under assumptions A1-A4, Ledoit and Wolf [14] showed that $\widehat{\phi}^{*}_{i,p}(\mbox{\boldmath$\Lambda$})$ converges to $1/a^{or}_{i}=\gamma^{*}_{i,p}$ a.s. as $n\to\infty$, $i=1,\ldots,p$. They further proved ([14], Proposition 4.3) that $\|\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}-{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}\|_{F}\to 0$ a.s. as $p\to\infty$ (i.e., $\|\widehat{\mbox{\boldmath$\Phi$}}^{*-1}(\mbox{\boldmath$\Lambda$})-\mbox{\boldmath$\Gamma$}^{*-1}\|_{F}\to 0$ a.s. as $p\to\infty$), where $\|\cdot\|_{F}$ is the rescaled Frobenius norm defined as $\|{\bf A}\|_{F}=\sqrt{\mbox{tr}\left({\bf A}{\bf A}^{\top}\right)/p}$, with $\mbox{tr}({\bf A})$ denoting the trace of the matrix ${\bf A}$.
Since $\widehat{\phi}^{*}_{i,p}(\mbox{\boldmath$\Lambda$})$ converges to $\gamma^{*}_{i,p}$ a.s. as $p\to\infty$, $i=1,\ldots,p$, we have $\widehat{\phi}^{*-1}_{i,p}(\mbox{\boldmath$\Lambda$})-\gamma^{*-1}_{i,p}=o_{p}(1)$, $i=1,\ldots,p$, where $o_{p}(1)=\frac{1}{p^{\gamma}}$, $\gamma\in(0,\infty)$ (here the exponent $\gamma$ is not to be confused with the eigenvalues $\gamma_{i,p}$). Thus without loss of generality, for simplicity we may write $\widehat{\phi}^{*-1}_{i,p}(\mbox{\boldmath$\Lambda$})-\gamma^{*-1}_{i,p}=\frac{1}{p^{\gamma}}$, $i=1,\ldots,p$. That is to say, $\widehat{\mbox{\boldmath$\Phi$}}^{*-1}(\mbox{\boldmath$\Lambda$})-{\mbox{\boldmath$\Gamma$}^{*}}^{-1}=\frac{1}{p^{\gamma}}{\bf I}$ as $p\to\infty$. Note that $n{\overline{\bf{X}}}^{\top}{\bf U}\left\{{\widehat{\mbox{\boldmath$\Phi$}^{*}}}^{-1}(\mbox{\boldmath$\Lambda$})-{\mbox{\boldmath$\Gamma$}^{*}}^{-1}\right\}{\bf U}^{\top}{\overline{\bf{X}}}=\frac{n}{p^{\gamma}}{\overline{\bf{X}}}^{\top}{\bf U}{\bf U}^{\top}{\overline{\bf{X}}}=\frac{n}{p^{\gamma}}{\overline{\bf{X}}}^{\top}{\overline{\bf{X}}}\to\frac{1}{p^{\gamma}}\mbox{tr}(\mbox{\boldmath$\Sigma$}^{0})$ as $p\to\infty$, where $\mbox{\boldmath$\Sigma$}^{0}=\mbox{\boldmath$\Sigma$}+n\mbox{\boldmath$\mu$}\mbox{\boldmath$\mu$}^{\top}$ is also a positive definite matrix under the local alternative (3.10). Decompose $\mbox{\boldmath$\Sigma$}^{0}$ as ${\bf V}^{0}\mbox{\boldmath$\Gamma$}^{0}{{\bf V}^{0}}^{\top}$, where $\mbox{\boldmath$\Gamma$}^{0}=\mbox{diag}(\gamma_{1,p}^{0},\ldots,\gamma_{p,p}^{0})$ with $0<\gamma^{0}_{1,p}<\ldots<\gamma^{0}_{p,p}<\infty$. Let $b_{1}=\lim_{p\to\infty}\frac{1}{p^{\gamma}}\mbox{tr}(\mbox{\boldmath$\Sigma$}^{0})$. We have the following three situations: (i) $b_{1}\to\infty$ when $\gamma\in(0,1)$, (ii) $b_{1}\to 0$ when $\gamma\in(1,\infty)$ and (iii) $b_{1}$ is a nonrandom but unknown constant when $\gamma=1$.
Note that for the case (i), we may have $n\overline{\bf{X}}^{\top}\widehat{\mbox{\boldmath$\Sigma$}}^{*-1}\overline{\bf{X}}\to\infty$, which is against statistical common sense. For the case (ii), $b_{1}\to 0$ when $\gamma\in(1,\infty)$, namely, it is the same as the fixed dimensional case. Hence without loss of generality, we only consider the case $\gamma=1$ in detail. Note that $0<p\gamma^{0}_{1,p}\leq\mbox{tr}(\mbox{\boldmath$\Sigma$}^{0})\leq p\gamma^{0}_{p,p}$, thus $0<\gamma^{0}_{1,p}\leq b_{1}\leq\gamma^{0}_{p,p}<\infty$. As such, $n{\overline{\bf{X}}}^{\top}{\bf U}\left\{{\widehat{\mbox{\boldmath$\Phi$}^{*}}}^{-1}(\mbox{\boldmath$\Lambda$})-{\mbox{\boldmath$\Gamma$}^{*}}^{-1}\right\}{\bf U}^{\top}{\overline{\bf{X}}}\to b_{1}$ a.s. as $p\to\infty$. Hence for the high dimensional case, we obtain that $T^{2}_{N}=n{\overline{\bf{X}}}^{\top}{\bf U}{\widehat{\mbox{\boldmath$\Phi$}^{*}}}^{-1}{\bf U}^{\top}{\overline{\bf{X}}}\to n{\overline{\bf{X}}}^{\top}{\bf U}{{\mbox{\boldmath$\Gamma$}^{*}}}^{-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}$ in probability as $p\to\infty$. This result implies that $T^{2}_{N}\to n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}$ in distribution as $p\to\infty$. Namely,

\displaystyle\dfrac{T^{2}_{N}-\mathcal{E}T^{2}_{N}}{\sqrt{{\mathcal{V}ar}T^{2}_{N}}}\stackrel{{\scriptstyle\mathscr{D}}}{{\Longrightarrow}}\dfrac{n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}-{\mathcal{E}}\left(n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}\right)}{\sqrt{{\mathcal{V}ar}\left(n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}\right)}}~{}~{}\mbox{as}~{}p\to\infty.

Furthermore, since $b_{1}$ is a nonrandom constant, it is easy to note that under normalization the two statistics $n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}+b_{1}$ and $n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}$ have the same asymptotic distribution function as $p\to\infty$. Thus we obtain that

\displaystyle\dfrac{T^{2}_{N}-\mathcal{E}T^{2}_{N}}{\sqrt{{\mathcal{V}ar}T^{2}_{N}}}\stackrel{{\scriptstyle\mathscr{D}}}{{\Longrightarrow}}\dfrac{n{\overline{\bf{X}}}^{\top}{\bf U}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}-\mathcal{E}\left(n{\overline{\bf{X}}}^{\top}{\bf U}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}\right)}{\sqrt{{\mathcal{V}ar}\left(n{\overline{\bf{X}}}^{\top}{\bf U}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}\right)}}~{}~{}\mbox{as}~{}p\to\infty.

Hence we may have the following conclusion:

\displaystyle T^{2}_{N}=n{\overline{\bf{X}}}^{\top}{\bf U}\widehat{\mbox{\boldmath$\Phi$}}^{*-1}(\mbox{\boldmath$\Lambda$}){\bf U}^{\top}{\overline{\bf{X}}}\to n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}

(3.4)

in distribution as $p\to\infty$. When $\mbox{\boldmath$\Sigma$}={\bf I}$, from (3.2) we may note that $\mbox{Re}[\check{m}_{F}(x)]=(1-c-x)/(2cx)$, and then from (2.7) we have $a^{or}_{i}=\lambda^{-1}_{i}\left(1-c-(1-c-\lambda_{i})\right)=1$, so that $1/a^{or}_{i}=\gamma^{*}_{i,p}=1,~{}i=1,\ldots,p$, i.e., $\mbox{\boldmath$\Gamma$}^{*}={\bf I}$. Thus, by equation (3.4) we obtain that $T^{2}_{N}\to n{\overline{\bf{X}}}^{\top}{\bf U}{\bf U}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\overline{\bf{X}}}\sim\chi_{p}^{2}(n{\mbox{\boldmath$\mu$}}^{\top}{\mbox{\boldmath$\mu$}})$ in distribution as $p\to\infty$, where $\chi_{p}^{2}(n{\mbox{\boldmath$\mu$}}^{\top}{\mbox{\boldmath$\mu$}})$ denotes the non-central chi-square distribution with $p$ degrees of freedom and non-centrality parameter $n{\mbox{\boldmath$\mu$}}^{\top}{\mbox{\boldmath$\mu$}}$. Therefore, we have the following theorem.

Theorem 1.

Under assumptions A1-A4, when $\mbox{\boldmath$\Sigma$}={\bf I}$ , then $T^{2}_{N}$ is asymptotically equivalent to $\chi_{p}^{2}(n{\mbox{\boldmath$\mu$}}^{\top}{\mbox{\boldmath$\mu$}})$ in distribution as $p\to\infty$ .
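The step behind Theorem 1, namely that the oracle quantities in (2.7) all equal one when $\mbox{\boldmath$\Sigma$}={\bf I}$, can be verified numerically. The sketch below assumes the real part of the Stieltjes transform is supplied by the caller; the function name `oracle_precision_eigs`, the value $c=0.5$ and the evaluation grid are illustrative assumptions.

```python
import numpy as np

def oracle_precision_eigs(lam, re_m, c):
    """a_i^or in (2.7), given Re[m_F(lam_i)] supplied by the caller."""
    lam = np.asarray(lam, dtype=float)
    return (1 - c - 2 * c * lam * re_m) / lam

c = 0.5
lo, hi = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
lam = np.linspace(lo, hi, 7)[1:-1]       # points strictly inside the support
re_m = (1 - c - lam) / (2 * c * lam)     # real part of (3.2) when Sigma = I
a_or = oracle_precision_eigs(lam, re_m, c)
```

Every entry of `a_or` equals one up to rounding, so $\mbox{\boldmath$\Gamma$}^{*}={\bf I}$ and the limiting quadratic form is $n{\overline{\bf{X}}}^{\top}{\overline{\bf{X}}}$.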

For the hypothesis testing problem (1.1), when $\mbox{\boldmath$\Sigma$}={\bf I}$, Theorem 1 indicates that the decomposite $T^{2}_{N}$-test may be asymptotically optimal when the dimension is large, while the Hotelling’s $T^{2}$-test is not.
However, the situation may be different for general $\Sigma$. If ${\bf U}$ were a consistent estimator of ${\bf V}$, then $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ would converge to ${\mbox{\boldmath$\Sigma$}}^{-1}_{1}$ with probability one. Even though we adopt Ledoit and Wolf’s optimal estimator (2.8) of the population precision matrix, ${\mbox{\boldmath$\Sigma$}}_{1}$ is not generally equal to $\Sigma$ unless ${\mbox{\boldmath$\Gamma$}}^{*}={\mbox{\boldmath$\Gamma$}}$ (i.e., ${1}/{{a^{or}_{i}}}=\gamma_{i,p},\forall i=1,\ldots,p$). Moreover, the orthogonal matrix $\bf U$ may not generally be a consistent estimator of $\bf V$ when the dimension $p$ is large (see Bai et al. [4] and references therein). Hence we work under the restricted model, namely, under the Wishart distribution setup when $p/n\to c\in(0,1)$.
Note that ${\bf U}\in\mathcal{Q}(p)$, the group of $p\times p$ orthogonal matrices, which is a compact group. Hence, there exists a subsequence $\{n_{1}\}$ such that ${\bf U}_{n_{1}}$ converges to ${\bf V}$ a.s. as $n\to\infty$. For the case $c\in(0,1)$, when $\mbox{\boldmath$\Sigma$}={\bf I}$ we obtain that $\mbox{\boldmath$\Gamma$}^{*}={\bf I}$ as $p\to\infty$, and hence $n{\overline{\bf{X}}}^{\top}{\bf U}\mbox{\boldmath$\Gamma$}^{*-1}{\bf U}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\bf U}{\bf U}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\bf V}{\bf V}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\bf V}\mbox{\boldmath$\Gamma$}^{*-1}{\bf V}^{\top}{\overline{\bf{X}}}$. Thus by equation (3.4), when $\mbox{\boldmath$\Sigma$}={\bf I}$ we have that $T^{2}_{N}\to n{\overline{\bf{X}}}^{\top}{\bf V}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf V}^{\top}{\overline{\bf{X}}}$ in distribution as $p\to\infty$.
In the general $\Sigma$ case, for simplicity we investigate the limiting distribution function of $T^{2}_{N}$ under the assumption that ${\bf U}$ converges to ${\bf V}_{1}$ a.s. in the weak topology as $n\to\infty$, ${\bf V}_{1}\in\mathcal{Q}(p)$. By equation (3.4) we then have

\displaystyle T^{2}_{N}\to n{\overline{\bf{X}}}^{\top}{\bf V}_{1}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf V}_{1}^{\top}{\overline{\bf{X}}}

(3.5)

in distribution as $p\to\infty$ .
Furthermore, since $\mathcal{Q}(p)$ is one orbit, by the theory of compact groups there exists some ${\bf G}\in\mathcal{Q}(p)$ with ${\bf G}^{\top}{\bf V}_{1}={\bf V}$. Therefore

\displaystyle T^{2}_{N}\to n{\overline{\bf{X}}}^{\top}{\bf G}{\bf V}{\mbox{\boldmath$\Gamma$}}^{*-1}{\bf V}^{\top}{\bf G}^{\top}{\overline{\bf{X}}}=n{\overline{\bf{X}}}^{\top}{\bf G}{\mbox{\boldmath$\Sigma$}}^{-1}_{1}{\bf G}^{\top}{\overline{\bf{X}}}=T^{2}_{0},~{}\mbox{say}

(3.6)

in distribution as $p\to\infty$. Note that ${\bf G}{\mbox{\boldmath$\Sigma$}}^{-1}_{1}{\bf G}^{\top}\neq\mbox{\boldmath$\Sigma$}^{-1}$ generally. This makes the high-dimensional situation different from the fixed dimensional one. Both Stein [26] and Ledoit and Wolf [14] directly considered the case ${\bf G}={\bf I}$, namely, the case in which $\bf U$ converges to $\bf V$ a.s. as $p\to\infty$.

Theorem 2.

Assume that ${\bf U}$ converges to ${\bf V}_{1}$ a.s. in the weak topology as $n\to\infty$, ${\bf V}_{1}\in\mathcal{Q}(p)$. Then under assumptions A1-A4, $T^{2}_{N}$ is asymptotically equivalent to $T^{2}_{0}$ (defined in (3.6)) in distribution as $p\to\infty$.

3.3 The asymptotic distribution of $T^{2}_{N}$

Let ${\bf Z}=\sqrt{n}{\mbox{\boldmath$\Sigma$}}^{-1/2}{\overline{\bf{X}}}$, then ${\bf Z}\sim N(\sqrt{n}{\mbox{\boldmath$\Sigma$}}^{-1/2}{\mbox{\boldmath$\mu$}},{\bf I})$. Let ${\mbox{\boldmath$\Sigma$}}_{2}=\mbox{\boldmath$\Sigma$}^{1/2}{\bf G}\mbox{\boldmath$\Sigma$}^{-1}_{1}{\bf G}^{\top}\mbox{\boldmath$\Sigma$}^{1/2}$, and decompose it as ${\bf V}_{2}^{\top}\mbox{\boldmath$\Psi$}(\mbox{\boldmath$\Gamma$}){\bf V}_{2}$, where ${\bf V}_{2}$ is an orthogonal matrix and $\mbox{\boldmath$\Psi$}(\mbox{\boldmath$\Gamma$})=\mbox{diag}(\psi_{1},\ldots,\psi_{p}),~{}0<\psi_{1}\leq\psi_{2}\leq\ldots\leq\psi_{p}<\infty$. Let ${\bf W}={\bf V}_{2}{\bf Z}=(w_{1},\ldots,w_{p})^{\top}$, then ${\bf W}\sim N(\mbox{\boldmath$\theta$},{\bf I})$, where $\mbox{\boldmath$\theta$}=\sqrt{n}{\bf V}_{2}{\mbox{\boldmath$\Sigma$}}^{-1/2}\mbox{\boldmath$\mu$}=(\theta_{1},\ldots,\theta_{p})^{\top}$. And

\displaystyle T^{2}_{0}={\bf Z}^{\top}{\mbox{\boldmath$\Sigma$}}_{2}{\bf Z}={% \bf W}^{\top}{\mbox{\boldmath$\Psi$}}(\mbox{\boldmath$\Gamma$}){\bf W}=\sum_{i% =1}^{p}{\psi}_{i}w_{i}^{2},

(3.7)

which is a weighted sum of independent non-central chi-square random variables. By Corollary 1.3.5 of Muirhead [19] and some straightforward algebra, we have

\displaystyle{\mathcal{E}}T^{2}_{0}=\sum_{i=1}^{p}\psi_{i}+\sum_{i=1}^{p}\psi_{i}{\theta}^{2}_{i}~{}

(3.8)

and

\displaystyle{\mathcal{V}ar}T^{2}_{0}=2\sum_{i=1}^{p}\psi_{i}^{2}+4\sum_{i=1}^{p}\psi^{2}_{i}{\theta}^{2}_{i}.

(3.9)
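The moment formulas (3.8) and (3.9) can be checked numerically. The following is a minimal sketch, assuming independent $w_{i}\sim N(\theta_{i},1)$ as in (3.7); the function names and the example values of `psi` and `theta` are illustrative choices, not taken from the paper.

```python
import numpy as np

def moments_weighted_noncentral_chisq(psi, theta):
    """Mean and variance of sum_i psi_i * w_i^2 with independent w_i ~ N(theta_i, 1)."""
    psi = np.asarray(psi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    mean = psi.sum() + (psi * theta**2).sum()                      # formula (3.8)
    var = 2.0 * (psi**2).sum() + 4.0 * (psi**2 * theta**2).sum()   # formula (3.9)
    return mean, var

def simulate_t0(psi, theta, n_rep=200_000, seed=0):
    """Monte Carlo draws of T_0^2 = sum_i psi_i * w_i^2."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_rep, len(psi))) + np.asarray(theta, dtype=float)
    return (w**2) @ np.asarray(psi, dtype=float)

# Illustrative (hypothetical) weights and non-centrality values.
psi = np.array([0.5, 1.0, 1.5, 2.0])
theta = np.array([0.3, -0.2, 0.0, 0.4])
mean, var = moments_weighted_noncentral_chisq(psi, theta)
draws = simulate_t0(psi, theta)
print(mean, draws.mean())  # theoretical vs simulated mean
print(var, draws.var())    # theoretical vs simulated variance
```

The simulated mean and variance should agree with the closed-form values up to Monte Carlo error.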

Generally, the power of any reasonable test tends to one as the sample size $n$ grows (Chen et al. [5], Theorem 1), so it is hard to compare tests when $n$ goes to infinity. We therefore use the local power to compare the tests. We extend the concept of local power from the fixed-dimensional situation to the large-dimensional one; in contrast to the fixed-dimensional case, the dimension $p$ is incorporated into the alternative. We study the asymptotic distribution of $T^{2}_{N}$ under the sequence of local alternatives

\displaystyle~{}H_{0}:{\mbox{\boldmath$\mu$}}={\bf 0}~{}\mbox{versus}~{}H_{1n}% :{\mbox{\boldmath$\mu$}}=n^{-1/2}p^{1/4}{\mbox{\boldmath$\delta$}},

(3.10)

where $\mbox{\boldmath$\delta$}$ is a fixed $p$ -dimensional vector such that ${\mbox{\boldmath$\delta$}}^{\top}{\mbox{\boldmath$\Sigma$}}^{-1}{\mbox{\boldmath$\delta$}}<\infty$ when $p$ is large. We remark that this local alternative is equivalent to the one with $\|{\mbox{\boldmath$\mu$}}\|=O(n^{-1/2}p^{1/4})$ considered in Feng et al. [10].
Let $\mbox{\boldmath$\mu$}=n^{-1/2}p^{1/4}\mbox{\boldmath$\delta$}$ ; then $\mbox{\boldmath$\theta$}=p^{1/4}\mbox{\boldmath$\beta$}$ , where $\mbox{\boldmath$\beta$}={\bf V}_{2}\mbox{\boldmath$\Sigma$}^{-1/2}{\mbox{\boldmath$\delta$}}$ . Note that $\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}\leq\psi_{p}\sum_{i=1}^{p}\beta^{2}_{i}=\psi_{p}{\mbox{\boldmath$\delta$}}^{\top}\mbox{\boldmath$\Sigma$}^{-1}{\mbox{\boldmath$\delta$}}<\infty$ and $\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}\leq\psi^{2}_{p}\sum_{i=1}^{p}\beta^{2}_{i}=\psi^{2}_{p}{\mbox{\boldmath$\delta$}}^{\top}\mbox{\boldmath$\Sigma$}^{-1}{\mbox{\boldmath$\delta$}}<\infty$ . Thus, $p^{-1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}\to 0$ and $p^{-1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}\to 0$ as $p\to\infty$ .
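To make the role of this scaling explicit, the following worked step substitutes $\theta_{i}=p^{1/4}\beta_{i}$ into (3.8) and (3.9). It uses only quantities defined above; the limit in the last display additionally assumes that $p^{-1}\sum_{i=1}^{p}\psi^{2}_{i}$ stays bounded away from zero. Under $H_{1n}$ ,

\displaystyle{\mathcal{E}}T^{2}_{0}=\sum_{i=1}^{p}\psi_{i}+p^{1/2}\sum_{i=1}^{p}\psi_{i}\beta^{2}_{i},\qquad{\mathcal{V}ar}T^{2}_{0}=2\sum_{i=1}^{p}\psi^{2}_{i}+4p^{1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i},

so that the ratio of the variance under $H_{1n}$ to the variance under $H_{0}$ is

\displaystyle 1+2\,\dfrac{p^{-1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}}{p^{-1}\sum_{i=1}^{p}\psi^{2}_{i}}\to 1\quad\mbox{as }p\to\infty.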

Theorem 3.

Under the assumptions of Theorem 2 and the sequence of local alternatives $H_{1n}$ defined in (3.10), the asymptotic power function of test statistic $T^{2}_{N}$ in (3.1) is

\displaystyle~{}\beta(\mbox{\boldmath$\mu$})\approx\Phi\left(-z_{{\alpha}}+(2d% )^{-1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}\right),

(3.11)

where $\Phi(\cdot)$ denotes the standard normal distribution function, and $d=\lim_{p\to\infty}\sum_{i=1}^{p}\psi^{2}_{i}/p$ is a positive constant.

Proof.

Under the null hypothesis, we may note that ${\mathcal{E}}T^{2}_{0}=\sum_{i=1}^{p}\psi_{i}$ and ${\mathcal{V}ar}T^{2}_{0}=2\sum_{i=1}^{p}\psi^{2}_{i}$ . Thus by Theorem 2 as $n\to\infty$ we have

\displaystyle\begin{aligned}
P\left\{\dfrac{T^{2}_{N}-{\mathcal{E}}T^{2}_{N}}{\sqrt{{\mathcal{V}ar}T^{2}_{N}}}\geq z_{\alpha}\,\middle\|\,H_{0}\right\}
&\approx P\left\{\dfrac{T^{2}_{0}-{\mathcal{E}}T^{2}_{0}}{\sqrt{{\mathcal{V}ar}T^{2}_{0}}}\geq z_{\alpha}\,\middle\|\,H_{0}\right\}\\
&=P\left\{\dfrac{T^{2}_{0}-\sum_{i=1}^{p}\psi_{i}}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}}}\geq z_{\alpha}\right\}\\
&\approx 1-\Phi(z_{\alpha})\\
&=\Phi(-z_{\alpha})\\
&=\alpha.
\end{aligned}

(3.12)

And hence, under the sequence of local alternatives $H_{1n}$ we then have

\displaystyle\begin{aligned}
P\left\{\dfrac{T^{2}_{N}-{\mathcal{E}}T^{2}_{N}}{\sqrt{{\mathcal{V}ar}T^{2}_{N}}}\geq z_{\alpha}\,\middle\|\,H_{1n}\right\}
&\approx P\left\{\dfrac{T^{2}_{0}-\sum_{i=1}^{p}\psi_{i}}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}}}\geq z_{\alpha}\,\middle\|\,H_{1n}\right\}\\
&=P\left\{\dfrac{T^{2}_{0}-\left(\sum_{i=1}^{p}\psi_{i}+p^{1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}\right)}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}}}\geq z_{\alpha}-\dfrac{p^{1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}}}\,\middle\|\,H_{1n}\right\}\\
&\approx\Phi\left(-z_{\alpha}\sqrt{\left(1+2\,\dfrac{p^{1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}}{\sum_{i=1}^{p}\psi^{2}_{i}}\right)^{-1}}+\dfrac{p^{1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}+4p^{1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}}}\right)\\
&\to\Phi\left(-z_{\alpha}+\dfrac{p^{1/2}\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}}{\sqrt{2\sum_{i=1}^{p}\psi^{2}_{i}+4p^{1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}}}\right)\\
&=\Phi\left(-z_{\alpha}+\dfrac{\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}}{\sqrt{2p^{-1}\sum_{i=1}^{p}\psi^{2}_{i}+4p^{-1/2}\sum_{i=1}^{p}\psi^{2}_{i}\beta^{2}_{i}}}\right)\\
&\to\Phi\left(-z_{\alpha}+\dfrac{\sum_{i=1}^{p}\psi_{i}\,\beta^{2}_{i}}{\sqrt{2d}}\right).
\end{aligned}

(3.13)

∎
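The power formula (3.11) is straightforward to evaluate once $\psi_{i}$ and $\beta_{i}$ are given. The sketch below is illustrative only: it assumes that the finite-$p$ average $p^{-1}\sum_{i}\psi^{2}_{i}$ may stand in for the limit $d$ , and the function name `asymptotic_local_power` is hypothetical.

```python
from statistics import NormalDist

import numpy as np

def asymptotic_local_power(psi, beta, alpha=0.05):
    """Evaluate Phi(-z_alpha + (2d)^(-1/2) * sum_i psi_i * beta_i^2) as in (3.11).

    Assumption: d is replaced by its finite-p proxy sum_i psi_i^2 / p.
    """
    psi = np.asarray(psi, dtype=float)
    beta = np.asarray(beta, dtype=float)
    d_p = (psi**2).sum() / psi.size                      # finite-p proxy for d
    shift = (psi * beta**2).sum() / np.sqrt(2.0 * d_p)   # non-centrality term
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)          # upper alpha quantile
    return NormalDist().cdf(float(-z_alpha + shift))
```

With $\mbox{\boldmath$\beta$}={\bf 0}$ the function returns the level $\alpha$ , in agreement with (3.12), and the power increases with $\sum_{i}\psi_{i}\beta^{2}_{i}$ .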

In general, applying Theorem 3 requires consistent estimators of $\psi_{i},~{}i=1,\ldots,p$ . When $\mbox{\boldmath$\Sigma$}={\bf I}$ , we have $\psi_{i}=a^{or}_{i}$ , which can be consistently estimated by $\widehat{\phi^{*}_{i,p}}^{-1},i=1,\ldots,p$ . Hence we have the following.

Corollary 1.

Under the assumptions of Theorem 2 and under $\mbox{H}_{0}$ , when $\mbox{\boldmath$\Sigma$}={\bf I}$ , we have that $\dfrac{T^{2}_{N}-\sum_{i=1}^{p}\widehat{\psi}_{i}}{\sqrt{2\sum_{i=1}^{p}\widehat{\psi_{i}}^{2}}}=\dfrac{T^{2}_{N}-\sum_{i=1}^{p}\widehat{\phi^{*}_{i,p}}^{-1}}{\sqrt{2\sum_{i=1}^{p}\widehat{\phi^{*}_{i,p}}^{-2}}}\longrightarrow\mbox{N}(0,1).$

Thus, when $\mbox{\boldmath$\Sigma$}={\bf I}$ the quantity $\left(T^{2}_{N}-\sum_{i=1}^{p}\widehat{\psi}_{i}\right)\bigg{/}\sqrt{2\sum_{i=% 1}^{p}\widehat{\psi_{i}}^{2}}$ is completely data-driven.
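As an illustration of how the normalized quantity in Corollary 1 might be used, the sketch below standardizes a given value of $T^{2}_{N}$ and converts it to an upper-tail normal p-value. The array `psi_hat` is a hypothetical input standing for the estimates $\widehat{\psi}_{i}$ ; computing those estimates (for example through the QuEST function) is not implemented here.

```python
from statistics import NormalDist

import numpy as np

def normalized_statistic(t2_n, psi_hat):
    """(T_N^2 - sum_i psi_hat_i) / sqrt(2 * sum_i psi_hat_i^2), as in Corollary 1."""
    psi_hat = np.asarray(psi_hat, dtype=float)
    return float((t2_n - psi_hat.sum()) / np.sqrt(2.0 * (psi_hat**2).sum()))

def upper_tail_p_value(z):
    """Upper-tail p-value under the N(0, 1) approximation."""
    return 1.0 - NormalDist().cdf(float(z))

# Hypothetical example: p = 50 with all psi_hat_i equal to one.
z = normalized_statistic(70.0, np.ones(50))
p_value = upper_tail_p_value(z)
```

In this hypothetical example the standardized value is $(70-50)/\sqrt{100}=2$ , so the test would reject at level $\alpha=0.05$ .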
Let ${\bf D}={\mbox{\boldmath$\Gamma$}}{\mbox{\boldmath$\Gamma$}}^{*-1}$ and write ${\bf D}=\mbox{diag}(d_{1},\ldots,d_{p})$ . The weight $d_{i}$ is the ratio of the $i$ th eigenvalues of the two covariance matrices $\Sigma$ and ${\mbox{\boldmath$\Sigma$}}_{1}$ , i.e., $d_{i}=\gamma_{i,p}/\gamma^{*}_{i,p}=\gamma_{i,p}a^{or}_{i},~{}i=1,\dots,p$ . Since $0<a^{or}_{i}<\infty$ , we have $0<d_{i}<\infty$ for all $i=1,\ldots,p$ .
When ${\bf G}={\bf I}$ (i.e., ${\bf V}_{1}={\bf V}$ ), we have $\mbox{\boldmath$\Sigma$}_{2}={\bf V}{\bf D}{\bf V}^{\top}$ , $\mbox{\boldmath$\Psi$}(\mbox{\boldmath$\Gamma$})={\bf D}$ and ${\bf V}_{2}={\bf V}$ . Hence we have the following.

Corollary 2.

For the hypothesis testing problem (1.1), under the assumptions of Theorem 2, if ${\bf G}={\bf I}$ (i.e., ${\bf U}$ converges to ${\bf V}$ a.s. as $n\to\infty$ ), then the asymptotic local power of $T^{2}_{N}$ is $\beta(\mbox{\boldmath$\mu$})\approx\Phi\left(-z_{{\alpha}}+(2d)^{-1/2}\sum_{i=1}^{p}d_{i}\,\beta^{2}_{i}\right)=\Phi\left(-z_{{\alpha}}+(2d)^{-1/2}{\mbox{\boldmath$\delta$}}^{\top}{\mbox{\boldmath$\Sigma$}^{-1}_{1}}{\mbox{\boldmath$\delta$}}\right).$

When $d_{i}=1$ for all $i=1,\ldots,p$ (i.e., $\mbox{\boldmath$\Sigma$}_{1}=\mbox{\boldmath$\Sigma$}$ ), the statistic $T^{2}_{N}$ asymptotically follows a non-central chi-square distribution.

Corollary 3.

For the hypothesis testing problem (1.1), under the assumptions of Theorem 2, if ${\bf D}={\bf I}$ , then the proposed $T^{2}_{N}$ -test is asymptotically optimal.

Remark 1 On the one hand, if $d=\infty$ , then the asymptotic power of $T^{2}_{0}$ equals the significance level $\alpha$ . On the other hand, when ${\mbox{\boldmath$\Gamma$}}={\mbox{\boldmath$\Gamma$}}^{*}$ the diagonal matrix ${\bf D}$ equals ${\bf I}$ , i.e., ${\mbox{\boldmath$\Sigma$}}_{1}={\mbox{\boldmath$\Sigma$}}$ . If ${\bf D}={\bf I}$ (i.e., $d=1$ ), the proposed test $T^{2}_{N}$ has the asymptotically optimal power property. Moreover, the asymptotic distribution of the optimal test statistic is non-central $\chi^{2}$ . As a result, the key to obtaining an asymptotically optimal Hotelling-type test is to use a consistent estimator of $\Sigma$ . Recall that we assume $n>p$ , so that $\bf S$ is (up to the factor $(n-1)/n$ ) the MLE of $\Sigma$ . Hence ${\bf S}\to\mbox{\boldmath$\Sigma$}$ a.s. when $p$ is fixed and $n\to\infty$ (i.e., $c=0$ ), and Hotelling’s $T^{2}$ -test statistic in (1.3) converges to $n\overline{\bf X}^{\top}\mbox{\boldmath$\Sigma$}^{-1}\overline{\bf X}$ in probability as $n\to\infty$ . However, this may fail when $c\in(0,1)$ because the sample eigenvalues are inconsistent. By Theorem 3 and Remark 1, we have the following.

Corollary 4.

For the hypothesis testing problem (1.1), the Hotelling’s $T^{2}$ -test is asymptotically optimal when $c=0$ . However, it is not asymptotically optimal when $c\in(0,1)$ .

Remark 2 In general $\mbox{\boldmath$\Sigma$}_{1}\neq{\mbox{\boldmath$\Sigma$}}$ , so by Corollary 3 the decomposite test statistic $T^{2}_{N}$ , which is based on the optimal orthogonally equivariant estimator $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ of the precision matrix ${\mbox{\boldmath$\Sigma$}}^{-1}$ , will not be asymptotically optimal for the hypothesis testing problem (1.1). Likewise, the regularized Hotelling’s $T^{2}$ type tests are not asymptotically optimal, owing to the inconsistency of the sample eigenvalues. Hence, to obtain an asymptotically optimal test for (1.1) without structural assumptions on the covariance matrix, further work is needed on the estimation of the eigenvalues and eigenvectors of the population covariance matrix $\Sigma$ . Finding a consistent estimator of $\Sigma$ when $c\in(0,1)$ is a hard problem that remains wide open in the literature.

Remark 3 For fixed dimension, no restriction on the unknown nuisance parameter $\Sigma$ is usually needed to establish the asymptotic normality of the test statistics. For large dimension $p$ , however, the asymptotic normality of the test statistics holds either under some restrictions on $\Sigma$ or in the case ${\bf D}={\bf I}$ , in which the proposed test is optimal. As a result, the numerical powers of tests in the large-dimensional setting are not directly comparable, because the power functions can only be evaluated over the restricted parameter spaces of $\Sigma$ on which the asymptotic normality of the test statistics holds.
Such restricted spaces, for example $\{\mbox{\boldmath$\Sigma$}|\mbox{\boldmath$\Sigma$}>{\bf 0},\mbox{tr}({\mbox{\boldmath$\Sigma$}}^{4})/\mbox{tr}^{2}({\mbox{\boldmath$\Sigma$}}^{2})=o(1)\}$ and $\{\mbox{\boldmath$\Sigma$}|\mbox{\boldmath$\Sigma$}>{\bf 0},\mbox{tr}({\mbox{\boldmath$\Gamma$}}^{4}_{K})=o(\mbox{tr}^{2}({\mbox{\boldmath$\Gamma$}}^{2}_{K}))\}$ (Feng et al. [10]), are generally hard to characterize analytically. Each test may have a different restricted parameter space ensuring the asymptotic normality of its statistic, and there is no clear way to compare the power functions of those tests outside the restricted spaces. To overcome this difficulty, we provide a testing procedure under a local alternative in which the dimensionality $p$ is also taken into consideration; this generalizes the fixed-dimensional setting to the large-dimensional one. Our proposed test statistic $T^{2}_{N}$ does not encounter the difficulty mentioned above: as discussed in Corollary 3, an optimally convergent estimator of $\Sigma$ leads to an optimal test. Thus, to compare tests for the hypothesis testing problem (1.1), it is essential to compare the estimators of $\Sigma$ . Random matrix theory plays an important role in obtaining reasonable estimators of the population covariance matrix. We explain this point more clearly through comparisons with the existing tests in Section 4.

4 The comparison of tests

4.1 The asymptotic relative efficiency

A standard method for comparing asymptotic power functions is the asymptotic relative efficiency (ARE) (Pitman [22]), which is essentially defined via large deviation asymptotics. It is well known that the Sanov theorem and its generalizations reduce the problem of large deviations to the minimization of the Kullback-Leibler divergence over the corresponding set of distributions. Consider any two test statistics that are asymptotically normal or $\chi^{2}$ distributed, with noncentralities $\mu^{{}^{\prime}}{\bf A}{\mu}$ and $\mu^{{}^{\prime}}{\bf B}{\mu}$ , respectively. Then the ARE of these two tests is $\mu^{{}^{\prime}}{\bf A}{\mu}/\mu^{{}^{\prime}}{\bf B}{\mu}$ . Whenever the ARE of test $T_{a}$ relative to test $T_{b}$ is larger than one, the procedure based on $T_{a}$ is considered to have larger asymptotic power than the competing test based on $T_{b}$ . The test $T_{a}$ has better asymptotic power than test $T_{b}$ if the eigenmatrix of ${\bf A}{\bf B}^{-1}$ is larger than ${\bf I}$ (that is, if all of its eigenvalues are at least one). Following the arguments in Case 1 below, we can easily see that the tests proposed by Dempster [8], [9], Bai and Saranadasa [2], Srivastava and Du [23], Srivastava [24], Chen and Qin [6], Chen et al. [5], Park and Ayyala [21] and Feng et al. [10] are not optimal for the hypothesis testing problem (1.1) when the dimension is large. These results can be classified into the following three categories.

Case 1. Compare the ARE of tests constructed without using the information of correlations. Let ${\bf B}^{-1}{\bf A}=\left[\mbox{tr}(\mbox{\boldmath$\Sigma$}^{2})\right]^{1/2}\textbf{I}\mbox{\boldmath$\Sigma$}^{-1}=\sqrt{\mbox{tr}(\mbox{\boldmath$\Sigma$}^{2})}\mbox{\boldmath$\Sigma$}^{-1}.$ The eigenmatrix of ${\bf B}^{-1}{\bf A}$ is then larger than ${\bf I}$ , and we may conclude that the tests proposed by Dempster [8], [9] and Bai and Saranadasa [2] are not optimal.
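The Case 1 claim can be checked numerically: every eigenvalue of $\sqrt{\mbox{tr}(\mbox{\boldmath$\Sigma$}^{2})}\mbox{\boldmath$\Sigma$}^{-1}$ is at least one, because $\sqrt{\mbox{tr}(\mbox{\boldmath$\Sigma$}^{2})}$ is never smaller than the largest eigenvalue of $\Sigma$ . The sketch below assumes the reading ${\bf A}=\mbox{\boldmath$\Sigma$}^{-1}$ and ${\bf B}=[\mbox{tr}(\mbox{\boldmath$\Sigma$}^{2})]^{-1/2}{\bf I}$ , inferred from the displayed product; the function names and the random test matrix are illustrative.

```python
import numpy as np

def case1_matrix(sigma):
    """Return sqrt(tr(Sigma^2)) * Sigma^{-1}, the matrix B^{-1} A of Case 1."""
    scale = np.sqrt(np.trace(sigma @ sigma))
    return scale * np.linalg.inv(sigma)

def are_ratio(mu, a, b):
    """Ratio of noncentralities mu' A mu / mu' B mu."""
    return float(mu @ a @ mu) / float(mu @ b @ mu)

# Illustrative positive definite covariance matrix (not from the paper).
rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
sigma = m @ m.T + 5.0 * np.eye(5)

ba = case1_matrix(sigma)
eigvals = np.linalg.eigvalsh((ba + ba.T) / 2.0)  # symmetrize against rounding

mu = rng.standard_normal(5)
a = np.linalg.inv(sigma)                             # assumed A
b = np.eye(5) / np.sqrt(np.trace(sigma @ sigma))     # assumed B
ratio = are_ratio(mu, a, b)
print(eigvals.min(), ratio)
```

Under this reading, both the smallest eigenvalue and the ratio of noncentralities are at least one for any positive definite $\Sigma$ and any non-zero $\mu$ .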
By similar arguments, taking ${\bf B}=[\mbox{tr}({\bf R}^{2})]^{-1/2}{\bf D}^{-1}_{0}$ , where ${\mbox{\boldmath$\Sigma$}}={\bf D}^{\frac{1}{2}}_{0}{\bf R}{\bf D}^{\frac{1}{2}}_{0}$ with ${\bf D}_{0}=\mbox{diag}(\sigma_{11},\ldots,\sigma_{pp})$ , we may also conclude that tests using the information in the diagonal elements of ${\bf S}$ , such as those of Srivastava and Du [23], Srivastava [24], Chen and Qin [6], and Park and Ayyala [21], are not optimal either.

Case 2. Compare the tests constructed by using some correlations in the estimation of the covariance matrix. Feng et al. [10] followed Bai and Saranadasa’s model assumptions and improved the works of Chen and Qin [6] and Park and Ayyala [21] by taking correlations into consideration. They divided the $p$ variables into several small groups, each with an invertible covariance matrix, and then summed the corresponding Hotelling $T^{2}$ -test statistics; the result is called the composite $T^{2}$ test. The asymptotic power function of the composite $T^{2}$ test is of the form

\displaystyle~{}\beta_{CT}(\mbox{\boldmath$\mu$})\approx\Phi\left(-z_{\alpha}+% \dfrac{n\mbox{\boldmath$\mu$}^{\top}\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}% ^{K}}\mbox{\boldmath$\mu$}}{\sqrt{2\mbox{tr}({\bf\Gamma}^{2}_{K})}}\right),

(4.1)

where ${\bf\Gamma}_{K}=\mbox{\boldmath$\Sigma$}^{1/2}\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}\mbox{\boldmath$\Sigma$}^{1/2}$ and $\mathcal{O}^{K}=\{A^{0}_{1},\ldots,A^{0}_{N}\}$ ; for details see Feng et al. [10] (p.1423). To prevent the asymptotic power from always being one as $p\to\infty$ , some further conditions are needed. Under their assumption (C3), $\|{\mbox{\boldmath$\mu$}}\|^{2}=O(n^{-1}p^{1/2})$ , equation (4.1) reduces to $\beta_{CT}(\mbox{\boldmath$\mu$})\approx\Phi(-z_{\alpha}+\frac{p^{1/2}{\mbox{\boldmath$\delta$}^{\top}\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}\mbox{\boldmath$\delta$}}}{{\sqrt{2\mbox{tr}({\bf\Gamma}^{2}_{K})}}})$ . The asymptotic power function of the composite test then becomes $\beta_{CT}(\mbox{\boldmath$\mu$})=\Phi(-z_{\alpha}+(2d_{1})^{-1/2}{\mbox{\boldmath$\delta$}^{\top}\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}\mbox{\boldmath$\delta$}})$ if $\lim_{p\to\infty}\mbox{tr}({\mbox{\boldmath$\Gamma$}}^{2}_{K})/p=d_{1}$ holds. Note, however, that $\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}$ will not equal $\mbox{\boldmath$\Sigma$}^{-1}$ in general. Feng et al. [10] essentially made assumptions on the covariance matrix so that its estimator has a block diagonal form, so some information may be lost in general. Theorem 3 tells us that the composite $T^{2}$ test of Feng et al. [10] is not optimal unless ${\mbox{\boldmath$\Gamma$}}^{2}_{K}={\bf I}$ , i.e., $\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}=\mbox{\boldmath$\Sigma$}^{-1}$ , which does not happen in their setup. Again, as in Case 1, we conclude that there is still room to develop more robust and more powerful tests.

Case 3. Compare the tests constructed by adopting the ridge regression type covariance estimator. Chen et al. [5] imposed a regularization on the sample covariance matrix and proposed a regularized Hotelling’s $T^{2}$ statistic (RHT)

\displaystyle\mbox{RHT}(\lambda)=n\bar{\bf X}^{\top}\left({\bf S}+\lambda{\bf I}_{p}\right)^{-1}\bar{\bf X},

(4.2)

where $\lambda>0$ . Note that the RHT statistic can be written as $n\bar{\bf X}^{\top}({\bf S}+\lambda{\bf I}_{p})^{-1}\bar{\bf X}=n\bar{\bf X}^{\top}{\bf U}(\mbox{\boldmath$\Lambda$}+\lambda{\bf I})^{-1}{\bf U}^{\top}\bar{\bf X}$ , which has a form similar to that of the decomposite $T^{2}_{N}$ -test statistic. Here the shrinkage $\lambda_{i,p}+\lambda$ is linear in the sample eigenvalue, and $\lambda$ needs to be estimated. This is related to the Stein type shrinkage estimators, and the resulting estimators of the population eigenvalues may not be optimal. Ledoit and Wolf [13] studied the best linear estimator of the form $a\lambda_{i,p}+b,a,b>0,a+b=1$ . Ledoit and Wolf [14] further claimed that the nonlinear estimators $\hat{a}^{or}_{i}$ are better than the best linear estimators $a\lambda_{i,p}+b,\forall i=1,\ldots,p$ . There thus remains room to improve the estimators of the eigenvalues. Ledoit and Péché [12] used random matrix theory to claim that their nonlinear shrinkage estimator of the eigenvalues of the precision matrix ${\mbox{\boldmath$\Sigma$}}^{-1}$ is optimal. As noted above, the ARE is based on the Kullback-Leibler divergence, and the Stein loss function is proportional to the Kullback-Leibler divergence under the multivariate normal setup. As such, the optimal orthogonally equivariant estimator corresponds to the optimal power test. Within the class of orthogonally equivariant estimators, the decomposite $T^{2}_{N}$ test statistic extracts the optimal information on the eigenvalues of the precision matrix. Ledoit and Wolf [16] expected their estimator in (2.8) to be close to the precision matrix and, at the same time, its inverse to be close to the population covariance matrix. The decomposite $T^{2}_{N}$ -test thus differs from the tests mentioned above, and we may expect it to perform better than both the RHT proposed by Chen et al. [5] and the composite test proposed by Feng et al. [10].
Note that the sample eigenvalues are not independent. One of our main goals is to extract more information on the population eigenvalues with the help of delicate random matrix theory.
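The spectral form of the RHT statistic in (4.2) can be verified directly. The following sketch computes the statistic both from $({\bf S}+\lambda{\bf I}_{p})^{-1}$ and from the eigendecomposition of ${\bf S}$ ; the function names and the simulated data are illustrative, and ${\bf S}$ uses the divisor $n-1$ as in (1.2).

```python
import numpy as np

def rht_direct(x, lam):
    """n * xbar' (S + lam I)^{-1} xbar, computed with a linear solve."""
    n, p = x.shape
    xbar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)  # sample covariance with divisor n - 1
    return float(n * xbar @ np.linalg.solve(s + lam * np.eye(p), xbar))

def rht_spectral(x, lam):
    """Same statistic via S = U Lambda U': n * sum_i (u_i' xbar)^2 / (lambda_i + lam)."""
    n, _ = x.shape
    xbar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)
    eigvals, u = np.linalg.eigh(s)
    proj = u.T @ xbar            # coordinates of xbar in the sample eigenbasis
    return float(n * np.sum(proj**2 / (eigvals + lam)))

# Illustrative data: n = 30 observations of dimension p = 10.
rng = np.random.default_rng(0)
x = rng.standard_normal((30, 10))
print(rht_direct(x, 0.5), rht_spectral(x, 0.5))
```

In the spectral form, the weights $1/(\lambda_{i,p}+\lambda)$ are the only place where the regularization enters, which is where a nonlinear function of the sample eigenvalues could be substituted.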

4.2 Numerical power comparisons

By Corollary 2, the composite $T^{2}$ -test of Feng et al. [10] has an asymptotic local power function of a form similar to that of the proposed decomposite $T^{2}_{N}$ -test. Define the quantity $\dfrac{\mbox{\boldmath$\delta$}^{\top}\mbox{\boldmath$\Sigma$}^{-1}_{1}\mbox{\boldmath$\delta$}}{\sqrt{2d}}\bigg/\dfrac{\mbox{\boldmath$\delta$}^{\top}\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}\mbox{\boldmath$\delta$}}{\sqrt{2d_{1}}}$ as the ARE of the decomposite $T^{2}_{N}$ -test with respect to the composite $T^{2}$ -test. If the ARE is larger than 1, then the decomposite $T^{2}_{N}$ -test has greater power than the composite $T^{2}$ -test. We carry out simulation studies of power comparisons and AREs for the decomposite $T^{2}_{N}$ -test and the composite $T^{2}$ -test based on the intraclass correlation model, namely $\mbox{\boldmath$\Sigma$}=(\sigma_{ij})$ , where $\sigma_{ij}=\rho^{|i-j|},~{}i=1,\ldots,p,~{}j=1,\ldots,p$ ; $\rho\in(-1,1),\rho\neq 0$ . Without loss of generality, we take $c=1/3$ , the significance level $\alpha=0.05$ and $\mbox{\boldmath$\delta$}=(\delta_{1},\ldots,\delta_{p})$ with $\delta_{i}\in(-1,1),~{}i=1,\ldots,p$ . In Table 1 we take $K=2$ when $p=20$ and $K=4$ when $p=40$ .

Table 1: Power comparison and ARE.

	$p=20,K=2$			$p=40,K=4$
$\rho$	Decomposite $T^{2}_{N}$ -test	Composite $T^{2}$ -test	ARE	Decomposite $T^{2}_{N}$ -test	Composite $T^{2}$ -test	ARE
-0.2	$0.3695$	$0.3480$	$1.0366$	$0.9615$	$0.9418$	$1.0560$
0.2	$0.7054$	$0.5893$	$1.1438$	$0.9025$	$0.8730$	$1.0501$
-0.5	$0.9073$	$0.6952$	$1.3293$	$0.9811$	$0.6447$	$1.7318$
0.5	$0.8251$	$0.2859$	$2.0758$	$0.9968$	$0.8162$	$1.6397$
-0.8	$0.9150$	$0.1878$	$3.1023$	$1.0000$	$0.6865$	$5.7759$
0.8	$0.8848$	$0.0466$	$6.1533$	$1.0000$	$0.9759$	$5.4044$
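For readers who wish to set up a similar design, the sketch below builds the covariance matrix $\sigma_{ij}=\rho^{|i-j|}$ used in the simulation and evaluates the ARE ratio defined above from its ingredients. It is illustrative only: the function names are hypothetical, the quadratic forms and the constants $d$ and $d_{1}$ must be supplied by the user, and the construction of $\mbox{\boldmath$\Sigma$}^{-1}_{1}$ and $\mbox{\boldmath$\Sigma$}^{-1}_{\mathcal{O}^{K}}$ is not reproduced here.

```python
import numpy as np

def power_decay_covariance(p, rho):
    """Covariance matrix with entries sigma_ij = rho^|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def are_from_quadratic_forms(q_decomposite, d, q_composite, d1):
    """ARE = [q_decomposite / sqrt(2 d)] / [q_composite / sqrt(2 d1)].

    q_decomposite stands for delta' Sigma_1^{-1} delta and q_composite for
    delta' Sigma_{O^K}^{-1} delta; both are user-supplied inputs.
    """
    return (q_decomposite / np.sqrt(2.0 * d)) / (q_composite / np.sqrt(2.0 * d1))

# Example design point: p = 20 and rho = -0.5, with delta_i drawn from (-1, 1).
sigma = power_decay_covariance(20, -0.5)
rng = np.random.default_rng(0)
delta = rng.uniform(-1.0, 1.0, size=20)
q_sigma = float(delta @ np.linalg.solve(sigma, delta))  # delta' Sigma^{-1} delta
```

The quantity `q_sigma` is the quadratic form that would appear in the ideal case $\mbox{\boldmath$\Sigma$}_{1}=\mbox{\boldmath$\Sigma$}$ .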

5 Real data analysis

More than two decades have passed since the founding of the Taipei Rapid Transit Corporation (TRTC) in 1994. Entering the 2.0 era, the Metro system is complete and it is time for further expansion. A multi-point transfer model relieves congestion and spreads the load on the existing transfer stations, thereby providing the public with faster and better transportation service.
To test whether there has been significant growth in public transportation ridership, especially among commuters who mainly use the Taipei Metro system in recent years, we use data collected from 1 July, 2015 to 30 April, 2020, comprising the recorded exit ridership of 108 stations. Since the exact distribution of $T^{2}_{N}$ is unknown, we use the bootstrap method to carry out the one-sample test at significance level $\alpha=0.01$ .

5.1 The bootstrap procedure for calculating $T^{2}_{N}$ is as follows:

1.

Calculate the column mean vector $\overline{\bf{X}}$ and the sample covariance matrix ${\bf S}$ of the data set before resampling. Decompose ${\bf S}$ into its sample eigenvalues $\lambda_{i,p},i=1,\ldots,p$ and the corresponding eigenvectors $u_{i},i=1,\ldots,p$ .
2.

Calculate $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ of Ledoit and Wolf [14] by using their numerical implementation, the QuEST function in Ledoit and Wolf [15].
3.

Obtain $T^{2}_{N}$ as $T^{2}_{N}=n{\overline{\bf{X}}}^{\top}\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}{\overline{\bf{X}}}$ .
4.

Repeatedly draw a random sample of $95\%$ of the days from the original data set with replacement, recording the resampled data set each time.
5.

Calculate the sample covariance matrix ${\bf S}$ and the corresponding $T^{2}_{N}$ for each resampled data set.

After building a sampling distribution by computing $T^{2}_{N}$ from 1000 resampled data sets under the null hypothesis, we compare the test statistic computed before resampling with this sampling distribution. The empirical p-value is the proportion of values in the sampling distribution that are at least as extreme as the observed test statistic.
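The resampling steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for the analysis: the QuEST-based estimator $\widehat{\mbox{\boldmath$\Sigma$}}^{-1}_{LW}$ is replaced by a simple ridge-type stand-in (`ridge_precision`), the statistic is centred at a benchmark vector `mu0`, the resampled data are not recentred, and the tail used for the empirical p-value is left as an explicit argument. All function names are hypothetical.

```python
import numpy as np

def ridge_precision(s, lam=1e-2):
    """Stand-in precision estimator U (Lambda + lam I)^{-1} U'.

    Assumption: this is NOT the Ledoit-Wolf/QuEST estimator of [14], [15].
    """
    eigvals, u = np.linalg.eigh(s)
    return (u / (eigvals + lam)) @ u.T

def t2_statistic(x, mu0, precision_fn=ridge_precision):
    """n * (xbar - mu0)' P (xbar - mu0), with P estimated from the sample covariance."""
    n = x.shape[0]
    diff = x.mean(axis=0) - mu0
    s = np.cov(x, rowvar=False)
    return float(n * diff @ precision_fn(s) @ diff)

def bootstrap_statistics(x, mu0, n_boot=1000, frac=0.95, seed=0,
                         precision_fn=ridge_precision):
    """Resample frac * n rows (days) with replacement and recompute the statistic."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    m = int(round(frac * n))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=m)  # row indices drawn with replacement
        stats[b] = t2_statistic(x[rows], mu0, precision_fn)
    return stats

def empirical_p_value(stats, observed, tail="lower"):
    """Proportion of resampled statistics below (tail='lower') or at/above the observed one."""
    stats = np.asarray(stats, dtype=float)
    if tail == "lower":
        return float(np.mean(stats < observed))
    return float(np.mean(stats >= observed))
```

A typical call would compute `observed = t2_statistic(x, mu0)` on the full data matrix `x` (rows are days, columns are stations), then `stats = bootstrap_statistics(x, mu0)` and finally `empirical_p_value(stats, observed)`.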
We want to test whether there is a difference in mean ridership among stations under the following two cases. Let $\mbox{\boldmath$\mu$}_{0}=(\mu_{1,0},\ldots,\mu_{108,0})^{\top}$ be the mean exit-ridership vector of the 108 stations, calculated from the second half of 2015 (July to December) and used as the benchmark for mean testing. The parameter to be tested, the mean exit-ridership vector of the 108 stations, is denoted by $\mbox{\boldmath$\mu$}=\left(\mu_{1},\ldots,\mu_{108}\right)^{\top}$ .

5.2 The effect of Monthly Unlimited Transport Policy

In 2018, the mayors of Taipei and New Taipei City announced a new unlimited public transportation card, called the “All Pass Ticket”, priced at NT $\$$ 1,280 $(\mbox{US}\$43.83)$ a month. Released on April 16, 2018, it is a periodic commuter ticket, valid for both buses and the Taipei Metro and also for the first 30 minutes of a YouBike ride. Commuters across Taipei and New Taipei City are expected to benefit from the policy: paying NT $\$$ 1,280 for 30 days of unlimited rides works out to an average cost of about NT $\$$ 42 per day. Taipei Mayor Ke Wen-je said that, as always, people are encouraged to use public transportation to help combat traffic congestion, while New Taipei City Mayor Eric Chu said he hoped the new pass would help boost daily ridership in Taipei’s public transportation system (Central News Agency, March 12, 2018). The hypothesis testing problem of interest is:

\displaystyle~{}H_{0}:{\mbox{\boldmath$\mu$}_{1280}}={\mbox{\boldmath$\mu$}}_{% 0}~{}\mbox{versus}~{}H_{1}:{\mbox{\boldmath$\mu$}_{1280}}\neq{\mbox{\boldmath$% \mu$}}_{0}\;,

(5.1)

where $\mbox{\boldmath$\mu$}_{1280}$ is the mean vector of stations during the period of policy, and $\mbox{\boldmath$\mu$}_{0}$ is the mean vector as defined before.
Here we assess the effectiveness of the “All Pass Ticket” policy by a bootstrap resampling process based on days from 2017 to 2019, calculating both the test statistic $T^{2}_{N}$ and the Hotelling’s $T^{2}$ -test statistic. There are $4$ resampled statistics out of 1000 that are less than the value $5292.244$ of the $T^{2}_{N}$ -test statistic for the real data set, so the empirical p-value equals $0.004$ , which is less than $\alpha$ whether the significance level $\alpha$ is 0.01 or 0.05. The value $5292.244$ is also less than the $0.5\%$ quantile, $5302.149$ , of the sampling distribution. Meanwhile, Hotelling’s $T^{2}$ -test yields an empirical p-value equal to $0$ . It seems that there is a significant difference in the mean exit ridership of the stations during the period of the monthly unlimited public transport card policy.
For this real data set, H ${}_{0}$ in (5.1) is rejected at significance level $\alpha$ equal to either 0.01 or 0.05 by both the decomposite $T^{2}_{N}$ -test, with empirical p-value 0.004, and Hotelling’s $T^{2}$ -test, with empirical p-value 0. Note that, no matter how small the significance level $\alpha$ is, H ${}_{0}$ is strongly rejected by Hotelling’s T ${}^{2}$ -test, whose empirical p-value is 0. This indicates that the decomposite $T^{2}_{N}$ -test has an advantage over Hotelling’s T ${}^{2}$ -test under the bootstrap procedure in the analysis of this real data set.

6 Conclusion and Future Study

It is generally hard to compare tests on the basis of a single index, since there are so many nuisance parameters when the dimension is large. Other statistical aspects also need to be taken into account when comparing tests. The statistic $T^{2}_{N}$ defined in (3.1) is constructed from the optimal estimators of the eigenvalues of the precision matrix, as pointed out by Ledoit and Wolf [16]. Since little work in the literature has used these results from the data-analysis point of view, we adopt the permutation test based on a good test statistic, which is easy to perform and robust in practice. Based on the discussion above, it seems reasonable to adopt the decomposite $T^{2}_{N}$ statistic in the bootstrap procedure for analyzing large-dimensional data sets.
The rotation equivariance property is appropriate in the general situation where one has no prior information about the orientation of the eigenvectors of the population covariance matrix. However, without consistent estimators of the population eigenvalue matrix $\Gamma$ , it is still difficult for the test statistic to perform well, even under the null hypothesis $H_{0}$ . The tests in the literature that incorporate the information in ${\bf S}$ face the same difficulty, for example in the estimation of ${\mbox{\boldmath$\Gamma$}}^{2}_{K}$ (Feng et al. [10]). One of the main goals of this work is to extract more information about the population eigenvalues with the help of delicate random matrix theory. The joint density function of the dependent eigenvalues is well known for the Wishart ensemble, and their limiting spectral distribution is given by the Marčenko-Pastur distribution for a system with large dimension when $c\in(0,1)$ . Hence the statistical significance of the correlations in a large system can be assessed from the empirical spectral distribution of the sample covariance matrix via the Marčenko-Pastur distribution. This is one of the main advantages of the approach for obtaining consistent estimators of the population eigenvalues and eigenvectors. If the matrix ${\bf D}$ is equal to the identity matrix, then our proposed $T_{N}^{2}$ -test is optimal for the hypothesis testing problem (1.1). In this ideal situation, by Corollary 3, the data analysis can be based on the normalized test statistic $(T^{2}_{N}-p)/\sqrt{2p}$ and the usual normal theory. However, this study indicates that neither Stein’s estimator (2.1) nor Ledoit and Wolf’s estimator (2.8) is a consistent estimator of $\Sigma$ . From the point of view of applications, we remark that finding consistent estimators of the population eigenvalues and eigenvectors of $\Sigma$ in the large-dimensional system remains open.
At this stage, it may be too optimistic to expect that the whole information of $\Sigma$ can be revealed without any a priori knowledge of its structure. Hence, we leave this difficult but important problem for future study.

Acknowledgments

The author would like to thank Professors Z.R. Chen and H.N. Hong from National Chiao Tung University, and Professor S.Y. Huang from Academia Sinica for their helpful discussions.

References

1.

Anderson, T.W. (2003), An Introduction to Multivariate Statistical Analysis. 3rd edition. Wiley, New York.
2.

Bai, Z.D. and Saranadasa, H. (1996), Effect of high dimension: by an example of a two sample problem. Statistica Sinica, Vol.6(2), 311–329.
3.

Bai, Z.D. and Miao, B.Q. and Yao, J.F. (2003), Convergence rates of spectral distributions of large sample covariance matrices. SIAM J. Matrix Anal. Appl., Vol.25, 105–127.
4.

Bai, Z.D. and Miao, B.Q. and Pan, G.M. (2007), On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab., Vol.35, 1532–1572.
5.

Chen, L.S. and Paul, D. and Prentice, R.L. and Wang, P. (2011), A Regularized Hotelling’s ${T}^{2}$ -Test for Pathway Analysis in Proteomic Studies. J. Am. Stat. Assoc., Vol.106, No.496, 1345–1360.
6.

Chen, S.X. and Qin, Y.L. (2010), A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist., Vol.38(2), 808–835.
7.

Choi, S.I. and Silverstein, J.W. (1995), Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivariate Anal., Vol.54(2), 295–309.
8.

Dempster, A.P. (1958), A high dimensional two sample significance test. Ann. Math. Statist., Vol.29(4), 995–1010.
9.

Dempster, A.P. (1960), A significance test for the separation of two highly multivariate small samples. Biometrics, Vol.16(1),41–50.
10.

Feng, L. and Zou, C. and Wang, Z. and Zhu, L. (2017), Composite ${T}^{2}$ test for high-dimensional data. Statistica Sinica, Vol.27(3), 1419–1436.
11.

Johnstone, I. M. and Paul, D. (2018), PCA in High Dimensions: An Orientation. Proc. IEEE, Vol.106(8), 1277–1292.
12.

Ledoit, O. and Péché, S. (2011), Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Relat. Fields, Vol.151, 233–264.
13.

Ledoit, O. and Wolf, M. (2004), A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal., Vol.88, 365–411.
14.

Ledoit, O. and Wolf, M. (2012), Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist., Vol.40(2), 1024–1060.
15.

Ledoit, O. and Wolf, M. (2017), Numerical implementation of the QuEST function. Computational Statistics and Data Analysis, Vol.115, 199–223.
16.

Ledoit, O. and Wolf, M. (2018), Optimal estimation of a large-dimensional covariance matrix under Stein’s loss. Bernoulli, Vol.24(4B), 3791–3832.
17.

Li, H. and Aue, A. and Paul, D. and Peng, J. and Wang, P. (2020), An adaptable generalization of Hotelling’s $T^{2}$ -test in high dimension. Ann. Statist., Vol.48(3), 1815–1847.
18.

Marčenko, V.A. and Pastur, L.A. (1967), Distribution of eigenvalues for some sets of random matrices. Sb. Math., Vol.1, 457–483.
19.

Muirhead, R.J. (1982), Aspects of Multivariate Statistical Theory. Wiley, New York.
20.

Pan, G.M. and Zhou, W. (2011), Central limit theorem for Hotelling’s ${T}^{2}$ statistic under large dimension. Ann. Appl. Probab., Vol.21, 1860–1910.
21.

Park, J. and Ayyala, D.N. (2013), A test for the mean vector in large dimension and small samples. J. Statist. Plan. Infer., Vol.143(5), 929–943.
22.

Pitman, E.J.G. (1948), Lecture Notes on Nonparametric Statistical Inference: Lectures Given for the University of North Carolina. University of North Carolina.
23.

Srivastava, M.S. and Du, M. (2008), A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal., Vol.99(3), 386–402.
24.

Srivastava, M.S. (2009), A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivariate Anal., Vol.100(3), 518–532.
25.

Silverstein, J.W. (1995), Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivariate Anal., Vol.55(2), 331–339.
26.

Stein, C. (1975), Estimation of a covariance matrix. Rietz lecture, 39th Annual Meeting IMS.
27.

Stein, C. (1986), Lectures on the theory of estimation of many parameters. J. Math. Sci., Vol.34, 1373–1403.

Institute of Statistical Science, Academia Sinica, Taipei.