Asymptotics of estimators for structured covariance matrices

Hendrik Paul Lopuhaä Delft University of Technology

(July 2, 2024)

Abstract

We show that the limiting variance of a sequence of estimators for a structured covariance matrix has a general form, which appears as the variance of a scaled projection of a random matrix of radial type. A similar result is obtained for the corresponding sequence of estimators for the vector of variance components. These results are illustrated by the limiting behavior of estimators for a linear covariance structure in a variety of multivariate statistical models. We also derive a characterization for the influence function of the corresponding functionals. Furthermore, we derive the limiting distribution and influence function of scale invariant mappings of such estimators and their corresponding functionals. As a consequence, the asymptotic relative efficiency of different estimators for the shape component of a structured covariance matrix can be compared by means of a single scalar and the gross error sensitivity of the corresponding influence functions can be compared by means of a single index. Similar results are obtained for estimators of the normalized vector of variance components. We apply our results to investigate how the efficiency, gross error sensitivity, and breakdown point of S-estimators for the normalized variance components are affected simultaneously by varying their cutoff value.

1 Introduction

Covariance matrices describe the relationships and variability between different variables in a dataset. When there is a known structure or pattern in these relationships, structured covariance matrices can be estimated to capture and represent that structure. The use of structured covariance matrices is a valuable tool for modeling the underlying patterns and dependencies in multivariate data. It provides a more nuanced understanding of the relationships between variables, especially in scenarios where variables exhibit specific structures or patterns of correlation. Structured covariance matrices are commonly used in the analysis of repeated measures, longitudinal data, and multivariate data with a known underlying structure. They are particularly useful when there are dependencies or correlations among different measurements or variables and are widely used in various fields, including biology, medicine, psychology, and social sciences.

When a covariance matrix is unstructured and can be any positive definite symmetric matrix $\mathbf{\Sigma}$ , then the limiting behavior of covariance estimators $\mathbf{V}_{n}$ for $\mathbf{\Sigma}$ is well understood. For example, if $\mathbf{V}_{n}$ is based on a sample $\mathbf{y}_{1},\ldots,\mathbf{y}_{n}\in\mathbb{R}^{k}$ from a distribution with an elliptically contoured density $|\mathbf{\Sigma}|^{-1/2}g((\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm{\mu}))$ , then typically $\sqrt{n}(\mathbf{V}_{n}-\mathbf{\Sigma})$ converges in distribution to a random matrix $\mathbf{N}$ that has a multivariate normal distribution with mean zero and variance

\text{var}\{\text{vec}(\mathbf{N})\}=\sigma_{1}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})(\mathbf{\Sigma}\otimes\mathbf{\Sigma})+\sigma_{2}\text{vec}(\mathbf{\Sigma})\text{vec}(\mathbf{\Sigma})^{T},

(1.1)

for some $\sigma_{1}\geq 0$ and $\sigma_{2}\geq-2\sigma_{1}/k$ , where $\otimes$ denotes the Kronecker product, $\mathbf{K}_{k,k}$ is the commutation matrix, and vec is the operator that stacks the columns of a matrix. This form of limiting variance appears for many covariance estimators. Tyler [26] gives several examples, including the sample covariance matrix, and nicely explains that this general form will always appear when $\mathbf{N}$ is of radial type with respect to $\mathbf{\Sigma}$ .
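As an illustration, the following NumPy sketch assembles the right-hand side of (1.1). The helper names `vec`, `commutation_matrix`, and `radial_variance` are our own and are not taken from any cited reference. Two sanity checks are included: for $k=1$ with $\sigma_{1}=1$ and $\sigma_{2}=0$ the expression reduces to $2\Sigma^{2}$, the familiar limiting variance of the sample variance under normality, and at the boundary $\sigma_{2}=-2\sigma_{1}/k$ with $\mathbf{\Sigma}=\mathbf{I}_{k}$ the quadratic form in the direction $\text{vec}(\mathbf{I}_{k})$ vanishes, which suggests why the lower bound on $\sigma_{2}$ cannot be relaxed.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def commutation_matrix(k):
    """K_{k,k}: the permutation matrix with K @ vec(A) == vec(A.T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[j * k + i, i * k + j] = 1.0
    return K

def radial_variance(Sigma, sigma1, sigma2):
    """Right-hand side of (1.1) for given Sigma, sigma1 and sigma2."""
    k = Sigma.shape[0]
    K = commutation_matrix(k)
    v = vec(Sigma)
    return (sigma1 * (np.eye(k * k) + K) @ np.kron(Sigma, Sigma)
            + sigma2 * np.outer(v, v))

# k = 1, sigma1 = 1, sigma2 = 0: (1.1) reduces to 2 * Sigma^2.
var_1d = radial_variance(np.array([[2.0]]), 1.0, 0.0)

# Boundary case sigma2 = -2 * sigma1 / k with Sigma = I_k.
k = 3
boundary = radial_variance(np.eye(k), 1.0, -2.0 / k)
quad_form = vec(np.eye(k)) @ boundary @ vec(np.eye(k))
```

The values of $\sigma_{1}$ and $\sigma_{2}$ used here are arbitrary inputs; in the paper they are determined by the estimator and the underlying elliptical distribution.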

The situation is different when estimating a structured covariance matrix $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ , where $\mathbf{V}(\cdot)$ is a known covariance structure depending on a vector $\bm{\theta}=(\theta_{1},\ldots,\theta_{\ell})$ of unknown variance components. Asymptotic results for the maximum likelihood estimator of variance components in linear models with Gaussian errors having a structured covariance matrix $\mathbf{V}(\bm{\theta})$ can be found in Hartley and Rao [8], Miller [22], and Mardia and Marshall [20]. When scaled appropriately, the maximum likelihood estimator $\bm{\theta}_{n}$ is shown to be asymptotically normal with mean $\bm{\theta}$ and variance $\mathbf{J}^{-1}$ , where $\mathbf{J}_{ij}=\text{tr}(\mathbf{\Sigma}^{-1}\mathbf{L}_{i}\mathbf{\Sigma}^{-1}\mathbf{L}_{j})/2$ , for $i,j=1,\ldots,\ell$ , with $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ and $\mathbf{L}_{i}=\partial\mathbf{V}(\bm{\theta})/\partial\theta_{i}$ . By employing the vec-notation, the limiting covariance of $\bm{\theta}_{n}$ can be expressed as

2\left(\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L}\right)^{-1},

where $\mathbf{L}$ is the matrix with columns $\mathrm{vec}(\mathbf{L}_{1}),\ldots,\mathrm{vec}(\mathbf{L}_{\ell})$ . According to the delta method the limiting covariance of $\mathrm{vec}(\mathbf{V}(\bm{\theta}_{n}))$ is then given by

2\mathbf{L}\left(\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L}\right)^{-1}\mathbf{L}^{T}.
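The step from the trace form of $\mathbf{J}$ to the vec-notation rests on the identity $\mathrm{vec}(\mathbf{A})^{T}(\mathbf{B}\otimes\mathbf{B})\mathrm{vec}(\mathbf{C})=\text{tr}(\mathbf{A}^{T}\mathbf{B}\mathbf{C}\mathbf{B})$ for symmetric $\mathbf{B}$. A minimal numerical sketch is given below; the structure $\mathbf{V}(\bm{\theta})=\theta_{1}\mathbf{I}_{k}+\theta_{2}\mathbf{1}\mathbf{1}^{T}$ and the values of $\bm{\theta}$ are hypothetical choices made only for illustration.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

# Hypothetical linear structure: V(theta) = theta_1 * I + theta_2 * 1 1^T.
k = 3
Ls = [np.eye(k), np.ones((k, k))]
theta = np.array([1.5, 0.5])
Sigma = theta[0] * Ls[0] + theta[1] * Ls[1]
Sigma_inv = np.linalg.inv(Sigma)

# J in trace form: J_ij = tr(Sigma^{-1} L_i Sigma^{-1} L_j) / 2.
J_trace = np.array([[0.5 * np.trace(Sigma_inv @ Li @ Sigma_inv @ Lj)
                     for Lj in Ls] for Li in Ls])

# J in vec form: L^T (Sigma^{-1} kron Sigma^{-1}) L / 2.
L = np.column_stack([vec(Li) for Li in Ls])
W = np.kron(Sigma_inv, Sigma_inv)
J_vec = 0.5 * L.T @ W @ L

# Limiting covariance of theta_n, and of vec(V(theta_n)) via the delta method.
cov_theta = np.linalg.inv(J_vec)
cov_vecV = L @ cov_theta @ L.T
```

Since $\mathbf{V}$ is linear here, the Jacobian of $\mathrm{vec}(\mathbf{V}(\bm{\theta}))$ with respect to $\bm{\theta}$ is simply $\mathbf{L}$, so the delta method amounts to a pre- and post-multiplication by $\mathbf{L}$ and $\mathbf{L}^{T}$.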

Similar results have been obtained in Lopuhaä et al [16] for the class of S-estimators based on observations that follow a linear model with a structured covariance $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ , where $\mathbf{V}$ is a linear function of $\bm{\theta}$ . Under appropriate conditions, it holds that $\sqrt{n}(\bm{\theta}_{n}-\bm{\theta})$ is asymptotically normal with mean zero and variance

2\sigma_{1}\Big(\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big)^{-1}+\sigma_{2}\bm{\theta}\bm{\theta}^{T},

(1.2)

and $\sqrt{n}(\mathbf{V}(\bm{\theta}_{n})-\mathbf{\Sigma})$ converges in distribution to a random matrix $\mathbf{M}$ , that has a multivariate normal distribution with mean zero and variance

\text{var}\{\text{vec}(\mathbf{M})\}=2\sigma_{1}\mathbf{L}\Big(\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big)^{-1}\mathbf{L}^{T}+\sigma_{2}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}.

(1.3)

One of the objectives of this paper is to show that this general form will always appear when $\mathbf{M}$ is a scaled projection on the column space of $\mathbf{L}$ of a random matrix that is of radial type with respect to $\mathbf{\Sigma}$ . Moreover, we provide several examples of covariance estimators that exhibit this kind of limiting behavior.

Another objective concerns the asymptotic behavior of estimators for scale invariant mappings $H$ of positive definite symmetric matrices. For affine equivariant covariance estimators $\mathbf{V}_{n}$ with asymptotic variance (1.1), Tyler [27] shows that $H(\mathbf{V}_{n})$ has an asymptotic variance that only depends on the scalar $\sigma_{1}$ . When dealing with a structured covariance matrix, the covariance estimators are typically not affine equivariant and have asymptotic variance (1.3). The second objective of this paper is to show that Tyler’s result for affine equivariant covariance estimators remains true for estimators of a structured covariance matrix. Moreover, we will establish a similar result for scale invariant mappings $H(\bm{\theta}_{n})$ of estimators for the vector of variance components.

An example of a scale invariant mapping is the shape component $\mathbf{V}/|\mathbf{V}|^{1/k}$ . A consequence of our results is that the asymptotic relative efficiency of estimators of the shape of a structured covariance can be compared simply by comparing the corresponding values for $\sigma_{1}$ . For affine equivariant covariance estimators, this was already observed by Kent and Tyler [11] and Salibián-Barrera et al [24]. Similar properties will be shown to hold for the direction component $\bm{\theta}/\|\bm{\theta}\|$ corresponding to the vector of variance components.

A final objective of this paper concerns the influence function of structured covariance functionals. For affine equivariant covariance functionals, Croux and Haesbroeck [5] show that the influence function at the multivariate normal is characterized by two real-valued functions. Structured covariance functionals, however, are not necessarily affine equivariant. We will show that such a characterization remains valid for structured covariance functionals at any elliptically contoured distribution, and similarly for the variance components functional. A nice consequence is that the influence function of scale invariant mappings $H$ of a structured covariance functional $\mathbf{V}(\bm{\theta}(\cdot))$ , or of $\bm{\theta}(\cdot)$ itself, is characterized by a single real-valued function. As such, the gross-error-sensitivity (GES) is proportional to a single index, which can be used to compare the GES of different shape functionals or different direction functionals. Kent and Tyler [11] already observed such a property for the shape component of affine equivariant covariance functionals, see also Salibián-Barrera et al [24].

Apart from having merit of their own, our results also enable the construction of MM-estimators with auxiliary scale in linear mixed effects models and other linear models with structured covariances. These estimators inherit the robustness of the S-estimators considered in Lopuhaä et al [16] and, in contrast to the simpler version considered in Lopuhaä [15], improve the efficiency of the estimator of the fixed effects as well as the efficiency of the estimator of the covariance shape component and of the direction of the vector of variance components. Investigation of this version of MM-estimators will be postponed to a future manuscript, in which we will extend similar results that are already available for unstructured covariances in the multivariate location-scale model, see Tatsuoka and Tyler [25] or Salibián-Barrera et al [24], and in the multivariate regression model, see Kudraszow and Maronna [12].

The paper is organized as follows. In Section 2 we show that the general forms of (1.3) and (1.2) can be derived solely using a scaled projection of a random matrix that is of radial type. In Section 3 we investigate the limiting behavior of estimators of a linear covariance structure in a variety of multivariate models. We establish that these estimators asymptotically behave the same as a scaled projection of a sequence of affine equivariant covariance estimators that are asymptotically of radial type. In Section 4 we derive the limiting distribution of scale invariant mappings of estimators of a linear covariance structure that are asymptotically normal, and similarly for scale invariant mappings of estimators of the vector of variance components. In Section 5 we derive a characterization for the influence function of linearly structured covariance functionals and the corresponding functional of variance components, and of scale invariant mappings thereof. In Section 6 we apply our results to investigate how the efficiency, GES, and breakdown point of S-estimators of the variance components are affected simultaneously, when we vary the cut-off value of the rho-function that defines the S-estimator. All proofs are postponed to an appendix at the end of the paper.

2 Projection of a random matrix of radial type

A random matrix $\mathbf{R}$ is said to be of radial type, if for any orthogonal matrix $\mathbf{O}$ , the distribution of $\mathbf{O}\mathbf{R}\mathbf{O}^{T}$ is the same as that of $\mathbf{R}$ . The covariance structure of random matrices with a radial distribution was first given by Mallows [19] in index form. Tyler [26] gave the covariance structure in matrix form and provided necessary conditions on its parameters. A random matrix $\mathbf{N}$ is said to be of radial type with respect to the positive definite symmetric matrix $\mathbf{\Sigma}$ , if $\mathbf{\Sigma}^{-1/2}\mathbf{N}\mathbf{\Sigma}^{-1/2}$ has a radial distribution. If the first two moments of $\mathbf{N}$ exist, then according to Corollary 1 in Tyler [26], the variance of $\mathbf{N}$ is given by (1.1).

Consider a $k\times k$ structured covariance matrix $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ , where $\mathbf{V}$ is a known covariance structure that is a linear function of $\bm{\theta}=(\theta_{1},\ldots,\theta_{\ell})$ , a vector of unknown variance components. Define the $k^{2}\times\ell$ matrix

\mathbf{L}=\left[\begin{array}{ccc}\mathrm{vec}(\mathbf{L}_{1})&\cdots&\mathrm{vec}(\mathbf{L}_{\ell})\end{array}\right],\quad\mathbf{L}_{j}=\partial\mathbf{V}/\partial\theta_{j},\text{ for }j=1,\ldots,\ell.

(2.1)

Note that since $\mathbf{V}$ is linear, we can write $\mathbf{\Sigma}=\theta_{1}\mathbf{L}_{1}+\cdots+\theta_{\ell}\mathbf{L}_{\ell}$ and $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}$ . Furthermore, let $\Pi_{L}$ be the projection of a vector $\mathbf{x}\in\mathbb{R}^{k^{2}}$ on the column space of $\mathbf{L}$ , re-scaled by $\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}$ , that is

\Pi_{L}\mathbf{x}=\mathop{\mathrm{argmin}}_{\bm{\theta}\in\mathbb{R}^{\ell}}\,(\mathbf{x}-\mathbf{L}\bm{\theta})^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})(\mathbf{x}-\mathbf{L}\bm{\theta}).

(2.2)

We then have the following theorem.

Theorem 1.

Let $\mathbf{N}$ be a random matrix that is of radial type with respect to a positive definite symmetric matrix $\mathbf{\Sigma}$ . Suppose that $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ , for some $\bm{\theta}\in\mathbb{R}^{\ell}$ , and that $\mathbf{V}$ is linear such that $\mathbf{L}$ , as defined in (2.1), is of full column rank. Let $\Pi_{L}$ be the projection defined in (2.2) and define the random matrix $\mathbf{M}$ by $\mathrm{vec}(\mathbf{M})=\Pi_{L}\mathrm{vec}(\mathbf{N})$ .

(i)

If the first two moments of $\mathbf{N}$ exist, then there exist constants $\eta$ , $\sigma_{1}$ and $\sigma_{2}$ with $\sigma_{1}\geq 0$ and $\sigma_{2}\geq-2\sigma_{1}/k$ , such that $\mathbb{E}[\mathrm{vec}(\mathbf{M})]=\eta\mathrm{vec}(\mathbf{\Sigma})$ and $\text{\rm var}(\mathrm{vec}(\mathbf{M}))$ is given by (1.3).
(ii)

If $\mathbf{T}\in\mathbb{R}^{\ell}$ is the random vector, such that $\mathrm{vec}(\mathbf{M})=\mathbf{L}\mathbf{T}$ , then $\mathbb{E}[\mathbf{T}]=\eta\bm{\theta}$ and $\text{\rm var}(\mathbf{T})$ is given by (1.2).

Note that the constants $\eta$ , $\sigma_{1}$ and $\sigma_{2}$ have nothing to do with the projection $\Pi_{L}$ , but are inherited from the variance (1.1) of the radial random matrix $\mathbf{N}$ . Their existence is guaranteed by Corollary 1 in Tyler [26].
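The variance formulas in Theorem 1 can be checked numerically, without simulation, by propagating the variance (1.1) through the linear map $\Pi_{L}$. The sketch below rests on several assumptions of ours: we read $\Pi_{L}\mathbf{x}$ as $\mathbf{L}\widehat{\bm{\theta}}$, with $\widehat{\bm{\theta}}$ the minimizer in (2.2), which gives the closed form $\Pi_{L}=\mathbf{L}(\mathbf{L}^{T}\mathbf{W}\mathbf{L})^{-1}\mathbf{L}^{T}\mathbf{W}$ with $\mathbf{W}=\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}$; the matrices $\mathbf{L}_{j}$ are symmetric; and the covariance structure and the values of $\sigma_{1}$, $\sigma_{2}$ are arbitrary illustrative choices.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def commutation_matrix(k):
    """K_{k,k}: the permutation matrix with K @ vec(A) == vec(A.T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[j * k + i, i * k + j] = 1.0
    return K

# Hypothetical linear structure and arbitrary constants.
k = 3
Ls = [np.eye(k), np.ones((k, k))]
theta = np.array([1.5, 0.5])
sigma1, sigma2 = 1.3, 0.4
Sigma = theta[0] * Ls[0] + theta[1] * Ls[1]
L = np.column_stack([vec(Li) for Li in Ls])
W = np.kron(np.linalg.inv(Sigma), np.linalg.inv(Sigma))
A = L.T @ W @ L

B = np.linalg.solve(A, L.T @ W)  # maps x to the minimizer theta-hat in (2.2)
P = L @ B                        # closed form of the scaled projection Pi_L

# Variance (1.1) of vec(N), then the variances of vec(M) = P vec(N), T = B vec(N).
v = vec(Sigma)
var_N = (sigma1 * (np.eye(k * k) + commutation_matrix(k)) @ np.kron(Sigma, Sigma)
         + sigma2 * np.outer(v, v))
var_M = P @ var_N @ P.T
var_T = B @ var_N @ B.T

# Right-hand sides of (1.3) and (1.2).
rhs_13 = 2 * sigma1 * L @ np.linalg.inv(A) @ L.T + sigma2 * np.outer(v, v)
rhs_12 = 2 * sigma1 * np.linalg.inv(A) + sigma2 * np.outer(theta, theta)
```

In this example `var_M` agrees with `rhs_13` and `var_T` with `rhs_12`. The two ingredients are that $\Pi_{L}$ leaves $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}$ unchanged and that $\mathbf{K}_{k,k}\mathrm{vec}(\mathbf{L}_{j})=\mathrm{vec}(\mathbf{L}_{j})$ for symmetric $\mathbf{L}_{j}$.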

Examples of multivariate statistical models with a linear covariance structure are linear mixed effects models. But also linear models with errors generated by some autoregressive time series may correspond to a linear covariance structure. When $\mathbf{\Sigma}$ is unstructured and can be any positive definite symmetric covariance matrix, it can also be seen as a linear covariance structure $\mathbf{V}(\bm{\theta})$ , where $\bm{\theta}=\mathrm{vech}(\mathbf{\Sigma})$ , with

\mathrm{vech}(\mathbf{A})=(a_{11},\ldots,a_{k1},a_{22},\ldots,a_{kk}),

(2.3)

is the unique $k(k+1)/2$ -vector that stacks the columns of the lower triangle elements of a symmetric matrix $\mathbf{A}$ . The matrix $\mathbf{L}=\partial\mathrm{vec}(\mathbf{V})/\partial\bm{\theta}^{T}$ is then equal to the so-called duplication matrix $\mathcal{D}_{k}$ , which is the unique $k^{2}\times k(k+1)/2$ matrix, with the properties $\mathcal{D}_{k}\mathrm{vech}(\mathbf{A})=\mathrm{vec}(\mathbf{A})$ and $(\mathcal{D}_{k}^{T}\mathcal{D}_{k})^{-1}\mathcal{D}_{k}^{T}\mathrm{vec}(\mathbf{A})=\mathrm{vech}(\mathbf{A})$ . Moreover, from the properties of $\mathcal{D}_{k}$ (e.g., see Magnus and Neudecker [18, Ch. 3, Sec. 8]), it follows that

\mathcal{D}_{k}\left(\mathcal{D}_{k}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathcal{D}_{k}\right)^{-1}\mathcal{D}_{k}^{T}=\frac{1}{2}\left(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k}\right)\left(\mathbf{\Sigma}\otimes\mathbf{\Sigma}\right).

(2.4)

In this case, the expression (1.3) with $\mathbf{L}=\mathcal{D}_{k}$ coincides with the expression (1.1).
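The duplication matrix and identity (2.4) are easy to verify numerically. The sketch below builds $\mathcal{D}_{k}$ directly from the ordering in (2.3); the function names and the randomly generated positive definite $\mathbf{\Sigma}$ are our own illustrative choices.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def vech(A):
    """Stack the lower-triangular part of A column by column, as in (2.3)."""
    k = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(k)])

def commutation_matrix(k):
    """K_{k,k}: the permutation matrix with K @ vec(A) == vec(A.T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[j * k + i, i * k + j] = 1.0
    return K

def duplication_matrix(k):
    """D_k with D_k @ vech(A) == vec(A) for every symmetric k x k matrix A."""
    D = np.zeros((k * k, k * (k + 1) // 2))
    p = 0
    for j in range(k):
        for i in range(j, k):
            D[j * k + i, p] = 1.0  # entry (i, j) of A
            D[i * k + j, p] = 1.0  # entry (j, i) of A
            p += 1
    return D

# Arbitrary positive definite Sigma for the check.
k = 3
rng = np.random.default_rng(0)
G = rng.standard_normal((k, k))
Sigma = G @ G.T + k * np.eye(k)

D = duplication_matrix(k)
W = np.kron(np.linalg.inv(Sigma), np.linalg.inv(Sigma))
lhs_24 = D @ np.linalg.inv(D.T @ W @ D) @ D.T
rhs_24 = 0.5 * (np.eye(k * k) + commutation_matrix(k)) @ np.kron(Sigma, Sigma)
```

Multiplying both sides of (2.4) by $2\sigma_{1}$ and adding $\sigma_{2}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}$ turns (1.3) with $\mathbf{L}=\mathcal{D}_{k}$ into (1.1).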

3 Projections of estimators of radial type

A sequence $\{\mathbf{N}_{n}\}$ of $k\times k$ symmetric estimators for $\mathbf{\Sigma}$ is said to be asymptotically of radial type if there exists a sequence of real numbers $a_{n}$ increasing to infinity, such that $a_{n}(\mathbf{N}_{n}-\mathbf{\Sigma})\to\mathbf{N}$ in distribution with $\mathbf{N}$ being of radial type with respect to $\mathbf{\Sigma}$ , see Tyler [26]. In a large class of multivariate statistical models, for estimators $\mathbf{V}_{n}$ of a linearly structured covariance matrix, it turns out that the limiting behavior of $\mathrm{vec}(\mathbf{V}_{n})$ is the same as that of the projection $\Pi_{L}\mathrm{vec}(\mathbf{N}_{n})$ of a random matrix $\mathbf{N}_{n}$ that is asymptotically of radial type with respect to $\mathbf{\Sigma}$ , where $\Pi_{L}$ is defined in (2.2). We illustrate this behavior in the following linear model with a structured covariance.

Consider independent observations $\mathbf{s}_{1},\ldots,\mathbf{s}_{n}\in\mathbb{R}^{k}\times\mathbb{R}^{kq}$ with distribution $P$ , where $\mathbf{s}_{i}=(\mathbf{y}_{i},\mathbf{X}_{i})$ , $i=1,\ldots,n$ , for which we assume the following model

\mathbf{y}_{i}=\mathbf{X}_{i}\bm{\beta}+\mathbf{u}_{i},\quad i=1,\ldots,n,

(3.1)

where $\mathbf{y}_{i}\in\mathbb{R}^{k}$ , $\bm{\beta}\in\mathbb{R}^{q}$ is an unknown parameter vector, $\mathbf{X}_{i}\in\mathbb{R}^{k\times q}$ is a known design matrix, and $\mathbf{u}_{i}\in\mathbb{R}^{k}$ are unobservable independent mean zero random vectors with covariance matrix $\mathbf{V}\in\text{PDS}(k)$ , the class of positive definite symmetric $k\times k$ matrices. Suppose that the distribution $P$ for random variable $\mathbf{s}=(\mathbf{y},\mathbf{X})$ is such that $\mathbf{y}\mid\mathbf{X}$ has an elliptically contoured density

f_{\bm{\mu},\mathbf{\Sigma}}(\mathbf{y})=|\mathbf{\Sigma}|^{-1/2}g\left((\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm{\mu})\right),

(3.2)

where $\bm{\mu}=\mathbf{X}\bm{\beta}_{0}$ and $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})=\theta_{01}\mathbf{L}_{1}+\cdots+\theta_{0\ell}\mathbf{L}_{\ell}$ , for some vector $\bm{\theta}_{0}\in\mathbb{R}^{\ell}$ of variance components. This setup includes several multivariate statistical models of interest. One possibility is the linear mixed effects model, in which the random effects together with the measurement error yield a specific covariance structure. Other covariance structures may arise, for example if the $\mathbf{u}_{i}$ are the outcome of a time series. Note that this setup also allows models with an unstructured covariance matrix, such as the multivariate location-scale model or the multivariate regression model. See e.g., Jennrich and Schluchter [10] or Fitzmaurice et al [6], for different possible covariance structures, and Lopuhaä et al [16], who provide a uniform treatment of S-estimators in these models.

Estimators $\bm{\xi}_{n}=(\bm{\beta}_{n},\bm{\theta}_{n})$ for $\bm{\xi}_{0}=(\bm{\beta}_{0},\bm{\theta}_{0})$ are typically solutions of estimating equations of the following type

\int\Psi(\mathbf{s},\bm{\xi})\,\text{d}\mathbb{P}_{n}(\mathbf{s})=\mathbf{0},

(3.3)

where $\mathbb{P}_{n}$ denotes the empirical measure corresponding to $\mathbf{s}_{1},\ldots,\mathbf{s}_{n}$ , and where $\Psi=(\Psi_{\bm{\beta}},\Psi_{\bm{\theta}})$ , with

\begin{split}\Psi_{\bm{\beta}}(\mathbf{s},\bm{\xi})&=w_{1}(d)\mathbf{X}^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\bm{\beta})\\ \Psi_{\bm{\theta}}(\mathbf{s},\bm{\xi})&=\mathbf{L}^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})\mathrm{vec}\left\{w_{2}(d)(\mathbf{y}-\mathbf{X}\bm{\beta})(\mathbf{y}-\mathbf{X}\bm{\beta})^{T}-w_{3}(d)\mathbf{V}\right\},\end{split}

(3.4)

where $d^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta})^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\bm{\beta})$ , and where we write $\mathbf{V}$ for $\mathbf{V}(\bm{\theta})$ . We give some examples below. Furthermore, typically $\bm{\xi}_{n}$ will then converge to a solution of the corresponding population equation

\int\Psi(\mathbf{s},\bm{\xi})\,\text{d}P(\mathbf{s})=\mathbf{0}.

(3.5)

Let $\mathbf{V}_{n}=\mathbf{V}(\bm{\theta}_{n})$ . From the estimating equations (3.3) for $\bm{\xi}_{n}$ , we will establish that $\mathrm{vec}(\mathbf{V}_{n})$ is asymptotically equivalent with $\Pi_{L}\mathrm{vec}(\mathbf{N}_{n})$ , for some $\mathbf{N}_{n}$ that is asymptotically of radial type and $\Pi_{L}$ defined in (2.2). To this end, we require the following conditions

(C1)

$w_{i}(s)$ is of bounded variation and continuously differentiable, for $i=1,2,3$ ;

(C2)

$w_{1}^{\prime}(s)s^{2}$ , $w_{2}^{\prime}(s)s^{3}$ , and $w_{3}^{\prime}(s)s^{2}$ are bounded;

(C3)

$\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big[w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+k(k+2)w_{3}(\|\mathbf{z}\|)\Big]\neq 0$ and $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big[w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+2kw_{3}(\|\mathbf{z}\|)-kw_{3}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\Big]\neq 0$ ,

where $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}$ denotes the expectation with respect to density (3.2) with parameters $(\bm{\mu},\mathbf{\Sigma})=(\mathbf{0},\mathbf{I}_{k})$ . Condition (C3) is to ensure the existence of the scalars $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2. Maronna [21] and Tyler [26] consider M-estimators for multivariate location and covariance. Estimating equations for these estimators would correspond to $\Psi_{\bm{\theta}}$ without the factor $\mathbf{L}^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})$ (see Example 2 below) and $w_{3}=1$ . Moreover, they assume that $w_{2}^{\prime}$ is non-negative, which obviously implies (C3).

Theorem 2.

Let $P$ be a distribution for random variable $\mathbf{s}=(\mathbf{y},\mathbf{X})$ , such that $\mathbf{y}\mid\mathbf{X}$ has an elliptically contoured density (3.2), with parameters $\bm{\mu}=\mathbf{X}\bm{\beta}_{0}$ and $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ , for a linear covariance structure $\mathbf{V}$ . Let $\bm{\xi}_{n}$ and $\bm{\xi}_{0}$ be solutions of (3.3) and (3.5), respectively, and suppose that $\bm{\xi}_{n}\to\bm{\xi}_{0}$ in probability. Suppose that $\mathbb{E}\|\mathbf{s}\|^{4}<\infty$ and that $\mathbf{X}$ has full rank with probability one. If $w_{1}$ , $w_{2}$ , and $w_{3}$ satisfy (C1)-(C3), then there exists a sequence $\{\mathbf{N}_{n}\}$ of random matrices, such that

\sqrt{n}\left\{\mathrm{vec}(\mathbf{V}_{n})-\mathrm{vec}(\mathbf{\Sigma})\right\}=-\Pi_{L}\mathrm{vec}\left\{\sqrt{n}(\mathbf{N}_{n}-\mathbb{E}[\mathbf{N}_{n}])\right\}+o_{P}(1),

where $\Pi_{L}$ is defined in (2.2). Moreover, $\sqrt{n}(\mathbf{N}_{n}-\mathbb{E}[\mathbf{N}_{n}])\to\mathbf{N}$ in distribution, where $\mathbf{N}$ is a random matrix that has a multivariate normal distribution with mean zero and variance (1.1), with

\begin{split}\sigma_{1}&=\frac{k(k+2)\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]}{\Big(\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big[w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+k(k+2)w_{3}(\|\mathbf{z}\|)\Big]\Big)^{2}}\\ \sigma_{2}&=-\frac{2}{k}\sigma_{1}+\frac{4\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\Big(w_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}-kw_{3}(\|\mathbf{z}\|)\Big)^{2}\right]}{\left(\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big[w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+2kw_{3}(\|\mathbf{z}\|)-kw_{3}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\Big]\right)^{2}}.\end{split}

Remark 3.1.

From the proof of Theorem 2 one can obtain the following explicit expression for $\mathbf{N}_{n}$ :

\mathbf{N}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left\{v_{1}(d_{i})(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}_{0})(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}_{0})^{T}-v_{2}(d_{i})\mathbf{\Sigma}\right\},

where $d_{i}^{2}=(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}_{0})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}_{0})$ , and

\begin{split}v_{1}(s)&=\frac{w_{2}(s)}{\gamma_{1}};\\ v_{2}(s)&=\frac{-\gamma_{2}w_{2}(s)s^{2}+\gamma_{1}w_{3}(s)}{\gamma_{1}(\gamma_{1}-k\gamma_{2})},\end{split}

where $\gamma_{1}$ and $\gamma_{2}$ are defined in (A.6). Note that $\gamma_{1}$ and $\gamma_{1}-k\gamma_{2}$ are precisely the quantities that appear in condition (C3).

The random matrix $\mathbf{N}$ in Theorem 2 is of radial type with respect to $\mathbf{\Sigma}$ . This follows from the fact that $\mathbf{R}=\mathbf{\Sigma}^{-1/2}\mathbf{N}\mathbf{\Sigma}^{-1/2}$ is multivariate normal with mean zero and variance

\begin{split}\text{var}\{\text{vec}(\mathbf{R})\}&=(\mathbf{\Sigma}^{-1/2}\otimes\mathbf{\Sigma}^{-1/2})\text{var}\left\{\text{vec}(\mathbf{N})\right\}(\mathbf{\Sigma}^{-1/2}\otimes\mathbf{\Sigma}^{-1/2})\\ &=\sigma_{1}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})+\sigma_{2}\text{vec}(\mathbf{I}_{k})\text{vec}(\mathbf{I}_{k})^{T}.\end{split}

This immediately gives that for any orthogonal matrix $\mathbf{O}$ , the matrix $\mathbf{O}\mathbf{R}\mathbf{O}^{T}$ is multivariate normal with mean zero and the same variance. From Theorem 2 it follows that $\sqrt{n}\left\{\mathrm{vec}(\mathbf{V}_{n})-\mathrm{vec}(\mathbf{\Sigma})\right\}$ is asymptotically normal with mean zero and a variance that is the same as the variance of $\mathrm{vec}(\mathbf{M})=\Pi_{L}\mathrm{vec}(\mathbf{N})$ . According to Theorem 1 this variance is of the type given by (1.3). Furthermore, if we write $\mathrm{vec}(\mathbf{M})=\mathbf{L}\mathbf{T}$ , then

\sqrt{n}(\bm{\theta}_{n}-\bm{\theta}_{0})=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\sqrt{n}\left\{\mathrm{vec}(\mathbf{V}_{n})-\mathrm{vec}(\mathbf{\Sigma})\right\}\to\mathbf{T},

in distribution, where $\mathbf{T}$ is multivariate normal with mean zero and variance given by (1.2).

3.1 Examples

We discuss some examples of multivariate statistical models that are covered by the setup in (3.1), in which the estimators $(\bm{\beta}_{n},\bm{\theta}_{n})$ are solutions of estimating equation (3.3) for particular functions $w_{1}$ , $w_{2}$ , and $w_{3}$ . In the Appendix we provide a detailed derivation of $\sigma_{1}$ and $\sigma_{2}$ for several special cases and show that their expressions coincide with the ones in Tyler [26] and Lopuhaä et al [15].

Example 1 (Maximum likelihood for multivariate normal).

Suppose that $(\mathbf{y}_{1},\mathbf{X}_{1}),\ldots,(\mathbf{y}_{n},\mathbf{X}_{n})$ are independent, such that $\mathbf{y}_{i}\mid\mathbf{X}_{i}\sim N_{k}(\mathbf{X}_{i}\bm{\beta}_{0},\mathbf{V}(\bm{\theta}_{0}))$ . The loglikelihood is then given by

\mathcal{L}=-\frac{nk}{2}\log(2\pi)-\frac{n}{2}\log|\mathbf{V}(\bm{\theta})|-\frac{1}{2}\sum_{i=1}^{n}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})^{T}\mathbf{V}(\bm{\theta})^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}).

Setting the partial derivatives $\partial\mathcal{L}/\partial\bm{\beta}$ and $\partial\mathcal{L}/\partial\theta_{j}$ equal to zero gives the following estimating equations

\begin{split}\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}^{T}\mathbf{V}^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})&=\mathbf{0},\\ \frac{1}{n}\sum_{i=1}^{n}\left\{(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})^{T}\mathbf{V}^{-1}\mathbf{L}_{j}\mathbf{V}^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})-\text{\rm tr}(\mathbf{V}^{-1}\mathbf{L}_{j})\right\}&=0,\end{split}

(3.6)

for $j=1,\ldots,\ell$ , where we write $\mathbf{V}$ for $\mathbf{V}(\bm{\theta})$ . By using the vec-notation and $\mathbf{L}$ as defined in (2.1), we can combine the partial derivatives with respect to $\theta_{j}$ in the second line of (3.6) as follows

\mathbf{L}^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})\mathrm{vec}\left\{\frac{1}{n}\sum_{i=1}^{n}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})^{T}-\mathbf{V}\right\}=\mathbf{0}.

(3.7)
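The equivalence between the second line of (3.6) and the stacked form (3.7) follows from $\mathrm{vec}(\mathbf{L}_{j})^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})\mathrm{vec}(\mathbf{A})=\text{tr}(\mathbf{V}^{-1}\mathbf{L}_{j}\mathbf{V}^{-1}\mathbf{A})$ for symmetric $\mathbf{L}_{j}$ and $\mathbf{V}$. The following sketch compares the two forms at an arbitrary trial value of $\bm{\theta}$. The simulated rows of `R` are placeholders standing in for the residuals $\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta}$, and the covariance structure is a hypothetical choice of ours.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
k, n = 3, 50
Ls = [np.eye(k), np.ones((k, k))]   # hypothetical linear structure
theta = np.array([1.2, 0.3])        # arbitrary trial value, not a solution
V = theta[0] * Ls[0] + theta[1] * Ls[1]
V_inv = np.linalg.inv(V)

# Placeholder residuals y_i - X_i beta, one per row.
R = rng.standard_normal((n, k))

# Second line of (3.6), one entry per variance component theta_j.
score_36 = np.array([
    np.mean([r @ V_inv @ Lj @ V_inv @ r for r in R]) - np.trace(V_inv @ Lj)
    for Lj in Ls
])

# Stacked form (3.7), built from the average outer product of the residuals.
S = R.T @ R / n
L = np.column_stack([vec(Lj) for Lj in Ls])
score_37 = L.T @ np.kron(V_inv, V_inv) @ vec(S - V)
```

Both vectors coincide for every trial value, so they also vanish at the same $(\bm{\beta},\bm{\theta})$.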

It follows that the maximum likelihood estimator $(\bm{\beta}_{n},\bm{\theta}_{n})$ satisfies (3.3) and $(\bm{\beta}_{0},\bm{\theta}_{0})$ satisfies (3.5), where $\Psi$ is defined in (3.4) with $w_{1}(s)=w_{2}(s)=w_{3}(s)=1$ . Theorem 2 applies and one finds $\sigma_{1}=1$ and $\sigma_{2}=0$ .
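The values $\sigma_{1}=1$ and $\sigma_{2}=0$ can be read off from the formulas in Theorem 2: with $w_{1}=w_{2}=w_{3}=1$ and $\mathbf{z}\sim N_{k}(\mathbf{0},\mathbf{I}_{k})$ one has $\mathbb{E}\|\mathbf{z}\|^{4}=k(k+2)$ and $\mathbb{E}(\|\mathbf{z}\|^{2}-k)^{2}=2k$. The sketch below approximates the same expectations by Monte Carlo. The function `sigma_constants` is a hypothetical helper of ours, and it samples from the standard normal only, so it covers the case where $g$ in (3.2) is the Gaussian density and not a general elliptical density.

```python
import numpy as np

def sigma_constants(w2, dw2, w3, dw3, k, n=200_000, seed=0):
    """Monte Carlo approximation of sigma_1 and sigma_2 from Theorem 2.

    w2, w3 are weight functions and dw2, dw3 their derivatives; all take
    and return NumPy arrays. Expectations are taken under N(0, I_k).
    """
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(rng.standard_normal((n, k)), axis=1)  # r = ||z||
    den1 = np.mean(dw2(r) * r**3 + k * (k + 2) * w3(r))
    den2 = np.mean(dw2(r) * r**3 + 2 * k * w3(r) - k * dw3(r) * r)
    sigma1 = k * (k + 2) * np.mean(w2(r) ** 2 * r**4) / den1**2
    sigma2 = (-2.0 / k * sigma1
              + 4.0 * np.mean((w2(r) * r**2 - k * w3(r)) ** 2) / den2**2)
    return sigma1, sigma2

# Gaussian maximum likelihood: w2 = w3 = 1, so both derivatives are zero.
one = lambda r: np.ones_like(r)
zero = lambda r: np.zeros_like(r)
s1_ml, s2_ml = sigma_constants(one, zero, one, zero, k=3)
```

Up to simulation error, `s1_ml` is close to 1 and `s2_ml` is close to 0. Other weight functions can be passed in the same way, provided the two denominators are non-zero as required by (C3).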

When each $\mathbf{X}_{i}=\mathbf{I}_{k}$ , for $i=1,\ldots,n$ , then the model (3.1) reduces to the multivariate location-scale model. If $\mathbf{\Sigma}$ is unstructured, then $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ , with $\bm{\theta}_{0}=\mathrm{vech}(\mathbf{\Sigma})$ and $\mathbf{L}=\partial\mathrm{vec}(\mathbf{V}(\bm{\theta}_{0}))/\partial\bm{\theta}^{T}$ is equal to the duplication matrix $\mathcal{D}_{k}$ . In this case, we can remove the factor $\mathbf{L}^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})$ from (3.7), and $\mathbf{V}_{n}$ is simply the sample covariance of $\mathbf{y}_{1},\ldots,\mathbf{y}_{n}$ . This example then coincides with Example 1 in Tyler [26].

Example 2 (M-estimators).

As mentioned in Example 1, when each $\mathbf{X}_{i}=\mathbf{I}_{k}$ , for $i=1,\ldots,n$ , and $\mathbf{\Sigma}$ is unstructured, then the model (3.1) reduces to the multivariate location-scale model and we can remove the factor $\mathbf{L}^{T}(\mathbf{V}^{-1}\otimes\mathbf{V}^{-1})$ from $\Psi_{\bm{\theta}}$ in (3.3). In that case, estimating equations (3.3) are equivalent to equations (1.1)-(1.2) in Maronna [21] or equations (4.11)-(4.12) in Huber [9] for M-estimators of multivariate location and covariance. In view of this, solutions $(\bm{\beta}_{n},\bm{\theta}_{n})$ of estimating equations (3.3) are called M-estimators for $(\bm{\beta}_{0},\bm{\theta}_{0})$ . The expressions for $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2 then coincide with the ones in Example 3 in Tyler [26].

As a special case, this includes the estimating equations that correspond to maximum likelihood estimators based on independent observations $(\mathbf{y}_{1},\mathbf{X}_{1}),\ldots,(\mathbf{y}_{n},\mathbf{X}_{n})$ from an elliptical density (3.2). The maximum likelihood estimators $(\bm{\beta}_{n},\bm{\theta}_{n})$ then satisfy estimating equations (3.3), for $w_{1}(s)=w_{2}(s)=-2g^{\prime}(s^{2})/g(s^{2})$ and $w_{3}(s)=1$ . The expressions for $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2 then coincide with the ones in Example 2 in Tyler [26].

Example 3 (S-estimators).

S-estimators for $(\bm{\beta}_{0},\bm{\theta}_{0})$ are defined by means of a function $\rho:\mathbb{R}\to[0,\infty)$ , as the solution to minimizing $|\mathbf{V}(\bm{\theta})|$ , subject to

\frac{1}{n}\sum_{i=1}^{n}\rho\left(\sqrt{(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})^{T}\mathbf{V}(\bm{\theta})^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta})}\right)\leq b_{0},

where the minimum is taken over all $\bm{\beta}\in\mathbb{R}^{q}$ and $\bm{\theta}\in\mathbb{R}^{\ell}$ , such that $\mathbf{V}(\bm{\theta})\in\text{\rm PDS}(k)$ . These estimators have been studied for linear mixed effects models in Copt and Victoria-Feser [4], Chervoneva and Vishnyakov [1, 2] and for general linear models with a structured covariance in Lopuhaä et al [15]. According to Section 7.2 in [15], S-estimators $(\bm{\beta}_{n},\bm{\theta}_{n})$ satisfy estimating equations (3.3), with $w_{1}(d)=\rho^{\prime}(d)/d$ , $w_{2}(s)=k\rho^{\prime}(s)/s$ and $w_{3}(s)=\rho^{\prime}(s)s-\rho(s)+b_{0}$ . The expressions for $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2 coincide with the ones in Corollary 9.2 in Lopuhaä et al [15].

4 Homogeneous mappings of order zero

Let $H(\mathbf{v})$ be a mapping from $\mathbb{R}^{l}$ to $\mathbb{R}^{m}$ that is homogeneous of order zero, that is

H(\mathbf{v})=H(\alpha\mathbf{v}),\text{ for all }\alpha>0.

(4.1)

These mappings have several applications to affine equivariant covariance estimators that have limiting variance (1.1). Tyler [27] uses such a mapping to show that the likelihood ratio criterion is asymptotically robust over the class of elliptical distributions. Kent and Tyler [11] consider the shape component of covariance CM-estimators and show that the limiting variance of CM-estimators of shape depends on $\sigma_{1}$ only, which may then serve as an index for the asymptotic relative efficiency. Salibián-Barrera et al [24] derive the influence function of the shape component of covariance MM-functionals and use this to obtain that the limiting variance of MM-estimators of shape only depends on a single scalar. This property of the shape component is a special case of a general result in Tyler [27] for multivariate functionals of affine equivariant covariance estimators that are asymptotically normal with limiting variance (1.1).

Estimators for a structured covariance matrix are typically not affine equivariant and have limiting variance (1.3) instead of (1.1), so that the previous results do not directly apply. The objective of this section is to extend Theorem 1 in Tyler [27] to estimators for a linearly structured covariance, and discuss its consequences for corresponding estimators of shape and scale. Moreover, we establish a similar result for estimators of the vector of variance components and apply this to its normalized version. We then have the following theorem.

Theorem 3.

Consider $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})\in\text{\rm PDS}(k)$ , for some vector $\bm{\theta}_{0}\in\mathbb{R}^{\ell}$ and linear variance structure $\mathbf{V}$ . Let $\{\mathbf{V}_{n}:n\geq 1\}$ be a sequence of estimators for $\mathbf{\Sigma}$ and let $\{\bm{\theta}_{n}:n\geq 1\}$ be a sequence of estimators for the vector $\bm{\theta}_{0}\in\mathbb{R}^{\ell}$ of variance components.

(i)

For $\mathbf{V}\in\text{\rm PDS}(k)$ , let $H(\mathbf{V})$ be continuously differentiable satisfying (4.1). When $\sqrt{n}(\mathbf{V}_{n}-\mathbf{\Sigma})$ converges in distribution to a random matrix $\mathbf{M}$ , that has a multivariate normal distribution with mean zero and variance given by (1.3), then $\sqrt{n}(H(\mathbf{V}_{n})-H(\mathbf{\Sigma}))$ is asymptotically normal with mean zero and variance

2\sigma_{1}H^{\prime}(\mathbf{\Sigma})\mathbf{L}\Big{(}\mathbf{L}^{T}\left(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}% \mathbf{L}^{T}H^{\prime}(\mathbf{\Sigma})^{T}.

(ii)

When $\sqrt{n}(\bm{\theta}_{n}-\bm{\theta}_{0})$ is asymptotically normal with mean zero and variance (1.2), then for any mapping $H(\bm{\theta})$ that satisfies (4.1), it holds that $\sqrt{n}(H(\bm{\theta}_{n})-H(\bm{\theta}_{0}))$ is asymptotically normal with mean zero and variance

2\sigma_{1}H^{\prime}(\bm{\theta}_{0})\Big{(}\mathbf{L}^{T}\left(\mathbf{% \Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}H^{\prime}% (\bm{\theta}_{0})^{T}.

When $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ is unstructured, then $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ , with $\bm{\theta}_{0}=\text{\rm vech}(\mathbf{\Sigma})$ , as defined in (2.3), and $\mathbf{L}$ is the duplication matrix $\mathcal{D}_{k}$ . Because $\mathbf{K}_{k,k}H^{\prime}(\mathbf{V})^{T}=H^{\prime}(\mathbf{V})^{T}$ , for symmetric $\mathbf{V}$ , from (2.4) it follows that Theorem 3(i) with $\mathbf{L}=\mathcal{D}_{k}$ recovers Theorem 1 in Tyler [27].

From Theorem 3 it follows immediately that the asymptotic relative efficiency of different estimators $H(\mathbf{V}_{n})$ for $H(\mathbf{\Sigma})$ can be compared by simply comparing the values of the corresponding scalar $\sigma_{1}$ . Similarly, the scalar $\sigma_{1}$ can also be used as an index for the asymptotic relative efficiency of different estimators $H(\bm{\theta}_{n})$ for $H(\bm{\theta}_{0})$ . We discuss some examples below.

Example 4 (Shape and scale of a structured covariance).

Suppose that $\sqrt{n}(\mathbf{V}_{n}-\mathbf{\Sigma})$ is asymptotically normal with mean zero and variance given by (1.3). Consider the shape component $H(\mathbf{C})=\mathrm{vec}(\mathbf{C})/|\mathbf{C}|^{1/k}$ , where $\mathbf{C}\in\text{\rm PDS}(k)$ . We have that

H^{\prime}(\mathbf{C})=\frac{\partial H(\mathbf{C})}{\partial\mathrm{vec}(% \mathbf{C})^{T}}=-\frac{1}{k}|\mathbf{C}|^{-1/k}\text{\rm vec}(\mathbf{C})% \text{\rm vec}(\mathbf{C}^{-1})^{T}+|\mathbf{C}|^{-1/k}\mathbf{I}_{k^{2}}.

(4.2)

Then, according to Theorem 3(i), for the shape component it follows that $\sqrt{n}(H(\mathbf{V}_{n})-H(\mathbf{\Sigma}))$ is asymptotically normal with mean zero and variance (see Appendix for details)

\frac{2\sigma_{1}}{|\mathbf{\Sigma}|^{2/k}}\left\{\mathbf{L}\Big{(}\mathbf{L}^% {T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{% )}^{-1}\mathbf{L}^{T}-\frac{1}{k}\text{\rm vec}(\mathbf{\Sigma})\text{\rm vec}% (\mathbf{\Sigma})^{T}\right\}.

(4.3)

When $\mathbf{\Sigma}$ is unstructured, then $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ , with $\bm{\theta}_{0}=\text{\rm vech}(\mathbf{\Sigma})$ and $\mathbf{L}$ is the duplication matrix $\mathcal{D}_{k}$ . In that case, from (2.4) it follows that (4.3) with $\mathbf{L}=\mathcal{D}_{k}$ reduces to

\frac{\sigma_{1}}{|\mathbf{\Sigma}|^{2/k}}\left\{\left(\mathbf{I}_{k^{2}}+% \mathbf{K}_{k,k}\right)(\mathbf{\Sigma}\otimes\mathbf{\Sigma})-\frac{2}{k}% \text{\rm vec}(\mathbf{\Sigma})\text{\rm vec}(\mathbf{\Sigma})^{T}\right\}.

This coincides with expression (9) found in [24]. For completeness, consider the scale component $\sigma(\mathbf{C})=|\mathbf{C}|^{1/(2k)}$ . It can be seen that

\sigma^{\prime}(\mathbf{C})=\frac{1}{2k}|\mathbf{C}|^{1/(2k)}\text{\rm vec}(% \mathbf{C}^{-1})^{T}.

(4.4)

Application of the delta method then yields that $\sqrt{n}(\sigma(\mathbf{V}_{n})-\sigma(\mathbf{\Sigma}))$ is asymptotically normal with mean zero and variance

\frac{1}{4}\left(\frac{2\sigma_{1}}{k}+\sigma_{2}\right)|\mathbf{\Sigma}|^{1/k}.
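The algebra leading from Theorem 3(i) and (4.2) to (4.3) can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the derivation in the Appendix: it assumes NumPy, a hypothetical compound-symmetry structure $\mathbf{V}(\bm{\theta})=\theta_{1}\mathbf{I}_{k}+\theta_{2}\mathbf{1}\mathbf{1}^{T}$ with $k=3$, and illustrative values for $\bm{\theta}_{0}$ and $\sigma_{1}$. All variable names are ours.

```python
import numpy as np

k = 3
# Hypothetical linear structure (compound symmetry): V(theta) = theta_1 * I + theta_2 * J.
basis = [np.eye(k), np.ones((k, k))]
L = np.column_stack([B.flatten(order="F") for B in basis])  # vec(V(theta)) = L @ theta

theta0 = np.array([1.0, 0.5])   # illustrative variance components (assumption)
sigma1 = 1.3                    # illustrative value of the scalar sigma_1 (assumption)
Sigma = (L @ theta0).reshape((k, k), order="F")
Sigma_inv = np.linalg.inv(Sigma)
det = np.linalg.det(Sigma)

vec_S = Sigma.flatten(order="F")
vec_Sinv = Sigma_inv.flatten(order="F")

# Matrix L (L^T (Sigma^{-1} kron Sigma^{-1}) L)^{-1} L^T from Theorem 3(i).
A = L.T @ np.kron(Sigma_inv, Sigma_inv) @ L
core = L @ np.linalg.solve(A, L.T)

# Derivative (4.2) of the shape map H(C) = vec(C) / |C|^{1/k}, evaluated at Sigma.
H_prime = det ** (-1.0 / k) * (np.eye(k * k) - np.outer(vec_S, vec_Sinv) / k)

# Sandwich formula of Theorem 3(i) versus the closed form (4.3).
variance_theorem = 2 * sigma1 * H_prime @ core @ H_prime.T
variance_closed = 2 * sigma1 / det ** (2.0 / k) * (core - np.outer(vec_S, vec_S) / k)

print(np.allclose(variance_theorem, variance_closed))
```

The agreement rests on the identities $\mathbf{L}(\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})=\mathrm{vec}(\mathbf{\Sigma})$ and $\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathrm{vec}(\mathbf{\Sigma})=k$, which hold for any linear structure with $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$.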

Example 5 (Direction of the vector of variance components).

In order to create a single scalar as an index of the asymptotic efficiency for estimators $\bm{\theta}_{n}$ for the vector $\bm{\theta}_{0}$ of variance components, it is helpful to separate $\bm{\theta}_{0}$ into its direction and length. The direction component $H(\bm{\theta})=\bm{\theta}/\|\bm{\theta}\|$ satisfies (4.1). Its derivative is given by

H^{\prime}(\bm{\theta})=\frac{\partial H(\bm{\theta})}{\partial\bm{\theta}^{T}% }=\frac{1}{\|\bm{\theta}\|}\left(\mathbf{I}_{\ell}-\frac{\bm{\theta}\bm{\theta% }^{T}}{\|\bm{\theta}\|^{2}}\right).

(4.5)

Then, according to Theorem 3(ii), for the direction estimator it follows that $\sqrt{n}(H(\bm{\theta}_{n})-H(\bm{\theta}_{0}))$ is asymptotically normal with mean zero and variance

\frac{2\sigma_{1}}{\|\bm{\theta}_{0}\|^{2}}\left(\mathbf{I}_{\ell}-\frac{\bm{% \theta}_{0}\bm{\theta}_{0}^{T}}{\|\bm{\theta}_{0}\|^{2}}\right)\Big{(}\mathbf{% L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}% \Big{)}^{-1}\left(\mathbf{I}_{\ell}-\frac{\bm{\theta}_{0}\bm{\theta}_{0}^{T}}{% \|\bm{\theta}_{0}\|^{2}}\right).

It does not seem possible to simplify this expression any further, but it illustrates that one can use the scalar $\sigma_{1}$ as an index for the asymptotic relative efficiency of estimators $H(\bm{\theta}_{n})$ for $H(\bm{\theta}_{0})$ .

An alternative is the mapping $H(\bm{\theta})=\bm{\theta}/|\mathbf{V}(\bm{\theta})|^{1/k}$ . Since $\mathbf{V}$ is linear, this $H$ also satisfies (4.1). For $\mathbf{V}_{n}=\mathbf{V}(\bm{\theta}_{n})$ , it holds that $\bm{\theta}_{n}=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{V}_{n})$ , so that

H(\bm{\theta}_{n})=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}% \left(\mathbf{V}_{n}/|\mathbf{V}_{n}|^{1/k}\right).

From Example 4, it follows that $\sqrt{n}(H(\bm{\theta}_{n})-H(\bm{\theta}_{0}))$ is asymptotically normal with mean zero and variance

\frac{2\sigma_{1}}{|\mathbf{\Sigma}|^{2/k}}\left\{\Big{(}\mathbf{L}^{T}\left(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}-% \frac{1}{k}\bm{\theta}_{0}\bm{\theta}_{0}^{T}\right\}.

This component $H$ leads to a simpler expression for the limiting variance and the scalar $\sigma_{1}$ can again be used as an index for the asymptotic relative efficiency of estimators $H(\bm{\theta}_{n})$ for $H(\bm{\theta}_{0})$ .
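The limiting variance of the normalized variance components can also be obtained directly from Theorem 3(ii) by the delta method. The sketch below is an illustration under the same hypothetical assumptions as before (NumPy, a compound-symmetry structure with $k=3$, illustrative values for $\bm{\theta}_{0}$ and $\sigma_{1}$): it approximates $H^{\prime}(\bm{\theta}_{0})$ for $H(\bm{\theta})=\bm{\theta}/|\mathbf{V}(\bm{\theta})|^{1/k}$ by central finite differences and compares the resulting variance with the closed form above. The helper names `V`, `H` and `jacobian` are ours.

```python
import numpy as np

k = 3
# Hypothetical linear structure (compound symmetry), as an assumption for illustration.
basis = [np.eye(k), np.ones((k, k))]
L = np.column_stack([B.flatten(order="F") for B in basis])
theta0 = np.array([1.0, 0.5])   # illustrative variance components (assumption)
sigma1 = 1.3                    # illustrative value of sigma_1 (assumption)

def V(theta):
    return (L @ theta).reshape((k, k), order="F")

def H(theta):
    # Normalized variance components theta / |V(theta)|^{1/k}.
    return theta / np.linalg.det(V(theta)) ** (1.0 / k)

def jacobian(f, x, step=1e-6):
    # Central finite differences, one column per coordinate of x.
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = step
        cols.append((f(x + e) - f(x - e)) / (2 * step))
    return np.column_stack(cols)

Sigma = V(theta0)
Sigma_inv = np.linalg.inv(Sigma)
A_inv = np.linalg.inv(L.T @ np.kron(Sigma_inv, Sigma_inv) @ L)

J = jacobian(H, theta0)
variance_delta = 2 * sigma1 * J @ A_inv @ J.T
variance_closed = (2 * sigma1 / np.linalg.det(Sigma) ** (2.0 / k)
                   * (A_inv - np.outer(theta0, theta0) / k))

print(np.allclose(H(3.7 * theta0), H(theta0)))                    # homogeneity (4.1)
print(np.allclose(variance_delta, variance_closed, atol=1e-6))    # delta method vs closed form
```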

5 Influence function of structured covariance functionals

The influence function measures the local robustness of an estimator. It describes the effect of an infinitesimal contamination at a single point on the corresponding functional (see Hampel [7]). Good local robustness is therefore illustrated by a bounded influence function. It is defined as follows. Let $P$ be a distribution on $\mathbb{R}^{k}$ . For $0<h<1$ and $\mathbf{y}\in\mathbb{R}^{k}$ fixed, define the perturbed probability measure $P_{h,\mathbf{y}}=(1-h)P+h\delta_{\mathbf{y}}$ , where $\delta_{\mathbf{y}}$ denotes the Dirac measure at $\mathbf{y}\in\mathbb{R}^{k}$ . The influence function of a $k\times k$ covariance functional $\mathbf{C}(\cdot)$ at probability measure $P$ , is defined as

\text{IF}(\mathbf{y};\mathbf{C},P)=\lim_{h\downarrow 0}\frac{\mathbf{C}((1-h)P% +h\delta_{\mathbf{y}})-\mathbf{C}(P)}{h},

(5.1)

if this limit exists.
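As a concrete instance of definition (5.1), consider the classical covariance functional $\mathbf{C}(P)=\mathbb{E}_{P}[(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}]$, whose influence function is $(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}-\mathbf{\Sigma}$ and hence unbounded in $\mathbf{y}$. The following sketch, which assumes NumPy and uses a discrete distribution $P$ on simulated support points chosen purely for illustration, evaluates the difference quotient in (5.1) for a small $h$ and compares it with this limit.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))                    # support of a discrete P (illustrative)
weights = np.full(len(points), 1.0 / len(points))     # equal probability masses

def cov_functional(pts, w):
    # Covariance functional C(P) = E_P[(y - mu)(y - mu)^T] for a discrete measure.
    mu = w @ pts
    centred = pts - mu
    return (centred * w[:, None]).T @ centred

def influence(y, h=1e-6):
    # Difference quotient in (5.1) for the perturbed measure (1 - h) P + h delta_y.
    pts_h = np.vstack([points, y])
    w_h = np.append((1 - h) * weights, h)
    return (cov_functional(pts_h, w_h) - cov_functional(points, weights)) / h

y = np.array([3.0, -1.0, 2.0])
mu = weights @ points
Sigma = cov_functional(points, weights)

approx = influence(y)
exact = np.outer(y - mu, y - mu) - Sigma   # known limit for this particular functional

print(np.allclose(approx, exact, atol=1e-4))
```

For this functional the difference quotient equals the limit minus $h(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}$, so the approximation error is of order $h$.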

Let $P$ be a distribution on $\mathbb{R}^{k}$ with density $|\mathbf{\Sigma}|^{-1/2}g\left((\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(% \mathbf{y}-\bm{\mu})\right)$ , where $\bm{\mu}\in\mathbb{R}^{k}$ and $\mathbf{\Sigma}\in\text{\rm PDS}(k)$ , and let $\mathbf{C}$ be Fisher consistent for $\mathbf{\Sigma}$ , that is $\mathbf{C}(P)=\mathbf{\Sigma}$ , and affine equivariant, meaning $\mathbf{C}(P_{\mathbf{A}\mathbf{y}+\mathbf{b}})=\mathbf{A}\mathbf{C}(P_{% \mathbf{y}})\mathbf{A}^{T}$ , for any nonsingular $k\times k$ matrix $\mathbf{A}$ and $\mathbf{b}\in\mathbb{R}^{k}$ , where $P_{\mathbf{y}}$ denotes the distribution of a random vector $\mathbf{y}$ . Croux and Haesbroeck [5] show that the influence function of such covariance functionals at the $N_{k}(\bm{\mu},\mathbf{\Sigma})$ distribution is given by

\text{IF}(\mathbf{y};\mathbf{C},P)=\alpha_{C}(d(\mathbf{y}))(\mathbf{y}-\bm{% \mu})(\mathbf{y}-\bm{\mu})^{T}-\beta_{C}(d(\mathbf{y}))\mathbf{\Sigma},

(5.2)

for some real valued functions $\alpha_{C}$ and $\beta_{C}$ and where $d(\mathbf{y})^{2}=(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm% {\mu})$ . For more details on $\alpha_{C}$ and $\beta_{C}$ for different covariance functionals, see Croux and Haesbroeck [5].

Structured covariance functionals $\mathbf{M}(\cdot)=\mathbf{V}(\bm{\theta}(\cdot))$ are not necessarily affine equivariant, so that the above characterizations do not directly apply. However, Lopuhaä et al [16] find similar expressions for the influence function of the covariance S-functionals $\mathbf{M}(\cdot)$ and $\bm{\theta}(\cdot)$ in a linear model with a linearly structured covariance $\mathbf{V}$ , see Corollary 8.4 in [16]. The next lemma shows that these expressions will always appear at elliptical distributions for covariance functionals that are a projection of some affine equivariant covariance functional.

Lemma 1.

Let $P$ be a distribution on $\mathbb{R}^{k}$ with density $|\mathbf{\Sigma}|^{-1/2}g\left((\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(% \mathbf{y}-\bm{\mu})\right)$ , where $\bm{\mu}\in\mathbb{R}^{k}$ and $\mathbf{\Sigma}\in\text{\rm PDS}(k)$ . Let $\mathbf{C}$ be an affine equivariant covariance functional which possesses an influence function and is Fisher consistent for $\mathbf{\Sigma}$ . Suppose that $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ , for some $\bm{\theta}_{0}\in\mathbb{R}^{\ell}$ , and that $\mathbf{V}$ is linear such that $\mathbf{L}$ , as defined in (2.1), is of full column rank. Let $\Pi_{L}$ be the projection matrix defined in (2.2) and define the covariance functional $\mathbf{M}$ by $\mathrm{vec}(\mathbf{M})=\Pi_{L}\mathrm{vec}(\mathbf{C})$ . Then the following holds.

(i)

The functional $\mathbf{M}$ is Fisher consistent for $\mathbf{\Sigma}$ and there exist functions $\alpha_{C},\beta_{C}:[0,\infty)\to\mathbb{R}$ , such that $\text{\rm IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),P)$ is given by

\alpha_{C}(d(\mathbf{y}))\mathbf{L}\Big{(}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}% \otimes\mathbf{\Sigma}^{-1})\mathbf{L}\Big{)}^{-1}\mathbf{L}^{T}\mathrm{vec}% \left(\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}% \mathbf{\Sigma}^{-1}\right)-\beta_{C}(d(\mathbf{y}))\mathrm{vec}(\mathbf{% \Sigma}),

where $d^{2}(\mathbf{y})=(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm% {\mu})$ .

(ii)

If $\bm{\theta}(P)\in\mathbb{R}^{\ell}$ is the functional, such that $\mathrm{vec}(\mathbf{M}(\cdot))=\mathbf{L}\bm{\theta}(\cdot)$ , then $\bm{\theta}$ is Fisher consistent for $\bm{\theta}_{0}$ and $\text{\rm IF}(\mathbf{y};\bm{\theta},P)$ is given by

\alpha_{C}(d(\mathbf{y}))\Big{(}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes% \mathbf{\Sigma}^{-1})\mathbf{L}\Big{)}^{-1}\mathbf{L}^{T}\mathrm{vec}\left(% \mathbf{\Sigma}^{-1}(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}\mathbf{% \Sigma}^{-1}\right)-\beta_{C}(d(\mathbf{y}))\bm{\theta}_{0}.

Note that the functions $\alpha_{C}$ and $\beta_{C}$ have nothing to do with the projection $\Pi_{L}$ , but are inherited from the influence function (5.2) of the affine equivariant covariance functional $\mathbf{C}$ . At a distribution $P$ that has an elliptical density (3.2) with a linearly structured covariance, Lopuhaä et al [16] find expressions similar to the ones in Lemma 1 for the covariance S-functionals. If the S-functional is defined by some function $\rho$ and constant $b_{0}$ (see Example 3), then

\begin{split}
\alpha_{C}(s)&=\frac{k\rho^{\prime}(s)}{s\delta_{1}}\\
\beta_{C}(s)&=\frac{\rho^{\prime}(s)s}{\delta_{1}}-\frac{2(\rho(s)-b_{0})}{\delta_{2}},
\end{split}

(5.3)

where

\begin{split}
\delta_{1}&=\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}}\left[\rho^{\prime\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}+(k+1)\rho^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\right]}{k+2}\\
\delta_{2}&=\mathbb{E}_{\mathbf{0},\mathbf{I}}\left[\rho^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\right].
\end{split}

(5.4)

These $\alpha_{C}$ and $\beta_{C}$ are the same as the ones that appear in the expression for the influence function of the affine equivariant covariance S-functional $\mathbf{C}$ in the multivariate location-scale model, see Lopuhaä [13] or Salibián-Barrera et al [24], or in the multivariate regression model, see Van Aelst and Willems [28]. Indeed, the influence function $\text{IF}(\mathbf{y},\mathrm{vec}(\mathbf{V}(\bm{\theta})),P)$ of the structured covariance functional in Lopuhaä et al [16] is precisely the projection $\Pi_{L}$ of $\text{IF}(\mathbf{y},\mathrm{vec}(\mathbf{C}),P)$ as obtained in [13, 24, 28].

When $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ is unstructured, then $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ with $\bm{\theta}_{0}=\text{\rm vech}(\mathbf{\Sigma})$ and $\mathbf{L}$ is the duplication matrix $\mathcal{D}_{k}$ . In that case, from (2.4) it follows that the expression for $\text{\rm IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),P)$ in Lemma 1(i) with $\mathbf{L}=\mathcal{D}_{k}$ reduces to

\text{\rm IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),P)=\mathrm{vec}\left\{\alpha% _{C}(d(\mathbf{y}))(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}-\beta_{C}(d(% \mathbf{y}))\mathbf{\Sigma}\right\}.

(5.5)

This coincides with the expression found in Lemma 1 in Croux and Haesbroeck [5].
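The reduction of the first term in Lemma 1(i) to the one in (5.5) for $\mathbf{L}=\mathcal{D}_{k}$ can be illustrated numerically. The sketch below assumes NumPy, column-major $\mathrm{vec}$, and randomly generated $\mathbf{\Sigma}$, $\bm{\mu}$ and $\mathbf{y}$ that serve only as an example; the helper `duplication_matrix` is ours.

```python
import numpy as np

def duplication_matrix(k):
    # Columns are vec(E_ij + E_ji) for i > j and vec(E_ii), so that
    # vec(S) = D @ vech(S) for every symmetric k x k matrix S.
    cols = []
    for j in range(k):
        for i in range(j, k):
            E = np.zeros((k, k))
            E[i, j] = E[j, i] = 1.0
            cols.append(E.flatten(order="F"))
    return np.column_stack(cols)

k = 3
rng = np.random.default_rng(1)
G = rng.normal(size=(k, k))
Sigma = G @ G.T + k * np.eye(k)      # arbitrary positive definite matrix (illustrative)
Sigma_inv = np.linalg.inv(Sigma)
mu = rng.normal(size=k)
y = rng.normal(size=k)
r = y - mu

D = duplication_matrix(k)
A = D.T @ np.kron(Sigma_inv, Sigma_inv) @ D
rhs = D.T @ (Sigma_inv @ np.outer(r, r) @ Sigma_inv).flatten(order="F")

# First term of Lemma 1(i) without the factor alpha_C(d(y)).
projected = D @ np.linalg.solve(A, rhs)

# Corresponding term vec((y - mu)(y - mu)^T) in (5.5).
print(np.allclose(projected, np.outer(r, r).flatten(order="F")))
```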

Mappings $H$ that satisfy (4.1) also have useful applications to influence functions of affine equivariant covariance functionals $\mathbf{C}$ and their gross error sensitivity (GES). Kent and Tyler [11] consider functionals $\mathbf{C}/|\mathbf{C}|^{1/k}$ and $\mathbf{C}/\text{tr}(\mathbf{C})$ to obtain that the GES of different CM-functionals is proportional to a single scalar. Salibián-Barrera et al [24] derive the influence function of the shape component of covariance MM-functionals and show that it is proportional to a single function $\alpha_{C}$ , which no longer depends on the scale-functional used in the first step. In fact, these properties hold more generally for functionals $H$ satisfying (4.1) applied to affine equivariant covariance functionals. The next lemma establishes similar results for linearly structured covariance functionals.

Lemma 2.

Let $P$ be a distribution on $\mathbb{R}^{k}$ with an elliptically contoured density (3.2). Suppose that $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ , for some $\bm{\theta}_{0}\in\mathbb{R}^{\ell}$ , and that $\mathbf{V}$ is linear such that $\mathbf{L}$ , as defined in (2.1), is of full column rank.

(i)

Let $\mathbf{M}\in\text{\rm PDS}(k)$ be a covariance functional that is Fisher consistent for $\mathbf{\Sigma}$ and which possesses an influence function given by Lemma 1(i). Let $H(\mathbf{M})$ be continuously differentiable in a neighborhood of $\mathbf{M}(P)$ satisfying (4.1). Then $\text{\rm IF}(\mathbf{y};H(\mathbf{M}),P)$ is given by

\alpha_{C}(d(\mathbf{y}))H^{\prime}(\mathbf{\Sigma})\mathbf{L}\Big{(}\mathbf{L% }^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L}\Big{)}^{-1}% \mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\Big% {(}(\mathbf{y}-\bm{\mu})\otimes(\mathbf{y}-\bm{\mu})\Big{)},

where $d^{2}(\mathbf{y})=(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm% {\mu})$ .

(ii)

Let $\bm{\theta}\in\mathbb{R}^{\ell}$ be a functional that is Fisher consistent for $\bm{\theta}_{0}$ and which possesses an influence function given by Lemma 1(ii). Let $H(\bm{\theta})$ be continuously differentiable in a neighborhood of $\bm{\theta}(P)$ satisfying (4.1). Then $\text{\rm IF}(\mathbf{y};H(\bm{\theta}),P)$ is given by

\alpha_{C}(d(\mathbf{y}))H^{\prime}(\bm{\theta}_{0})\Big{(}\mathbf{L}^{T}(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L}\Big{)}^{-1}\mathbf{% L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\Big{(}(% \mathbf{y}-\bm{\mu})\otimes(\mathbf{y}-\bm{\mu})\Big{)}.

Consider the GES defined by $\sup_{\mathbf{y}\in\mathbb{R}^{k}}\|\text{IF}(\mathbf{y};\cdot)\|$ , for some norm $\|\cdot\|$ . From Lemma 2 it follows immediately that regardless of the choice of the norm, the value $\|\text{IF}(\mathbf{y};H(\mathbf{M}),P)\|$ for different functionals $H(\mathbf{M}(P))$ is proportional to $|\alpha_{C}(d(\mathbf{y}))|$ and similarly for functionals $H(\bm{\theta}(P))$ . We discuss some examples below.

Example 6 (Shape and scale of a structured covariance).

For the shape functional $H(\mathbf{M})=\mathrm{vec}(\mathbf{M})/|\mathbf{M}|^{1/k}$ , from Lemma 2(i) together with (4.2) we find

\text{\rm IF}(\mathbf{y};H(\mathbf{M}),P)=-\frac{1}{k}|\mathbf{\Sigma}|^{-1/k}% \text{\rm tr}\left(\mathbf{\Sigma}^{-1}\text{\rm IF}(\mathbf{y};\mathbf{M},P)% \right)\cdot\mathrm{vec}(\mathbf{\Sigma})+|\mathbf{\Sigma}|^{-1/k}\text{\rm IF% }(\mathbf{y};\mathrm{vec}(\mathbf{M}),P).

See also Salibián-Barrera et al [24]. In particular, at a distribution $P$ with an elliptically contoured density with parameters $\bm{\mu}$ and $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ one finds that $\text{\rm IF}(\mathbf{y};H(\mathbf{M}),P)$ is given by

\begin{split}\frac{\alpha_{C}(d(\mathbf{y}))}{|\mathbf{\Sigma}|^{1/k}}\bigg{\{% }\mathbf{L}\Big{(}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-% 1})\mathbf{L}\Big{)}^{-1}\mathbf{L}^{T}\mathrm{vec}\left(\mathbf{\Sigma}^{-1}(% \mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}\right)-\frac% {d(\mathbf{y})^{2}}{k}\mathrm{vec}(\mathbf{\Sigma})\bigg{\}},\end{split}

(5.6)

where $d(\mathbf{y})^{2}=(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}(\mathbf{y}-\bm{\mu})$ . It follows that $\|\text{\rm IF}(\mathbf{y};H(\mathbf{M}),P)\|$ will be proportional to $|\alpha_{C}(d(\mathbf{y}))d(\mathbf{y})^{2}|$ . When $\mathbf{\Sigma}$ is unstructured, then $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ , where $\bm{\theta}_{0}=\text{\rm vech}(\mathbf{\Sigma})$ , as defined in (2.3), and $\mathbf{L}$ is the duplication matrix $\mathcal{D}_{k}$ . In that case, from (2.4) it follows that (5.6) with $\mathbf{L}=\mathcal{D}_{k}$ reduces to

\frac{\alpha_{C}(d(\mathbf{y}))}{|\mathbf{\Sigma}|^{1/k}}\mathrm{vec}\left\{(% \mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}-\frac{d(\mathbf{y})^{2}}{k}% \mathbf{\Sigma}\right\},

which coincides with formula (3) in [24]. For completeness, consider the scale component $\sigma(\mathbf{M})=|\mathbf{M}|^{1/(2k)}$ . From (4.4), it follows that

\text{\rm IF}(\mathbf{y};\sigma,P)=\frac{1}{2}|\mathbf{\Sigma}|^{-1/(2k)}% \gamma_{C}(d(\mathbf{y})),

where $\gamma_{C}(s)=\alpha_{C}(s)s^{2}/k-\beta_{C}(s)$ , which matches with equation (4) in [24].

Example 7 (Direction of the vector of variance components).

For the direction functional $H(\bm{\theta})=\bm{\theta}/\|\bm{\theta}\|$ , from Lemma 2(ii) together with (4.5) we find that, at a distribution $P$ with an elliptically contoured density with parameters $\bm{\mu}$ and $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})$ , $\text{\rm IF}(\mathbf{y};H(\bm{\theta}),P)$ is given by

\alpha_{C}(d(\mathbf{y}))\left(\frac{1}{\|\bm{\theta}_{0}\|}\mathbf{I}_{\ell}-% \frac{\bm{\theta}_{0}\bm{\theta}_{0}^{T}}{\|\bm{\theta}_{0}\|^{3}}\right)\Big{% (}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L}% \Big{)}^{-1}\mathbf{L}^{T}\mathrm{vec}\left(\mathbf{\Sigma}^{-1}(\mathbf{y}-% \bm{\mu})(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}\right).

It follows that $\|\text{\rm IF}(\mathbf{y};H(\bm{\theta}),P)\|$ will be proportional to $|\alpha_{C}(d(\mathbf{y}))d(\mathbf{y})^{2}|$ . An alternative is the mapping $H(\bm{\theta})=\bm{\theta}/|\mathbf{V}(\bm{\theta})|^{1/k}$ . Since $\mathbf{V}$ is linear, $H$ satisfies (4.1). For $\mathbf{M}(P)=\mathbf{V}(\bm{\theta}(P))$ , it holds that $\bm{\theta}(P)=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{M}(P))$ , so that

H(\bm{\theta}(P))=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(% \mathbf{M}(P))/|\mathbf{M}(P)|^{1/k}.

From Example 6 it follows that $\text{\rm IF}(\mathbf{y};H(\bm{\theta}),P)$ is given by

\begin{split}&\frac{\alpha_{C}(d(\mathbf{y}))}{|\mathbf{\Sigma}|^{1/k}}\bigg{% \{}\Big{(}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})% \mathbf{L}\Big{)}^{-1}\mathbf{L}^{T}\mathrm{vec}\left(\mathbf{\Sigma}^{-1}(% \mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}\mathbf{\Sigma}^{-1}\right)-\frac% {d(\mathbf{y})^{2}}{k}\bm{\theta}_{0}\bigg{\}}\end{split}

using that $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ . Again we find that $\|\text{\rm IF}(\mathbf{y};H(\bm{\theta}),P)\|$ is proportional to $|\alpha_{C}(d(\mathbf{y}))d(\mathbf{y})^{2}|$ .

6 Application

We apply our results to S-estimators and S-functionals in the linear model (3.1). Let $P$ be the distribution for the random variable $\mathbf{s}=(\mathbf{y},\mathbf{X})$ , which is such that $\mathbf{y}\mid\mathbf{X}$ has an elliptically contoured distribution (3.2) with parameters $\bm{\mu}=\mathbf{X}\bm{\beta}_{0}$ and $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta}_{0})=\theta_{01}\mathbf{L}_{1}+\cdots+% \theta_{0\ell}\mathbf{L}_{\ell}$ . Consider the S-estimator for $(\bm{\beta}_{0},\bm{\theta}_{0})$ defined as the solution to minimizing $|\mathbf{V}(\bm{\theta})|$ , subject to

\frac{1}{n}\sum_{i=1}^{n}\rho\left(\sqrt{(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{% \beta})^{T}\mathbf{V}(\bm{\theta})^{-1}(\mathbf{y}_{i}-\mathbf{X}_{i}\bm{\beta% })}\right)=b_{0},

where the minimum is taken over all $\bm{\beta}\in\mathbb{R}^{q}$ and $\bm{\theta}\in\mathbb{R}^{\ell}$ , such that $\mathbf{V}(\bm{\theta})\in\text{\rm PDS}(k)$ . For the function $\rho$ we take Tukey’s bi-weight

\rho_{\mathrm{B}}(s;c)=\begin{cases}s^{2}/2-s^{4}/(2c^{2})+s^{6}/(6c^{4}),&|s|\leq c;\\ c^{2}/6,&|s|>c,\end{cases}

(6.1)

and $b_{0}=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[\rho_{B}(\|\mathbf{z}\|;c)]$ . From Theorem 6.1 in Lopuhaä et al [15] it is known that the breakdown point of the S-estimator depends on the cut-off constant $c$ and is at least $\lceil nb_{0}/(c^{2}/6)\rceil/n$ , or asymptotically $\epsilon^{*}=b_{0}/(c^{2}/6)$ .

Table 1: Cut-off values of $\rho_{B}$ for different breakdown points and dimensions.

\begin{array}{crrrrrrrrrr}
\hline\hline
&\multicolumn{10}{c}{\text{Breakdown point}}\\
k&0.05&0.10&0.15&0.20&0.25&0.30&0.35&0.40&0.45&0.50\\
\cline{2-11}
1&7.545&5.182&4.096&3.421&2.937&2.561&2.252&1.988&1.756&1.548\\
2&10.767&7.474&5.981&5.069&4.427&3.938&3.542&3.209&2.920&2.661\\
5&17.114&11.950&9.628&8.220&7.242&6.505&5.918&5.432&5.017&4.652\\
10&24.246&16.961&13.694&11.719&10.351&9.324&8.510&7.840&7.271&6.776\\
\hline\hline
\end{array}

Table 1 gives the cut-off values of $\rho_{B}$ for given asymptotic lower bounds $\epsilon^{*}=0.05,0.10,\ldots,0.50$ on the breakdown point in dimensions $k=1,2,5,10$ . This table partly overlaps with Table 3 in Rousseeuw and Yohai [23].
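The cut-off values can be recomputed from the relation $\epsilon^{*}=b_{0}/(c^{2}/6)$. The following sketch uses only the Python standard library and is one possible way to do this, not necessarily how Table 1 was produced. It uses that $\rho_{B}(s;c)/(c^{2}/6)=1-(1-s^{2}/c^{2})^{3}$ for $|s|\leq c$ and equals one otherwise, so that $\epsilon^{*}=1-\int_{0}^{c}(1-s^{2}/c^{2})^{3}f_{k}(s)\,ds$, where $f_{k}$ is the density of $\|\mathbf{z}\|$ for $\mathbf{z}\sim N_{k}(\mathbf{0},\mathbf{I}_{k})$. The function names, the Simpson rule and the bisection bracket are our choices.

```python
import math

def chi_density(s, k):
    # Density of ||z|| for z ~ N(0, I_k), i.e. the chi distribution with k degrees of freedom.
    return s ** (k - 1) * math.exp(-0.5 * s * s) / (2 ** (k / 2 - 1) * math.gamma(k / 2))

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with n (even) subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def breakdown_point(c, k):
    # epsilon* = E[rho_B(||z||; c)] / (c^2 / 6) = 1 - int_0^c (1 - s^2/c^2)^3 f_k(s) ds.
    integrand = lambda s: (1 - (s / c) ** 2) ** 3 * chi_density(s, k)
    return 1 - simpson(integrand, 0.0, c)

def cutoff(eps, k, lo=0.1, hi=100.0):
    # breakdown_point is decreasing in c, so bisection on [lo, hi] applies.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if breakdown_point(mid, k) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_k1 = cutoff(0.50, 1)
c_k2 = cutoff(0.50, 2)
print(round(c_k1, 3), round(c_k2, 3))
```

Up to rounding, the two printed values should agree with the entries for $k=1$ and $k=2$ in the last column of Table 1.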

According to Corollary 9.2 in Lopuhaä et al [16], the scalar $\lambda=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\rho_{B}^{\prime}(\|\mathbf{z}\|;c)^{2}\right]/(k\alpha^{2})$ represents the asymptotic efficiency of the regression S-estimator $\bm{\beta}_{n}$ relative to the least squares estimator (for which $\lambda=1$ ), where

\alpha=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\left(1-\frac{1}{k}\right)% \frac{\rho_{B}^{\prime}(\|\mathbf{z}\|;c)}{\|\mathbf{z}\|}+\frac{1}{k}\rho_{B}% ^{\prime\prime}(\|\mathbf{z}\|;c)\right].

(6.2)

From Examples 4 and 5, together with Theorem 2 and Example 3, it follows that the scalar

\sigma_{1}=\frac{k\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\rho_{B}^{\prime% }(\|\mathbf{z}\|;c)^{2}\|\mathbf{z}\|^{2}\right]}{(k+2)\delta_{1}^{2}},

where $\delta_{1}$ is defined in (5.4), serves as an index for the asymptotic efficiency of both the S-estimator of shape as well as the S-estimator for the direction of the vector of variance components, relative to the least squares estimators of shape and direction, respectively (for which $\sigma_{1}=1$ ). Finally, from Example 4, together with Example 3, it follows that

\sigma_{3}=\frac{1}{4}\left(\frac{2\sigma_{1}}{k}+\sigma_{2}\right)=\frac{% \mathbb{E}_{\textbf{0},\mathbf{I}_{k}}\left[\left(\rho_{B}(\|\mathbf{z}\|;c)-b% _{0}\right)^{2}\right]}{\delta_{2}^{2}},

where $\delta_{2}$ is defined in (5.4), serves as an index for the asymptotic efficiency of the S-estimator of scale relative to the least squares estimator (for which $\sigma_{3}=1/(2k)$ ). As a consequence, the cut-off constant $c$ of $\rho_{B}$ can be tuned in such a way that the asymptotic efficiency $1/\lambda$ relative to the least squares estimator is high at the normal distribution and similarly for $1/\sigma_{1}$ and $1/(2k\sigma_{3})$ . Since $c$ also determines the breakdown point, this forces a trade-off between efficiency and breakdown point. Typically, large values of $c$ correspond to high efficiency and low breakdown point, and vice versa for moderate values of $c$ .
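The efficiency indices $1/\lambda$ and $1/\sigma_{1}$ can be evaluated by one-dimensional numerical integration. The sketch below uses only the Python standard library and is an illustration under our own choices of quadrature and function names. It uses $\rho_{B}^{\prime}(s;c)=s(1-s^{2}/c^{2})^{2}$ and $\rho_{B}^{\prime\prime}(s;c)=(1-s^{2}/c^{2})(1-5s^{2}/c^{2})$ for $|s|\leq c$, both of which vanish for $|s|>c$, so that all expectations reduce to integrals over $[0,c]$.

```python
import math

def chi_density(s, k):
    # Density of ||z|| for z ~ N(0, I_k).
    return s ** (k - 1) * math.exp(-0.5 * s * s) / (2 ** (k / 2 - 1) * math.gamma(k / 2))

def expect(g, c, k, n=2000):
    # Composite Simpson rule for E[g(||z||)] = int_0^c g(s) f_k(s) ds,
    # valid for functions g that vanish beyond the cut-off c.
    h = c / n
    f = lambda s: g(s) * chi_density(s, k)
    total = f(0.0) + f(c)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def efficiencies(c, k):
    u = lambda s: (s / c) ** 2
    psi = lambda s: s * (1 - u(s)) ** 2               # rho_B'(s; c) on [0, c]
    dpsi = lambda s: (1 - u(s)) * (1 - 5 * u(s))      # rho_B''(s; c) on [0, c]
    # alpha from (6.2); rho_B'(s)/s is written as (1 - u)^2 to avoid dividing by zero.
    alpha = expect(lambda s: (1 - 1 / k) * (1 - u(s)) ** 2 + dpsi(s) / k, c, k)
    lam = expect(lambda s: psi(s) ** 2, c, k) / (k * alpha ** 2)
    # delta_1 from (5.4) and the scalar sigma_1.
    delta1 = expect(lambda s: dpsi(s) * s ** 2 + (k + 1) * psi(s) * s, c, k) / (k + 2)
    sigma1 = k * expect(lambda s: psi(s) ** 2 * s ** 2, c, k) / ((k + 2) * delta1 ** 2)
    return 1 / lam, 1 / sigma1

eff_reg_50, eff_shape_50 = efficiencies(2.661, 2)    # 50% breakdown cut-off for k = 2
eff_reg_inf, eff_shape_inf = efficiencies(60.0, 2)   # very large c, close to least squares
print(eff_reg_50, eff_shape_50, eff_reg_inf, eff_shape_inf)
```

For a very large cut-off both indices should be close to one, in line with the least squares case, while for $k=2$ and $c=2.661$ the regression efficiency should be close to the value $0.580$ reported for the 50% breakdown S-estimator.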

We further investigate how this trade-off relates to the gross error sensitivity (GES) of the corresponding S-functionals. For simplicity we only consider perturbations in $\mathbf{y}$ and leave $\mathbf{X}$ unchanged. From Corollary 8.4 in Lopuhaä et al [16], for the regression S-functional it then follows that $\|\text{IF}(\mathbf{y};\bm{\beta},P)\|$ is proportional to $\alpha^{-1}\left|\rho_{B}^{\prime}(d(\mathbf{y});c)\right|$ , where $\alpha$ is defined in (6.2) and $d(\mathbf{y})^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})^{T}\mathbf{\Sigma}^{-1% }(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})$ . Therefore, we propose the scalar

G_{1}=\frac{1}{\alpha}\sup_{s>0}\left|\rho_{B}^{\prime}(s;c)\right|,

as an index for the GES of regression S-functionals. This coincides with the GES index for location CM-functionals in Kent and Tyler [11]. From Examples 6 and 7, together with Lemma 2 and (5.3), for both the shape and direction S-functional, it follows that $\|\text{IF}(\mathbf{y})\|$ is proportional to $\delta_{1}^{-1}|\rho_{B}^{\prime}(d(\mathbf{y});c)d(\mathbf{y})|$ , where $\delta_{1}$ is defined in (5.4). We propose the scalar

G_{2}=\frac{k}{(k+2)\delta_{1}}\sup_{s>0}\left|\rho_{B}^{\prime}(s;c)s\right|,

as an index for the GES of shape and direction S-functionals. In this way, $G_{2}$ coincides with the GES index for CM-functionals of shape in Kent and Tyler [11]. Finally, from Example 6 and (5.3), it follows that for the scale functional $\|\text{IF}(\mathbf{y})\|$ is proportional to $\delta_{2}^{-1}|\rho_{B}(d(\mathbf{y});c)-b_{0}|$ , where $\delta_{2}$ is defined in (5.4). We propose

G_{3}=\frac{1}{\delta_{2}}\sup_{s>0}\left|\rho_{B}(s;c)-b_{0}\right|,

as an index for the GES of the S-functional of scale.
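For Tukey's bi-weight the suprema in $G_{1}$ and $G_{2}$ are available in closed form: $s(1-s^{2}/c^{2})^{2}$ is maximal at $s=c/\sqrt{5}$ with value $16c/(25\sqrt{5})$, and $s^{2}(1-s^{2}/c^{2})^{2}$ is maximal at $s=c/\sqrt{3}$ with value $4c^{2}/27$. The sketch below, which uses only the Python standard library and our own function names and quadrature, combines these with numerical values of $\alpha$ in (6.2) and $\delta_{1}$ in (5.4).

```python
import math

def chi_density(s, k):
    # Density of ||z|| for z ~ N(0, I_k).
    return s ** (k - 1) * math.exp(-0.5 * s * s) / (2 ** (k / 2 - 1) * math.gamma(k / 2))

def expect(g, c, k, n=2000):
    # Composite Simpson rule for E[g(||z||)] = int_0^c g(s) f_k(s) ds,
    # valid for functions g that vanish beyond the cut-off c.
    h = c / n
    f = lambda s: g(s) * chi_density(s, k)
    total = f(0.0) + f(c)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def ges_indices(c, k):
    u = lambda s: (s / c) ** 2
    psi = lambda s: s * (1 - u(s)) ** 2               # rho_B'(s; c) on [0, c]
    dpsi = lambda s: (1 - u(s)) * (1 - 5 * u(s))      # rho_B''(s; c) on [0, c]
    alpha = expect(lambda s: (1 - 1 / k) * (1 - u(s)) ** 2 + dpsi(s) / k, c, k)        # (6.2)
    delta1 = expect(lambda s: dpsi(s) * s ** 2 + (k + 1) * psi(s) * s, c, k) / (k + 2)  # (5.4)
    sup_psi = 16 * c / (25 * math.sqrt(5))     # sup_s |rho_B'(s; c)|
    sup_psi_s = 4 * c ** 2 / 27                # sup_s |rho_B'(s; c) s|
    G1 = sup_psi / alpha
    G2 = k / ((k + 2) * delta1) * sup_psi_s
    return G1, G2

G1, G2 = ges_indices(4.115, 2)
print(G1, G2)
```

For $k=2$ and $c=4.115$ the two values should be close to the indices $G_{1}=1.927$ and $G_{2}=1.368$ discussed in the text.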

We investigate how the asymptotic efficiency at the normal distribution of the S-estimators, and the GES of the corresponding S-functionals behave as we vary the breakdown point of the S-estimator between 0 and 0.5. Given a value $\epsilon^{*}$ of the breakdown point, we determine the corresponding cut-off constant $c$ by solving $\epsilon^{*}=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[\rho_{B}(\|\mathbf{z}\|;c)% ]/(c^{2}/6)$ . With this value of $c$ , we compute the values of $\lambda$ , $\sigma_{1}$ and $\sigma_{3}$ and the GES indices $G_{1}$ , $G_{2}$ and $G_{3}$ . In Figure 1, on the top row we have plotted the asymptotic relative efficiencies $1/\lambda$ , $1/\sigma_{1}$ and $1/(2k\sigma_{3})$ as a function of the breakdown point for dimensions $k=2,5,10$ , and the bottom row contains plots of the GES indices $G_{1}$ , $G_{2}$ and $G_{3}$ for the same dimensions.

Figure 1: ARE and GES as functions of the breakdown point at the multivariate normal in dimensions $k=2,5,10$ .

As expected, the efficiency decreases with increasing breakdown point, but the loss of efficiency is less severe for the S-estimator of scale compared to the S-estimator for regression and the S-estimators for shape and direction. In dimension $k=2$ (solid lines), the 50% breakdown S-estimators have asymptotic efficiencies $1/\lambda=0.580$ , $1/\sigma_{1}=0.376$ , and $1/(4\sigma_{3})=0.755$ . However, one can both gain efficiency and lower the GES at the cost of a lower breakdown point. For example, the GES index of the regression functional attains its minimal value $G_{1}=1.927$ at breakdown point 28%, which corresponds to cut-off value $c=4.115$ . For this cut-off value the GES index of the shape and direction functional is $G_{2}=1.368$ , which is not far off from its minimal value 1.344, and the GES index for scale is $G_{3}=3.323$ . Furthermore, the asymptotic efficiencies then become $1/\lambda=0.884$ , $1/\sigma_{1}=0.803$ , and $1/(4\sigma_{3})=0.939$ , for the regression estimator, the estimators of shape and direction, and the scale estimator, respectively. Similarly, the GES index of the shape and direction functionals attains its minimal value $G_{2}=1.344$ for $c=3.722$ . This would yield $G_{1}=1.947$ , $G_{3}=2.844$ , $1/\lambda=0.835$ , $1/\sigma_{1}=0.723$ , $1/(4\sigma_{3})=0.912$ and breakdown point 33%. The GES index of the scale functional attains its minimal value $G_{3}=1.852$ at 50% breakdown point, so no simultaneous gain in efficiency and smaller GES values $G_{1}$ and $G_{2}$ can be achieved at the cost of a smaller breakdown point.

In dimension $k=5$ (dashed lines), the 50% breakdown S-estimators have asymptotic efficiencies $1/\lambda=0.864$ , $1/\sigma_{1}=0.778$ , and $1/(10\sigma_{3})=0.918$ . The GES index of the regression functional attains its minimal value $G_{1}=2.595$ at breakdown point 37%. The corresponding GES index for shape and direction functionals is $G_{2}=1.271$ and $G_{3}=1.480$ for the scale functionals. Corresponding to this smaller regression GES index we observe a gain in the asymptotic efficiencies: $1/\lambda=0.932$ , $1/\sigma_{1}=0.903$ , and $1/(10\sigma_{3})=0.965$ , for the regression estimator, the estimators of shape and direction, and the scale estimator, respectively. The GES index of the shape and direction functionals attains its minimal value at breakdown point 47%, so the gain in both efficiency and a smaller $G_{2}$ value is negligible. The situation for the GES index for scale is the same as in dimension $k=2$ , where no simultaneous gain in efficiency and smaller GES values $G_{1}$ and $G_{2}$ can be achieved at the cost of a smaller breakdown point.

Finally, in dimension $k=10$ (dotted lines), the 50% breakdown S-estimators have asymptotic efficiencies $1/\lambda=0.933$ , $1/\sigma_{1}=0.915$ , and $1/(20\sigma_{3})=0.965$ . The GES index of the regression functional attains its minimal value $G_{1}=3.426$ at breakdown point 42%. The corresponding GES index for shape and direction functionals is $G_{2}=1.221$ and $G_{3}=1.744$ for the scale functionals. Corresponding to this smaller regression GES index we observe a gain in the asymptotic efficiencies: $1/\lambda=0.960$ , $1/\sigma_{1}=0.949$ , and $1/(20\sigma_{3})=0.979$ , for the regression estimator, the estimators of shape and direction, and the scale estimator, respectively. Both GES indices $G_{2}$ and $G_{3}$ attain their minimal values at 50% breakdown, so no simultaneous gain in efficiency and smaller GES value $G_{1}$ can be achieved at the cost of a smaller breakdown point.

We conclude that at a moderate loss of breakdown point, from 50% to about 30%-40%, one can gain efficiency of the S-estimators and at the same time reduce the GES of the regression S-estimator. The improvement becomes smaller as the dimension increases.

Appendix A Proofs

Proof of Theorem 1.

Proof.

It can be seen that the projection matrix, as defined in (2.2), is given by

\Pi_{L}=\mathbf{L}\left(\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf% {\Sigma}^{-1}\right)\mathbf{L}\right)^{-1}\mathbf{L}^{T}\left(\mathbf{\Sigma}^% {-1}\otimes\mathbf{\Sigma}^{-1}\right).

(A.1)

Since $\mathbf{N}$ is of radial type with respect to $\mathbf{\Sigma}$ , it follows from Corollary 1 in Tyler [26] that there exist constants $\eta$ , $\sigma_{1}$ and $\sigma_{2}$ with $\sigma_{1}\geq 0$ and $\sigma_{2}\geq-2\sigma_{1}/k$ , such that $\mathbb{E}[\mathbf{N}]=\eta\mathbf{\Sigma}$ and

\text{var}\{\mathrm{vec}(\mathbf{N})\}=\sigma_{1}(\mathbf{I}_{k^{2}}+\mathbf{K% }_{k,k})(\mathbf{\Sigma}\otimes\mathbf{\Sigma})+\sigma_{2}\mathrm{vec}(\mathbf% {\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}.

Since $\mathbf{V}$ is linear, it holds that $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}$ , so that $\Pi_{L}\mathrm{vec}(\mathbf{\Sigma})=\mathrm{vec}(\mathbf{\Sigma})$ . It follows that $\mathbf{M}$ has expectation

\mathbb{E}[\mathrm{vec}(\mathbf{M})]=\Pi_{L}\mathrm{vec}(\mathbb{E}[\mathbf{N}% ])=\eta\Pi_{L}\mathrm{vec}(\mathbf{\Sigma})=\eta\mathrm{vec}(\mathbf{\Sigma}),

and variance

\begin{split}\text{var}(\mathrm{vec}(\mathbf{M}))&=\Pi_{L}\text{var}(\mathrm{% vec}(\mathbf{N}))\Pi_{L}^{T}\\ &=\sigma_{1}\Pi_{L}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})(\mathbf{\Sigma}% \otimes\mathbf{\Sigma})\Pi_{L}^{T}+\sigma_{2}\Pi_{L}\mathrm{vec}(\mathbf{% \Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}\Pi_{L}^{T}\\ &=\sigma_{1}\Pi_{L}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})(\mathbf{\Sigma}% \otimes\mathbf{\Sigma})\Pi_{L}^{T}+\sigma_{2}\mathrm{vec}(\mathbf{\Sigma})% \mathrm{vec}(\mathbf{\Sigma})^{T}.\end{split}

Note that

\begin{split}\mathbf{K}_{k,k}(\mathbf{A}\otimes\mathbf{B})&=(\mathbf{B}\otimes% \mathbf{A})\mathbf{K}_{k,k}\\ \mathbf{K}_{k,k}\mathrm{vec}(\mathbf{A})&=\mathrm{vec}(\mathbf{A}^{T}),\end{split}

(A.2)

e.g., see [18, Chapter 3, Section 7]. Since $\mathbf{\Sigma}=\mathbf{V}(\bm{\theta})$ is symmetric, also $\mathbf{L}_{j}=\partial\mathbf{V}/\partial\theta_{j}$ is symmetric, for $j=1,\ldots,\ell$ . This means that $\mathbf{K}_{k,k}\mathbf{L}=\mathbf{L}$ and it follows that

\begin{split}\Pi_{L}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})(\mathbf{\Sigma}% \otimes\mathbf{\Sigma})\Pi_{L}^{T}&=\mathbf{L}\left(\mathbf{L}^{T}\left(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\right)^{-1}% \mathbf{L}^{T}(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k})\Pi_{L}^{T}\\ &=2\mathbf{L}\left(\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{% \Sigma}^{-1}\right)\mathbf{L}\right)^{-1}\mathbf{L}^{T}\Pi_{L}^{T}\\ &=2\mathbf{L}\left(\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{% \Sigma}^{-1}\right)\mathbf{L}\right)^{-1}\mathbf{L}^{T}.\end{split}

This finishes the proof of part (i). Since $\mathbf{L}$ has full rank, it holds that

(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma})=(% \mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathbf{L}\bm{\theta}=\bm{\theta},

(A.3)

and similarly $(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{M})=\mathbf{T}$ . This immediately gives

\mathbb{E}[\mathbf{T}]=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathbb{E}% [\mathrm{vec}(\mathbf{M})]=\eta(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}% \mathrm{vec}(\mathbf{\Sigma})=\eta\bm{\theta},

and

\text{\rm var}(\mathbf{T})=(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\text{% \rm var}\{\mathrm{vec}(\mathbf{M})\}\mathbf{L}(\mathbf{L}^{T}\mathbf{L})^{-1}.

When we insert (1.3) and apply (A.3), the theorem follows. ∎
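The key identity in the proof of part (i) can be checked exactly on a small example. The sketch below is only an illustration under assumptions that are not in the paper: it takes $k=2$ and a hypothetical compound-symmetry structure $\mathbf{V}(\bm{\theta})=\theta_{1}\mathbf{I}_{2}+\theta_{2}\mathbf{1}\mathbf{1}^{T}$ with $\bm{\theta}=(2,1)$, builds $\Pi_{L}$ from (A.1), and compares both sides in rational arithmetic. The helper functions are ad hoc and written for this example only.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical example: k = 2, V(theta) = theta_1 * I + theta_2 * 1 1^T.
k = 2
theta = [Fr(2), Fr(1)]
L = transpose([[1, 0, 0, 1], [1, 1, 1, 1]])  # columns vec(I_2) and vec(1 1^T)
Sigma = [[theta[0] + theta[1], theta[1]], [theta[1], theta[0] + theta[1]]]

W = kron(inverse(Sigma), inverse(Sigma))             # Sigma^{-1} (x) Sigma^{-1}
G = inverse(matmul(matmul(transpose(L), W), L))      # (L^T W L)^{-1}
Pi = matmul(matmul(matmul(L, G), transpose(L)), W)   # projection (A.1)

# Commutation matrix: K vec(A) = vec(A^T), with vec stacking columns.
n = k * k
K = [[0] * n for _ in range(n)]
for i in range(k):
    for j in range(k):
        K[j * k + i][i * k + j] = 1
I_plus_K = [[int(r == c) + K[r][c] for c in range(n)] for r in range(n)]

lhs = matmul(matmul(matmul(Pi, I_plus_K), kron(Sigma, Sigma)), transpose(Pi))
rhs = [[2 * x for x in row] for row in matmul(matmul(L, G), transpose(L))]
vec_Sigma = [[Sigma[i][j]] for j in range(k) for i in range(k)]

print("identity of part (i) holds:", lhs == rhs)
print("Pi_L vec(Sigma) = vec(Sigma):", matmul(Pi, vec_Sigma) == vec_Sigma)
```

Exact fractions are used instead of floating point so that the comparison is an equality test rather than a tolerance check.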

Proof of Theorem 2.

The proof follows the line of reasoning used in the proofs of Theorem 9.1 and Corollary 9.2 in Lopuhaä et al [16] for S-estimators. These proofs are based on estimating equations (3.3) with $w_{1}(d)=\rho^{\prime}(d)/d$ , $w_{2}(d)=k\rho^{\prime}(d)/d$ and $w_{3}(d)=\rho^{\prime}(d)d-\rho(d)+b_{0}$ , and require conditions (R1)-(R5) in [16] on the function $\rho$ . For the proof of Theorem 2 these conditions have been reformulated into similar conditions (C1)-(C3) for general $w_{1}$ , $w_{2}$ , and $w_{3}$ . Furthermore, in order to incorporate the case $w_{1}=w_{2}=w_{3}=1$ of Example 1, we have slightly adapted some of the boundedness conditions and use that

d^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta})^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{% X}\bm{\beta})\leq\frac{\|\mathbf{y}-\mathbf{X}\bm{\beta}\|^{2}}{\lambda_{k}(% \mathbf{V})}\leq\frac{(\|\mathbf{y}\|+\|\mathbf{X}\|\cdot\|\bm{\beta}\|)^{2}}{% \lambda_{k}(\mathbf{V})}\leq\frac{\|\mathbf{s}\|^{2}(1+\|\bm{\beta}\|)^{2}}{% \lambda_{k}(\mathbf{V})}.

This will ensure that $d^{2}$ is bounded by a multiple of $\|\mathbf{s}\|^{2}$ on a neighborhood of $\bm{\xi}_{0}$ . In order to apply dominated convergence, we then require $\mathbb{E}\|\mathbf{s}\|^{4}<\infty$ in Theorem 2 instead of $\mathbb{E}\|\mathbf{X}\|^{2}<\infty$ , which was sufficient for Corollary 9.2 in [16].

Proof.

Define

\Lambda(\bm{\xi})=\int\Psi(\mathbf{s},\bm{\xi})\,\mathrm{d}P(\mathbf{s}).

(A.4)

From the properties of elliptically contoured densities, one has that $\mathbb{E}\left[\Psi(\mathbf{s},\bm{\xi}_{0})\big{|}\mathbf{X}\right]=\mathbf{0}$ , so that $\Lambda(\bm{\xi}_{0})=\mathbf{0}$ . Conditions (C1)-(C3) yield that $\Lambda$ is continuously differentiable and by application of empirical process theory (see e.g., Lemma 11.8 in [17] for the special case of S-estimators) one finds

\begin{split}\mathbf{0}&=\int\Psi(\mathbf{s},\bm{\xi}_{n})\,\mathrm{d}P(% \mathbf{s})+\int\Psi(\mathbf{s},\bm{\xi}_{0})\,\mathrm{d}(\mathbb{P}_{n}-P)(% \mathbf{s})+o_{P}(n^{-1/2})\\ &=\Lambda^{\prime}(\bm{\xi}_{0})(\bm{\xi}_{n}-\bm{\xi}_{0})+\frac{1}{n}\sum_{i% =1}^{n}\Big{\{}\Psi(\mathbf{s}_{i},\bm{\xi}_{0})-\mathbb{E}[\Psi(\mathbf{s}_{i% },\bm{\xi}_{0})]\Big{\}}+o_{P}(n^{-1/2}).\end{split}

(A.5)

Similar to Lemma 8.3 in Lopuhaä et al. [16], we find that $\Lambda^{\prime}(\bm{\xi}_{0})$ is a block diagonal matrix. This implies that $\sqrt{n}(\bm{\beta}_{n}-\bm{\beta}_{0})$ and $\sqrt{n}(\bm{\theta}_{n}-\bm{\theta}_{0})$ are asymptotically independent, and from (A.5) we obtain

\begin{split}\sqrt{n}(\mathrm{vec}(\mathbf{V}_{n})-\mathrm{vec}(\mathbf{\Sigma% }))&=\mathbf{L}\sqrt{n}(\bm{\theta}_{n}-\bm{\theta}_{0})\\ &=-\mathbf{L}\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}\frac{1}{\sqrt{n% }}\sum_{i=1}^{n}\Big{\{}\Psi_{\bm{\theta}}(\mathbf{s}_{i},\bm{\xi}_{0})-% \mathbb{E}[\Psi_{\bm{\theta}}(\mathbf{s}_{i},\bm{\xi}_{0})]\Big{\}}+o_{P}(1),% \end{split}

where $\Psi_{\bm{\theta}}$ is defined in (3.4), and

\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})=\int\frac{\partial\Psi_{\bm{% \theta}}(\mathbf{s},\bm{\xi}_{0})}{\partial\bm{\theta}}\,\mathrm{d}P(\mathbf{s% })=\gamma_{1}\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-% 1}\right)\mathbf{L}-\gamma_{2}\mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})% \mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L},

where

\begin{split}\gamma_{1}&=\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{% 2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+k(k+2)w_{3}(\|\mathbf{z}\|)% \right]}{k(k+2)}\\ \gamma_{2}&=\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[(k+2)w_{3}^{% \prime}(\|\mathbf{z}\|)\|\mathbf{z}\|-w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{% z}\|^{3}\right]}{2k(k+2)}.\end{split}

(A.6)

First note that we can write

\Psi_{\bm{\theta}}(\mathbf{s},\bm{\xi}_{0})=\mathbf{L}^{T}(\mathbf{\Sigma}^{-1% }\otimes\mathbf{\Sigma}^{-1})\mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},% \bm{\xi}_{0})\right\}

(A.7)

where

\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})=w_{2}(d_{0})(\mathbf{y}-\mathbf{X}% \bm{\beta}_{0})(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})^{T}-w_{3}(d_{0})\mathbf{% \Sigma},

(A.8)

with $d_{0}^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})^{T}\mathbf{\Sigma}^{-1}(% \mathbf{y}-\mathbf{X}\bm{\beta}_{0})$ . Furthermore, similar to Lemma 11.5 in [17], we find that

\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}=a(\mathbf{E}^{T}\mathbf{E})^% {-1}+b(\mathbf{E}^{T}\mathbf{E})^{-1}\mathbf{E}^{T}\mathrm{vec}(\mathbf{I}_{k}% )\mathrm{vec}(\mathbf{I}_{k})^{T}\mathbf{E}(\mathbf{E}^{T}\mathbf{E})^{-1},

where $\mathbf{E}=\left(\mathbf{\Sigma}^{-1/2}\otimes\mathbf{\Sigma}^{-1/2}\right)% \mathbf{L}$ and $a=1/\gamma_{1}$ and $b=\gamma_{2}/(\gamma_{1}(\gamma_{1}-k\gamma_{2}))$ , with

\gamma_{1}-k\gamma_{2}=\frac{1}{2k}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{% [}w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+2kw_{3}(\|\mathbf{z}\|)-kw_% {3}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\Big{]}\neq 0.

This means that $\mathbf{E}^{T}\mathbf{E}=\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{% \Sigma}^{-1})\mathbf{L}$ and since $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ , we have

(\mathbf{E}^{T}\mathbf{E})^{-1}\mathbf{E}^{T}\mathrm{vec}(\mathbf{I}_{k})=(\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L})^{-1}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathrm{vec}(\mathbf{\Sigma})=\bm{\theta}_{0},

so that $\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}=a(\mathbf{L}^{T}(\mathbf{% \Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})\mathbf{L})^{-1}+b\bm{\theta}_{0}\bm{% \theta}_{0}^{T}$ . Furthermore, since $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ , together with

\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C})=(\mathbf{C}^{T}\otimes\mathbf{A})% \mathrm{vec}(\mathbf{B})

(A.9)

e.g., see [18, Chapter 2, Section 4], and $\Pi_{L}$ as in (A.1), we find

\begin{split}\mathbf{L}\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}% \mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1})&=a\Pi_{L}+b% \mathbf{L}\bm{\theta}_{0}\bm{\theta}_{0}^{T}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1% }\otimes\mathbf{\Sigma}^{-1})=a\Pi_{L}+b\mathrm{vec}(\mathbf{\Sigma})\mathrm{% vec}(\mathbf{\Sigma}^{-1})^{T}.\end{split}

It follows that

\begin{split}\mathbf{L}\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}\Psi_{% \bm{\theta}}(\mathbf{s},\bm{\xi}_{0})&=\mathbf{L}\Lambda_{\bm{\theta}}^{\prime% }(\bm{\xi}_{0})^{-1}\mathbf{L}^{T}(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^% {-1})\mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right\}\\ &=a\Pi_{L}\mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right% \}+b\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathrm% {vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right\}\\ &=a\Pi_{L}\mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right% \}+b\mathrm{vec}(\mathbf{\Sigma})\text{tr}\left\{\mathbf{\Sigma}^{-1}\Psi_{% \mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right\}\\ &=a\Pi_{L}\mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right% \}+b\mathrm{vec}(\mathbf{\Sigma})\left(w_{2}(d_{0})d_{0}^{2}-kw_{3}(d_{0})% \right).\end{split}

Because $\Pi_{L}\mathrm{vec}(\mathbf{\Sigma})=\mathrm{vec}(\mathbf{\Sigma})$ , we conclude that

\mathbf{L}\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}\Psi_{\bm{\theta}}(% \mathbf{s},\bm{\xi}_{0})=\Pi_{L}\mathrm{vec}\left\{\Psi_{\mathbf{N}}(\mathbf{s% },\bm{\xi}_{0})\right\},

where

\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi})=v_{1}(d)(\mathbf{y}-\mathbf{X}\bm{\beta% })(\mathbf{y}-\mathbf{X}\bm{\beta})^{T}-v_{2}(d)\mathbf{\Sigma},

(A.10)

with $d^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta})^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{% X}\bm{\beta})$ and

\begin{split}v_{1}(s)&=aw_{2}(s)\\ v_{2}(s)&=-bw_{2}(s)s^{2}+(a+bk)w_{3}(s).\end{split}

(A.11)

Hence, if we define

\mathbf{N}_{n}=\frac{1}{n}\sum_{i=1}^{n}\Psi_{\mathbf{N}}(\mathbf{s}_{i},\bm{% \xi}_{0}),

(A.12)

with $\Psi_{\mathbf{N}}$ defined in (A.10), then it follows that

\sqrt{n}(\mathrm{vec}(\mathbf{V}_{n})-\mathrm{vec}(\mathbf{\Sigma}))=\Pi_{L}% \mathrm{vec}\left\{\sqrt{n}(\mathbf{N}_{n}-\mathbb{E}[\mathbf{N}_{n}])\right\}% +o_{P}(1).

This proves the first statement in Theorem 2.
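The closed form used above for $\Lambda_{\bm{\theta}}^{\prime}(\bm{\xi}_{0})^{-1}$ is a rank-one (Sherman-Morrison type) inverse, and it can be verified exactly on a toy case. The sketch below is an illustration only: the compound-symmetry structure with $k=2$, $\bm{\theta}_{0}=(2,1)$, and the values $\gamma_{1}=3/2$, $\gamma_{2}=1/5$ are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical setting: k = 2, V(theta) = theta_1 * I + theta_2 * 1 1^T.
k = 2
theta = [Fr(2), Fr(1)]
L = transpose([[1, 0, 0, 1], [1, 1, 1, 1]])
Sigma = [[Fr(3), Fr(1)], [Fr(1), Fr(3)]]
Sinv = inverse(Sigma)
A = matmul(matmul(transpose(L), kron(Sinv, Sinv)), L)  # L^T (S^-1 (x) S^-1) L

# u = L^T vec(Sigma^{-1}); it should coincide with A theta_0.
vec_Sinv = [[Sinv[i][j]] for j in range(k) for i in range(k)]
u = matmul(transpose(L), vec_Sinv)
A_theta = matmul(A, [[t] for t in theta])

# Hypothetical constants gamma_1, gamma_2 (any values with gamma_1 != k*gamma_2).
g1, g2 = Fr(3, 2), Fr(1, 5)
a = 1 / g1
b = g2 / (g1 * (g1 - k * g2))

ell = len(theta)
Lam = [[g1 * A[i][j] - g2 * u[i][0] * u[j][0] for j in range(ell)] for i in range(ell)]
Ainv = inverse(A)
Lam_inv = [[a * Ainv[i][j] + b * theta[i] * theta[j] for j in range(ell)]
           for i in range(ell)]

identity = [[int(i == j) for j in range(ell)] for i in range(ell)]
print("L^T vec(Sigma^-1) = A theta_0:", u == A_theta)
print("closed-form inverse is correct:", matmul(Lam, Lam_inv) == identity)
```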

To prove the second statement, note that from $\Lambda(\bm{\xi}_{0})=\mathbf{0}$ , together with (A.7) and (A.9), it follows that

\begin{split}0&=\bm{\theta}_{0}^{T}\mathbb{E}\left[\Psi_{\bm{\theta}}(\mathbf{% s},\bm{\xi}_{0})\right]=\mathbb{E}\left[\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}% \mathrm{vec}\left\{\Psi_{\mathbf{C}}(\mathbf{s},\bm{\xi}_{0})\right\}\right]\\ &=\mathbb{E}\left[\text{tr}\left(\mathbf{\Sigma}^{-1}\Psi_{\mathbf{C}}(\mathbf% {s},\bm{\xi}_{0})\right)\right]=\mathbb{E}\left[w_{2}(d_{0})d_{0}^{2}-kw_{3}(d% _{0})\right],\end{split}

where $\Psi_{\mathbf{C}}$ is defined in (A.8). Then, from the properties of elliptically contoured densities, together with (A.11), one finds $\mathbb{E}[\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0})]=\mathbf{0}$ . This means that $\sqrt{n}(\mathbf{N}_{n}-\mathbb{E}[\mathbf{N}_{n}])$ is asymptotically normal with mean zero and variance

\mathbb{E}\left[\mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))% \mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))^{T}\right]=\mathbb{E}% \left[\mathbb{E}\left[\mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))% \mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))^{T}\Big{|}\mathbf{X}% \right]\right].

The inner expectation on the right hand side is the conditional expectation of $\mathbf{y}\mid\mathbf{X}$ , which has the same distribution as $\mathbf{\Sigma}^{1/2}\mathbf{z}+\bm{\mu}$ , where $\mathbf{z}$ has a spherical density $f_{\mathbf{0},\mathbf{I}_{k}}(\mathbf{z})=g(\|\mathbf{z}\|^{2})$ . This implies that

\begin{split}&\mathbb{E}\left[\mathrm{vec}\left(\mathbf{\Sigma}^{-1/2}\Psi_{% \mathbf{N}}(\mathbf{s},\bm{\xi}_{0})\mathbf{\Sigma}^{-1/2}\right)\mathrm{vec}% \left(\mathbf{\Sigma}^{-1/2}\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0})\mathbf{% \Sigma}^{-1/2}\right)^{T}\right]\\ &=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)^{2}\|% \mathbf{z}\|^{4}\right]\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\mathrm{vec% }\left(\mathbf{u}\mathbf{u}^{T}\right)\mathrm{vec}\left(\mathbf{u}\mathbf{u}^{% T}\right)^{T}\right]\\ &\quad-\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)v_{2}(% \|\mathbf{z}\|)\|\mathbf{z}\|^{2}\right]\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}% \left[\mathrm{vec}\left(\mathbf{u}\mathbf{u}^{T}\right)\mathrm{vec}\left(% \mathbf{I}_{k}\right)^{T}\right]\\ &\quad-\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)v_{2}(% \|\mathbf{z}\|)\|\mathbf{z}\|^{2}\right]\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}% \left[\mathrm{vec}\left(\mathbf{I}_{k}\right)\mathrm{vec}\left(\mathbf{u}% \mathbf{u}^{T}\right)^{T}\right]\\ &\quad+\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{2}(\|\mathbf{z}\|)^{2}% \right]\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\mathrm{vec}\left(\mathbf{I% }_{k}\right)\mathrm{vec}\left(\mathbf{I}_{k}\right)^{T}\right],\end{split}

where $\mathbf{u}=\mathbf{z}/\|\mathbf{z}\|$ . From Lemma 5.1 in [13], we have

\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\mathrm{vec}(\mathbf{u}\mathbf{u}^{T})% \mathrm{vec}(\mathbf{u}\mathbf{u}^{T})^{T}=\sigma_{1}(\mathbf{I}_{k^{2}}+% \mathbf{K}_{k,k})+\sigma_{2}\mathrm{vec}(\mathbf{I}_{k})\mathrm{vec}(\mathbf{I% }_{k})^{T},

where $\sigma_{1}=\sigma_{2}=(k(k+2))^{-1}$ . Hence, the first term on the right hand side is equal to

\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)^{2}\|% \mathbf{z}\|^{4}\right]}{k(k+2)}\left(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k}+% \mathrm{vec}(\mathbf{I}_{k})\mathrm{vec}(\mathbf{I}_{k})^{T}\right).

This leads to one term $\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k}$ with coefficient

\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)^{2}\|% \mathbf{z}\|^{4}\right]}{k(k+2)}

and using that, according to Lemma 11.4 in [17], $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\mathbf{u}\mathbf{u}^{T}\right]=(1% /k)\mathbf{I}_{k}$ , we find a second term $\mathrm{vec}(\mathbf{I}_{k})\mathrm{vec}(\mathbf{I}_{k})^{T}$ with coefficient

\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)^{2}\|% \mathbf{z}\|^{4}\right]}{k(k+2)}-\frac{2\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}% \left[v_{1}(\|\mathbf{z}\|)v_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}\right]}{k}+% \mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{2}(\|\mathbf{z}\|)^{2}\right].

This means that

\begin{split}&\mathbb{E}\left[\mathrm{vec}\left(\mathbf{\Sigma}^{-1/2}\Psi_{% \mathbf{N}}(\mathbf{s},\bm{\xi}_{0})\mathbf{\Sigma}^{-1/2}\right)\mathrm{vec}% \left(\mathbf{\Sigma}^{-1/2}\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0})\mathbf{% \Sigma}^{-1/2}\right)^{T}\right]\\ &=\sigma_{1}\left(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k}\right)+\sigma_{2}\mathrm% {vec}(\mathbf{I}_{k})\mathrm{vec}(\mathbf{I}_{k})^{T}\end{split}

where

\begin{split}\sigma_{1}&=\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{% 1}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]}{k(k+2)}\\ \sigma_{2}&=\sigma_{1}-\frac{2\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1% }(\|\mathbf{z}\|)v_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}\right]}{k}+\mathbb{E}% _{\mathbf{0},\mathbf{I}_{k}}\left[v_{2}(\|\mathbf{z}\|)^{2}\right],\end{split}

(A.13)

or equivalently

\mathbb{E}\left[\mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))\mathrm{vec}(\Psi_{\mathbf{N}}(\mathbf{s},\bm{\xi}_{0}))^{T}\right]=\sigma_{1}\left(\mathbf{I}_{k^{2}}+\mathbf{K}_{k,k}\right)(\mathbf{\Sigma}\otimes\mathbf{\Sigma})+\sigma_{2}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}.

We rewrite $\sigma_{2}$ :

\begin{split}\sigma_{2}&=-\frac{2\sigma_{1}}{k}+\frac{(k+2)\sigma_{1}}{k}-% \frac{2\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)v_{2}(% \|\mathbf{z}\|)\|\mathbf{z}\|^{2}\right]}{k}+\mathbb{E}_{\mathbf{0},\mathbf{I}% _{k}}\left[v_{2}(\|\mathbf{z}\|)^{2}\right]\\ &=-\frac{2\sigma_{1}}{k}+\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[v_{% 1}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]}{k^{2}}-\frac{2k\mathbb{E}_{% \mathbf{0},\mathbf{I}_{k}}\left[v_{1}(\|\mathbf{z}\|)v_{2}(\|\mathbf{z}\|)\|% \mathbf{z}\|^{2}\right]}{k^{2}}+\frac{k^{2}\mathbb{E}_{\mathbf{0},\mathbf{I}_{% k}}\left[v_{2}(\|\mathbf{z}\|)^{2}\right]}{k^{2}}\\ &=-\frac{2\sigma_{1}}{k}+\frac{1}{k^{2}}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}% \left[\Big{(}v_{1}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}-kv_{2}(\|\mathbf{z}\|)% \Big{)}^{2}\right].\end{split}

Furthermore

v_{1}(s)s^{2}-kv_{2}(s)=(a+bk)\left\{w_{2}(s)s^{2}-kw_{3}(s)\right\},

where

a+kb=a+\frac{ka\gamma_{2}}{\gamma_{1}-k\gamma_{2}}=\frac{a\gamma_{1}}{\gamma_{1}-k\gamma_{2}}=\frac{1}{\gamma_{1}-k\gamma_{2}},

with $\gamma_{1}$ and $\gamma_{2}$ defined in (A.6), which yields

\gamma_{1}-k\gamma_{2}=\frac{1}{2k}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{% [}w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+2kw_{3}(\|\mathbf{z}\|)-kw_% {3}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\Big{]}.

It follows that

\begin{split}\sigma_{1}&=\frac{k(k+2)\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]}{\Big{(}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{[}w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+k(k+2)w_{3}(\|\mathbf{z}\|)\Big{]}\Big{)}^{2}}\\ \sigma_{2}&=-\frac{2}{k}\sigma_{1}+\frac{4\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\Big{(}w_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}-kw_{3}(\|\mathbf{z}\|)\Big{)}^{2}\right]}{\left(\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{[}w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}+2kw_{3}(\|\mathbf{z}\|)-kw_{3}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|\Big{]}\right)^{2}}.\end{split}

This proves the theorem. ∎
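The passage from (A.13) to the final expressions for $\sigma_{1}$ and $\sigma_{2}$ is purely algebraic, so it can be checked with any expectation operator. The sketch below is a sanity check under assumptions that are not in the paper: the radius $\|\mathbf{z}\|$ follows a hypothetical three-point law, and the weight functions $w_{2}(s)=1/(1+s^{2})$ and $w_{3}(s)=1/(1+s)$ are arbitrary smooth choices. It computes $\gamma_{1},\gamma_{2}$ from (A.6), $v_{1},v_{2}$ from (A.11), the constants from (A.13), and compares them with the closed forms.

```python
from fractions import Fraction as Fr

k = 2
# Hypothetical discrete law for the radius ||z|| (values and probabilities).
radii = [Fr(1), Fr(2), Fr(3)]
probs = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]

def E(f):
    """Expectation of f(||z||) under the toy radial law."""
    return sum(p * f(r) for p, r in zip(probs, radii))

# Hypothetical weight functions and their derivatives.
def w2(s):  return 1 / (1 + s * s)
def dw2(s): return -2 * s / (1 + s * s) ** 2
def w3(s):  return 1 / (1 + s)
def dw3(s): return -1 / (1 + s) ** 2

# Constants gamma_1, gamma_2 as in (A.6), and a, b as defined after it.
g1 = E(lambda s: dw2(s) * s**3 + k * (k + 2) * w3(s)) / (k * (k + 2))
g2 = E(lambda s: (k + 2) * dw3(s) * s - dw2(s) * s**3) / (2 * k * (k + 2))
a = 1 / g1
b = g2 / (g1 * (g1 - k * g2))

# Functions v_1, v_2 as in (A.11).
def v1(s): return a * w2(s)
def v2(s): return -b * w2(s) * s * s + (a + b * k) * w3(s)

# sigma_1, sigma_2 via (A.13).
s1 = E(lambda s: v1(s) ** 2 * s**4) / (k * (k + 2))
s2 = s1 - 2 * E(lambda s: v1(s) * v2(s) * s * s) / k + E(lambda s: v2(s) ** 2)

# Closed forms stated at the end of the proof.
num = E(lambda s: dw2(s) * s**3 + k * (k + 2) * w3(s))
den = E(lambda s: dw2(s) * s**3 + 2 * k * w3(s) - k * dw3(s) * s)
s1_closed = k * (k + 2) * E(lambda s: w2(s) ** 2 * s**4) / num**2
s2_closed = (-Fr(2, k) * s1_closed
             + 4 * E(lambda s: (w2(s) * s * s - k * w3(s)) ** 2) / den**2)

print("a + k b = 1/(gamma_1 - k gamma_2):", a + k * b == 1 / (g1 - k * g2))
print("sigma_1 agrees:", s1 == s1_closed)
print("sigma_2 agrees:", s2 == s2_closed)
```

The toy law is not an elliptical model; it only exercises the algebra, which uses nothing beyond linearity of the expectation.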

Proof of Theorem 3.

Proof.

Let $H:\mathbb{R}^{k\times k}\to\mathbb{R}^{m}$ and let

H^{\prime}(\mathbf{V})=\frac{\partial H(\mathbf{V})}{\partial\mathrm{vec}(% \mathbf{V})^{T}}=\left(\frac{\partial H_{i}(\mathbf{V})}{\partial v_{st}}% \right)_{i=1,\ldots,m;\,s,t=1,\ldots,k}

(A.14)

be the $m\times k^{2}$ matrix of partial derivatives. According to the delta method $\sqrt{n}(H(\mathbf{V}_{n})-H(\mathbf{\Sigma}))$ is asymptotically normal with mean zero and variance $H^{\prime}(\mathbf{\Sigma})\text{\rm var}\{\text{\rm vec}(\mathbf{M})\}H^{% \prime}(\mathbf{\Sigma})^{T}$ . Because $H$ is continuously differentiable and satisfies (4.1), it follows that

\sum_{j=1}^{l}v_{j}\frac{\partial H(\mathbf{v})}{\partial v_{j}}=\mathbf{0}.

(A.15)

This means that $H^{\prime}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})=\textbf{0}$ . Then, after inserting (1.3) for $\text{\rm var}\{\text{\rm vec}(\mathbf{M})\}$ , this finishes the proof of part (i).

For part (ii), let $H:\mathbb{R}^{\ell}\to\mathbb{R}^{m}$ , and let

H^{\prime}(\bm{\theta})=\frac{\partial H(\bm{\theta})}{\partial\bm{\theta}^{T}% }=\left(\frac{\partial H_{i}(\bm{\theta})}{\partial\theta_{j}}\right)_{i=1,% \ldots,m;\,j=1,\ldots,\ell}

(A.16)

be the $m\times\ell$ matrix of partial derivatives. According to the delta method $\sqrt{n}(H(\bm{\theta}_{n})-H(\bm{\theta}_{0}))$ is asymptotically normal with mean zero and variance

H^{\prime}(\bm{\theta}_{0})\left\{2\sigma_{1}\Big{(}\mathbf{L}^{T}\left(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}+% \sigma_{2}\bm{\theta}_{0}\bm{\theta}_{0}^{T}\right\}H^{\prime}(\bm{\theta}_{0}% )^{T}.

Because $H$ satisfies (4.1) and (A.15), it follows immediately that $H^{\prime}(\bm{\theta}_{0})\bm{\theta}_{0}=\textbf{0}$ . This finishes the proof of part (ii). ∎
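The step $H^{\prime}(\bm{\theta}_{0})\bm{\theta}_{0}=\mathbf{0}$ is Euler's relation for functions that are constant along rays, and it is the reason the $\sigma_{2}$ term drops out of the limiting variance. The sketch below illustrates this numerically, assuming that (4.1) expresses scale invariance, $H(c\,\mathbf{v})=H(\mathbf{v})$ for $c>0$. The map $H(\bm{\theta})=\bm{\theta}/\|\bm{\theta}\|$ and the point $\bm{\theta}_{0}$ are hypothetical choices for illustration.

```python
import math

def H(theta):
    """A scale-invariant map (illustrative choice): theta / ||theta||."""
    norm = math.sqrt(sum(t * t for t in theta))
    return [t / norm for t in theta]

def jacobian(f, x, h=1e-6):
    """Central finite-difference approximation of the Jacobian of f at x."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

theta0 = [2.0, 1.0, 0.5]          # hypothetical parameter value
J = jacobian(H, theta0)

# H'(theta0) theta0 should vanish up to finite-difference error.
J_theta = [sum(J[i][j] * theta0[j] for j in range(len(theta0)))
           for i in range(len(J))]
print("max |H'(theta0) theta0| =", max(abs(x) for x in J_theta))

# Consequently H' (sigma_2 theta0 theta0^T) H'^T contributes nothing.
sigma2 = 0.7                      # arbitrary illustrative value
extra = [[sigma2 * J_theta[i] * J_theta[j] for j in range(len(J))]
         for i in range(len(J))]
print("max entry of the sigma_2 term =", max(abs(x) for row in extra for x in row))
```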

Proof of Lemma 1

Proof.

We apply Lemma 1 in [5]. Although the lemma is established for the $N_{k}(\bm{\mu},\mathbf{\Sigma})$ distribution, the proof holds for any distribution with an elliptically contoured density. According to [5], there exist two functions $\alpha_{C},\beta_{C}:[0,\infty)\to\mathbb{R}$ , such that

\text{IF}(\mathbf{y};\mathbf{C},P_{\bm{\mu},\mathbf{\Sigma}})=\alpha_{C}(d(% \mathbf{y}))(\mathbf{y}-\bm{\mu})(\mathbf{y}-\bm{\mu})^{T}-\beta_{C}(d(\mathbf% {y}))\mathbf{\Sigma}.

(A.17)

We have that

\begin{split}\text{\rm IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),P_{\bm{\mu},% \mathbf{\Sigma}})&=\lim_{h\downarrow 0}\frac{\mathrm{vec}(\mathbf{M}((1-h)P_{% \bm{\mu},\mathbf{\Sigma}}+h\delta_{\mathbf{y}}))-\mathrm{vec}(\mathbf{M})(P_{% \bm{\mu},\mathbf{\Sigma}})}{h}\\ &=\Pi_{L}\lim_{h\downarrow 0}\frac{\mathrm{vec}(\mathbf{C}((1-h)P_{\bm{\mu},% \mathbf{\Sigma}}+h\delta_{\mathbf{y}}))-\mathrm{vec}(\mathbf{C})(P_{\bm{\mu},% \mathbf{\Sigma}})}{h}\\ &=\Pi_{L}\mathrm{vec}\left(\text{IF}(\mathbf{y};\mathbf{C},P_{\bm{\mu},\mathbf% {\Sigma}})\right).\end{split}

Since $\mathbf{V}$ is linear, it holds that $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ and because $\Pi_{L}$ is the projection on the column space of $\mathbf{L}$ , it follows that $\Pi_{L}\mathrm{vec}(\mathbf{\Sigma})=\mathrm{vec}(\mathbf{\Sigma})$ . When we insert the expression (A.1) for $\Pi_{L}$ , together with (A.17) and the fact that $(\mathbf{B}^{T}\otimes\mathbf{A})\mathrm{vec}(\mathbf{v}\mathbf{v}^{T})=% \mathrm{vec}(\mathbf{A}\mathbf{v}\mathbf{v}^{T}\mathbf{B})$ according to (A.9), this finishes the proof of part (i). Since $\mathbf{L}$ has full column rank, $(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{M}(P))=\bm{% \theta}(P)$ , which yields

\text{\rm IF}(\mathbf{y};\bm{\theta},P_{\bm{\mu},\mathbf{\Sigma}})=(\mathbf{L}% ^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\text{\rm IF}(\mathbf{y};\mathrm{vec}(% \mathbf{M}),P_{\bm{\mu},\mathbf{\Sigma}}).

Part (i), together with (A.3) finishes the proof of part (ii). ∎

Proof of Lemma 2

Proof.

Let $H:\mathbb{R}^{k\times k}\to\mathbb{R}^{m}$ with derivative $H^{\prime}$ defined in (A.14). From the definition of influence function, it follows that

\begin{split}\text{IF}(\mathbf{y};H(\mathbf{M}),P)&=\frac{\partial H(\mathbf{M% }(P_{h,\mathbf{y}}))}{\partial h}\bigg{|}_{h=0}\\ &=\frac{\partial H(\mathbf{C})}{\partial\mathrm{vec}(\mathbf{C})^{T}}\bigg{|}_% {\mathbf{C}=\mathbf{M}(P)}\frac{\partial\mathrm{vec}(\mathbf{M}(P_{h,\mathbf{y% }}))}{\partial h}\bigg{|}_{h=0}\\ &=H^{\prime}(\mathbf{M}(P))\cdot\text{IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),% P).\end{split}

(A.18)

Since $\mathbf{M}(P)=\mathbf{\Sigma}$ , after inserting the expression in Lemma 1, together with $\mathrm{vec}(\mathbf{v}\mathbf{v}^{T})=\mathbf{v}\otimes\mathbf{v}$ , for $\text{IF}(\mathbf{y};\mathrm{vec}(\mathbf{M}),P)$ , this proves part (i). Next, let $H:\mathbb{R}^{\ell}\to\mathbb{R}^{m}$ with derivative $H^{\prime}$ defined by (A.16). It follows that

\text{IF}(\mathbf{y};H(\bm{\theta}),P)=H^{\prime}(\bm{\theta}(P))\cdot\text{IF% }(\mathbf{y};\bm{\theta},P).

(A.19)

After inserting the expression in Lemma 1(ii) for $\text{IF}(\mathbf{y};\bm{\theta},P)$ , together with $\bm{\theta}(P)=\bm{\theta}_{0}$ , this proves part (ii). ∎

Appendix B Derivation of $\sigma_{1}$ and $\sigma_{2}$

We compare the expressions for $\sigma_{1}$ and $\sigma_{2}$ derived in Theorem 2 with the ones obtained for specific cases in Tyler [26] and Lopuhaä et al [16].

Example 1.

Inserting $w_{1}=w_{2}=w_{3}=1$ in the expressions for $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2 gives

\sigma_{1}=\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\|\mathbf{z}\|^{4% }\right]}{k(k+2)},

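which equals 1 for the multivariate normal. Furthermore, in that case, using $\sigma_{1}=1$ and $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[\|\mathbf{z}\|^{2}]=k$,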

\begin{split}\sigma_{2}&=-\frac{2}{k}+\frac{4\mathbb{E}_{\mathbf{0},\mathbf{I}% _{k}}\left[(\|\mathbf{z}\|^{2}-k)^{2}\right]}{(2k)^{2}}\\ &=-\frac{2}{k}+\frac{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\|\mathbf{z}% \|^{4}\right]-2k\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[\|\mathbf{z}\|^{2}% \right]+k^{2}}{k^{2}}\\ &=-\frac{2}{k}+\frac{k(k+2)-2k^{2}+k^{2}}{k^{2}}=0.\end{split}
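The computation in Example 1 only needs the second and fourth moments of $\|\mathbf{z}\|$. As a small sketch, the function below (the name `sample_cov_sigmas` is hypothetical) evaluates the Theorem 2 expressions with $w_{1}=w_{2}=w_{3}=1$ for given moments and confirms $\sigma_{1}=1$, $\sigma_{2}=0$ at the multivariate normal, where $\mathbb{E}\|\mathbf{z}\|^{2}=k$ and $\mathbb{E}\|\mathbf{z}\|^{4}=k(k+2)$.

```python
from fractions import Fraction as Fr

def sample_cov_sigmas(k, m2, m4):
    """sigma_1 and sigma_2 for w1 = w2 = w3 = 1.

    m2 = E||z||^2 and m4 = E||z||^4 under the spherical density (integers here).
    """
    s1 = Fr(m4, k * (k + 2))
    # 4 E[(||z||^2 - k)^2] / (2k)^2, expanded in terms of the moments
    s2 = -Fr(2, k) * s1 + Fr(4 * (m4 - 2 * k * m2 + k * k), (2 * k) ** 2)
    return s1, s2

# Multivariate normal moments: E||z||^2 = k and E||z||^4 = k(k+2).
for k in (2, 5, 10):
    print(k, sample_cov_sigmas(k, k, k * (k + 2)))

# With E||z||^2 = k but a general fourth moment, sigma_2 reduces to sigma_1 - 1.
s1, s2 = sample_cov_sigmas(3, 3, 20)
print(s1, s2, s2 == s1 - 1)
```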

Example 2.

First consider the special case of maximum likelihood, with $w_{1}(s)=w_{2}(s)=-2g^{\prime}(s^{2})/g(s^{2})$ and $w_{3}(s)=1$ . Note that

\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[z(\|\mathbf{z}\|)\right]=\frac{2% \pi^{k/2}}{\Gamma(k/2)}\int_{0}^{\infty}z(r)g(r^{2})r^{k-1}\,\text{d}r,

(B.1)

see e.g., Lemma 1 in Lopuhaä [14]. When $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[\|\mathbf{z}\|^{2}]<\infty$, then by means of integration by parts we get

\begin{split}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{[}w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}\Big{]}&=\frac{2\pi^{k/2}}{\Gamma(k/2)}\int_{0}^{\infty}\frac{4g^{\prime}(r^{2})^{2}}{g(r^{2})^{2}}g(r^{2})r^{k+3}\,\text{d}r-\frac{2\pi^{k/2}}{\Gamma(k/2)}\int_{0}^{\infty}\frac{4g^{\prime\prime}(r^{2})}{g(r^{2})}g(r^{2})r^{k+3}\,\text{d}r\\ &=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{[}w_{2}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\Big{]}-k(k+2).\end{split}

It follows that

\sigma_{1}=\frac{k(k+2)\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|% \mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]}{\Big{(}\mathbb{E}_{\mathbf{0},% \mathbf{I}_{k}}[w_{2}^{\prime}(\|\mathbf{z}\|)\|\mathbf{z}\|^{3}]+k(k+2)\Big{)% }^{2}}=\frac{k(k+2)}{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|% \mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}\right]},

(B.2)

which coincides with the expression found in Example 2 in Tyler [26], who expresses expectations in terms of the random variable $T=\|\mathbf{z}\|^{2}$ . To compute $\sigma_{2}$ , first note that by means of integration by parts it follows that $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[w_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{2}% ]=k$ . When we insert this in the expression for $\sigma_{2}$ , this gives

\begin{split}\sigma_{2}&=-\frac{2}{k}\sigma_{1}+\frac{4\left(\mathbb{E}_{% \mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}% \right]-k^{2}\right)}{\left(\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[w_{2}(\|% \mathbf{z}\|)^{2}\|\mathbf{z}\|^{4}]-k(k+2)+2k\right)^{2}}=-\frac{2}{k}\sigma_% {1}+\frac{4}{\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[w_{2}(\|\mathbf{z}\|)^{2}% \|\mathbf{z}\|^{4}]-k^{2}}.\end{split}

After inserting $\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[w_{2}(\|\mathbf{z}\|)^{2}\|\mathbf{z}\|% ^{4}]=k(k+2)/\sigma_{1}$ , as follows from (B.2), we find

\sigma_{2}=-\frac{2}{k}\sigma_{1}+\frac{4}{k(k+2)/\sigma_{1}-k^{2}}=-\frac{2\sigma_{1}(1-\sigma_{1})}{k+2-k\sigma_{1}},

which coincides with the expression found in Example 2 in Tyler [26].
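As a concrete instance of the maximum likelihood case, one may take the multivariate Student $t$ density with $\nu$ degrees of freedom; this distribution is not treated in the text and is used here only as an illustration. Then $g(t)\propto(1+t/\nu)^{-(\nu+k)/2}$, so $w_{2}(s)=(\nu+k)/(\nu+s^{2})$, and $B=\|\mathbf{z}\|^{2}/(\nu+\|\mathbf{z}\|^{2})$ has a Beta$(k/2,\nu/2)$ distribution, which gives all required moments in closed form. The sketch below (function name hypothetical) evaluates $\sigma_{1}$ from (B.2) and $\sigma_{2}$ from the expression preceding it, and compares with the closed form $2\sigma_{1}(\sigma_{1}-1)/(k+2-k\sigma_{1})$.

```python
from fractions import Fraction as Fr

def t_mle_sigmas(k, nu):
    """sigma_1, sigma_2 for the ML weights under a multivariate t (illustration)."""
    alpha, beta = Fr(k, 2), Fr(nu, 2)
    # Second moment of B ~ Beta(alpha, beta).
    EB2 = alpha * (alpha + 1) / ((alpha + beta) * (alpha + beta + 1))
    # w2(s) s^2 = (nu + k) B   and   w2'(s) s^3 = -2 (nu + k) B^2.
    E_w2sq_z4 = (nu + k) ** 2 * EB2
    E_dw2_z3 = -2 * (nu + k) * EB2
    # Integration-by-parts identity used in Example 2.
    assert E_dw2_z3 == E_w2sq_z4 - k * (k + 2)
    s1 = k * (k + 2) / E_w2sq_z4                        # (B.2)
    s2 = -Fr(2, k) * s1 + 4 / (E_w2sq_z4 - k * k)
    return s1, s2

k, nu = 3, 5
s1, s2 = t_mle_sigmas(k, nu)
closed = 2 * s1 * (s1 - 1) / (k + 2 - k * s1)
print("sigma_1 =", s1, " sigma_2 =", s2, " closed form =", closed)
```

For $k=3$ and $\nu=5$ this gives $\sigma_{1}=5/4$ and $\sigma_{2}=1/2$, in line with $\sigma_{1}=(\nu+k+2)/(\nu+k)$.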

Next, consider the general case of M-estimators, with $w_{3}=1$ . First note that Tyler [26] uses a function $u_{2}$ , which relates to our function $w_{2}$ as $w_{2}(s)=u_{2}(s^{2})$ . Then, since $\bm{\xi}_{0}=(\bm{\beta}_{0},\bm{\theta}_{0})$ satisfies (3.5), we find that

\begin{split}0&=\bm{\theta}_{0}^{T}\mathbb{E}\left[\Psi_{\bm{\theta}}(\mathbf{% s},\bm{\xi}_{0})\right]\\ &=\mathbb{E}\left[\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathrm{vec}\left\{w_{% 2}(d_{0})(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})(\mathbf{y}-\mathbf{X}\bm{\beta}% _{0})^{T}-\mathbf{\Sigma}\right\}\right]\\ &=\mathbb{E}\left[\text{tr}\left\{w_{2}(d_{0})(\mathbf{y}-\mathbf{X}\bm{\beta}% _{0})(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})^{T}\mathbf{\Sigma}^{-1}-\mathbf{I}_% {k}\right\}\right]\\ &=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|\mathbf{z}\|)\|\mathbf{z% }\|^{2}-k\right],\end{split}

where $d_{0}^{2}=(\mathbf{y}-\mathbf{X}\bm{\beta}_{0})^{T}\mathbf{\Sigma}^{-1}(% \mathbf{y}-\mathbf{X}\bm{\beta}_{0})$ , so that $k=\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}[w_{2}(\|\mathbf{z}\|)\|\mathbf{z}\|^{% 2}]=\mathbb{E}[u_{2}(T)T]$ . It then follows that

\begin{split}\mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\left[w_{2}(\|\mathbf{z}\|)% ^{2}\|\mathbf{z}\|^{4}\right]&=\mathbb{E}\left[u_{2}(T)^{2}T^{2}\right]=k(k+2)% \psi_{1}\\ \mathbb{E}_{\mathbf{0},\mathbf{I}_{k}}\Big{[}w_{2}^{\prime}(\|\mathbf{z}\|)\|% \mathbf{z}\|^{3}\Big{]}&=2\mathbb{E}\left[u_{2}^{\prime}(T)T^{2}\right]=2k(% \psi_{2}-1),\end{split}

where $\psi_{1}$ and $\psi_{2}$ are defined in Example 3 in Tyler [26]. Then from the expressions provided in Theorem 2 we find

\begin{split}\sigma_{1}&=\frac{k^{2}(k+2)^{2}\psi_{1}}{\left(2k(\psi_{2}-1)+k(% k+2)\right)^{2}}=\frac{(k+2)^{2}\psi_{1}}{(2\psi_{2}+k)^{2}}\\ \sigma_{2}&=-\frac{2\sigma_{1}}{k}+\frac{4\left\{k(k+2)\psi_{1}-k^{2}\right\}}% {(2k\psi_{2})^{2}}.\end{split}

The expression for $\sigma_{1}$ coincides with the one in Example 3 in Tyler [26]. After inserting this in $\sigma_{2}$, one can verify that the expression for $\sigma_{2}$ also coincides with the one in Example 3 in Tyler [26].
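The simplification of $\sigma_{1}$ and the evaluation of $\sigma_{2}$ in terms of $\psi_{1}$ and $\psi_{2}$ can be checked with exact arithmetic. In the sketch below the function name and the numerical values of $\psi_{1},\psi_{2}$ are hypothetical; the only case with a known answer is $\psi_{1}=\psi_{2}=1$, which is the sample covariance at the normal distribution ($u_{2}\equiv 1$) and should return $\sigma_{1}=1$, $\sigma_{2}=0$.

```python
from fractions import Fraction as Fr

def m_estimator_sigmas(k, psi1, psi2):
    """Return (unsimplified sigma_1, simplified sigma_1, sigma_2)."""
    # sigma_1 before cancelling the common factor k^2
    s1_raw = k**2 * (k + 2) ** 2 * psi1 / (2 * k * (psi2 - 1) + k * (k + 2)) ** 2
    # sigma_1 after simplification
    s1 = (k + 2) ** 2 * psi1 / (2 * psi2 + k) ** 2
    # sigma_2 as displayed for general M-estimators with w3 = 1
    s2 = -2 * s1 / k + 4 * (k * (k + 2) * psi1 - k * k) / (2 * k * psi2) ** 2
    return s1_raw, s1, s2

# Arbitrary illustrative values: the two forms of sigma_1 must agree.
raw, s1, s2 = m_estimator_sigmas(3, Fr(3, 2), Fr(4, 5))
print(raw == s1, s1, s2)

# Sample covariance at the normal: psi_1 = psi_2 = 1.
print(m_estimator_sigmas(4, Fr(1), Fr(1)))
```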

Example 3.

With $w_{1}(s)=\rho^{\prime}(s)/s$ , $w_{2}(s)=k\rho^{\prime}(s)/s$ and $w_{3}(s)=\rho^{\prime}(s)s-\rho(s)+b_{0}$ , one can easily verify that the expressions for $\sigma_{1}$ and $\sigma_{2}$ in Theorem 2 coincide with the ones in Corollary 9.2 in Lopuhaä et al [16].

Appendix C Details for Examples 4 and 5

Example 4.

From (4.2) we find

\begin{split}&H^{\prime}(\mathbf{\Sigma})\mathbf{L}\Big{(}\mathbf{L}^{T}\left(% \mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}% \mathbf{L}^{T}H^{\prime}(\mathbf{\Sigma})^{T}\\ &=\frac{1}{k^{2}}|\mathbf{\Sigma}|^{-2/k}\mathrm{vec}(\mathbf{\Sigma})\mathrm{% vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L}\Big{(}\mathbf{L}^{T}\left(\mathbf{% \Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}\mathbf{L}% ^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})\mathrm{vec}(\mathbf{\Sigma})^{T}\\ &\quad-\frac{1}{k}|\mathbf{\Sigma}|^{-2/k}\mathrm{vec}(\mathbf{\Sigma})\mathrm% {vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L}\Big{(}\mathbf{L}^{T}\left(\mathbf{% \Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}\mathbf{L}% ^{T}\\ &\quad-\frac{1}{k}|\mathbf{\Sigma}|^{-2/k}\mathbf{L}\Big{(}\mathbf{L}^{T}\left% (\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}% \mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})\mathrm{vec}(\mathbf{\Sigma})^% {T}\\ &\quad+|\mathbf{\Sigma}|^{-2/k}\mathbf{L}\Big{(}\mathbf{L}^{T}\left(\mathbf{% \Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}\mathbf{L}% ^{T}.\end{split}

(C.1)

Using that $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$ and

\mathrm{vec}(\mathbf{\Sigma}^{-1})=\mathrm{vec}(\mathbf{\Sigma}^{-1}\mathbf{% \Sigma}\mathbf{\Sigma}^{-1})=\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^% {-1}\right)\mathrm{vec}(\mathbf{\Sigma})=\left(\mathbf{\Sigma}^{-1}\otimes% \mathbf{\Sigma}^{-1}\right)\mathbf{L}\bm{\theta}_{0},

the first term on the right-hand side of (C.1) reduces to $(1/k)|\mathbf{\Sigma}|^{-2/k}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}$. Similarly, the second and third terms on the right-hand side of (C.1) are each equal to $-(1/k)|\mathbf{\Sigma}|^{-2/k}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}$. Putting everything together, we find that the limiting covariance of $\sqrt{n}(H(\mathbf{V}_{n})-H(\mathbf{\Sigma}))$ is given by (4.3).
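The reduction of the first term in (C.1) can be spelled out as follows. This is only a sketch; the shorthand $\mathbf{A}=\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}$ is introduced here for brevity and is not notation used elsewhere in the paper. Since $\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L}=\bm{\theta}_{0}^{T}\mathbf{A}$, the scalar in the middle of the first term equals $k$:

```latex
\begin{split}
\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L}\mathbf{A}^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})
&=\bm{\theta}_{0}^{T}\mathbf{A}\mathbf{A}^{-1}\mathbf{A}\bm{\theta}_{0}
=\bm{\theta}_{0}^{T}\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\bm{\theta}_{0}\\
&=\mathrm{vec}(\mathbf{\Sigma})^{T}\mathrm{vec}(\mathbf{\Sigma}^{-1})
=\mathrm{tr}(\mathbf{\Sigma}\mathbf{\Sigma}^{-1})=k.
\end{split}
```

Hence the prefactor $1/k^{2}$ becomes $1/k$. In the same way, $\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\mathbf{L}\mathbf{A}^{-1}\mathbf{L}^{T}=\bm{\theta}_{0}^{T}\mathbf{L}^{T}=\mathrm{vec}(\mathbf{\Sigma})^{T}$, which yields the second and third terms.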

Example 5.

From Example 4 and the delta method, it follows that the limiting variance of $\sqrt{n}(H(\bm{\theta}_{n})-H(\bm{\theta}))$ is given by

\begin{split}
&(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\left[2\sigma_{1}|\mathbf{\Sigma}|^{-2/k}\left\{\mathbf{L}\Big{(}\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}\mathbf{L}^{T}-\frac{1}{k}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}\right\}\right]\mathbf{L}(\mathbf{L}^{T}\mathbf{L})^{-1}\\
&=2\sigma_{1}|\mathbf{\Sigma}|^{-2/k}\left\{\Big{(}\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}-\frac{1}{k}(\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma})^{T}\mathbf{L}(\mathbf{L}^{T}\mathbf{L})^{-1}\right\}\\
&=\frac{2\sigma_{1}}{|\mathbf{\Sigma}|^{2/k}}\left\{\Big{(}\mathbf{L}^{T}\left(\mathbf{\Sigma}^{-1}\otimes\mathbf{\Sigma}^{-1}\right)\mathbf{L}\Big{)}^{-1}-\frac{1}{k}\bm{\theta}_{0}\bm{\theta}_{0}^{T}\right\},
\end{split}

using that $\mathrm{vec}(\mathbf{\Sigma})=\mathbf{L}\bm{\theta}_{0}$.
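The two reductions in this appendix can also be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes a hypothetical compound-symmetry structure $\mathbf{\Sigma}=\theta_{1}\mathbf{I}+\theta_{2}\mathbf{J}$ with $k=3$, and it assumes the derivative $H^{\prime}(\mathbf{\Sigma})=|\mathbf{\Sigma}|^{-1/k}\{\mathbf{I}-(1/k)\mathrm{vec}(\mathbf{\Sigma})\mathrm{vec}(\mathbf{\Sigma}^{-1})^{T}\}$, which is the form implied by the expansion in (C.1). The names `vec`, `check_examples`, `basis` and `theta0` are illustrative only, and the common scalar factor $2\sigma_{1}$ is omitted from both sides.

```python
import numpy as np


def vec(A):
    """Column-stacking vec operator."""
    return A.reshape(-1, order="F")


def check_examples(basis, theta0):
    """Return (gap4, gap5): max abs differences for Examples 4 and 5."""
    k = basis[0].shape[0]
    L = np.column_stack([vec(B) for B in basis])   # k^2 x l matrix with vec(Sigma) = L theta0
    Sigma = (L @ theta0).reshape(k, k, order="F")
    Sinv = np.linalg.inv(Sigma)
    c = np.linalg.det(Sigma) ** (-1.0 / k)         # |Sigma|^{-1/k}

    # Assumed derivative of H(Sigma) = Sigma / |Sigma|^{1/k}, read off from (C.1)
    Hp = c * (np.eye(k * k) - np.outer(vec(Sigma), vec(Sinv)) / k)

    A_inv = np.linalg.inv(L.T @ np.kron(Sinv, Sinv) @ L)
    M = L @ A_inv @ L.T

    # Example 4: full sandwich versus its reduced form
    lhs4 = Hp @ M @ Hp.T
    rhs4 = c**2 * (M - np.outer(vec(Sigma), vec(Sigma)) / k)

    # Example 5: pre- and post-multiply by (L^T L)^{-1} L^T and its transpose
    P = np.linalg.solve(L.T @ L, L.T)
    lhs5 = P @ rhs4 @ P.T
    rhs5 = c**2 * (A_inv - np.outer(theta0, theta0) / k)

    return np.abs(lhs4 - rhs4).max(), np.abs(lhs5 - rhs5).max()


# Hypothetical example: Sigma = 1.0 * I + 0.5 * J (compound symmetry)
k = 3
basis = [np.eye(k), np.ones((k, k))]
theta0 = np.array([1.0, 0.5])
gap4, gap5 = check_examples(basis, theta0)
print(gap4, gap5)
```

Both reported gaps should be at the level of floating-point rounding error. Any other list of linearly independent symmetric basis matrices with a positive definite $\mathbf{\Sigma}$ could be substituted for `basis`.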

References

[1] I. Chervoneva and M. Vishnyakov. Constrained $S$-estimators for linear mixed effects models with covariance components. Stat. Med., 30(14):1735–1750, 2011.
[2] I. Chervoneva and M. Vishnyakov. Generalized $S$-estimators for linear mixed effects models. Statistica Sinica, 24(3):1257–1276, 2014.
[3] S. Copt and S. Heritier. Robust alternatives to the $F$-test in mixed linear models based on MM-estimates. Biometrics, 63(4):1045–1052, 2007.
[4] S. Copt and M. P. Victoria-Feser. High-breakdown inference for mixed linear models. Journal of the American Statistical Association, 101(473):292–300, 2006.
[5] C. Croux and G. Haesbroeck. Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika, 87(3):603–618, 2000.
[6] G. M. Fitzmaurice, N. M. Laird, and J. H. Ware. Applied longitudinal analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, second edition, 2011.
[7] F. R. Hampel. The influence curve and its role in robust estimation. J. Amer. Statist. Assoc., 69:383–393, 1974.
[8] H. O. Hartley and J. N. K. Rao. Maximum-likelihood estimation for the mixed analysis of variance model. Biometrika, 54:93–108, 1967.
[9] P. J. Huber. Robust statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1981.
[10] R. I. Jennrich and M. D. Schluchter. Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 42(4):805–820, 1986.
[11] J. T. Kent and D. E. Tyler. Constrained $M$-estimation for multivariate location and scatter. Ann. Statist., 24(3):1346–1370, 1996.
[12] N. L. Kudraszow and R. A. Maronna. Estimates of MM type for the multivariate linear model. J. Multivariate Anal., 102(9):1280–1292, 2011.
[13] H. P. Lopuhaä. On the relation between $S$-estimators and $M$-estimators of multivariate location and covariance. Ann. Statist., 17(4):1662–1683, 1989.
[14] H. P. Lopuhaä. Asymptotic expansion of $S$-estimators of location and covariance. Statist. Neerlandica, 51(2):220–237, 1997.
[15] H. P. Lopuhaä. Highly efficient estimators with high breakdown point for linear models with structured covariance matrices. Econometrics and Statistics, 2023.
[16] H. P. Lopuhaä, V. Gares, and A. Ruiz-Gazen. S-estimation in linear models with structured covariance matrices. Ann. Statist., 51(6):2415–2439, 2023.
[17] H. P. Lopuhaä, V. Gares, and A. Ruiz-Gazen. Supplement to “S-estimation in linear models with structured covariance matrices”. 2023.
[18] J. R. Magnus and H. Neudecker. Matrix differential calculus with applications in statistics and econometrics. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons, Ltd., Chichester, 1988.
[19] C. L. Mallows. Latent vectors of random symmetric matrices. Biometrika, 48:133–149, 1961.
[20] K. V. Mardia and R. J. Marshall. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71(1):135–146, 1984.
[21] R. A. Maronna. Robust $M$-estimators of multivariate location and scatter. Ann. Statist., 4(1):51–67, 1976.
[22] J. J. Miller. Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. Ann. Statist., 5(4):746–762, 1977.
[23] P. Rousseeuw and V. Yohai. Robust regression by means of S-estimators. In Robust and nonlinear time series analysis (Heidelberg, 1983), volume 26 of Lect. Notes Stat., pages 256–272. Springer, New York, 1984.
[24] M. Salibián-Barrera, S. Van Aelst, and G. Willems. Principal components analysis based on multivariate MM estimators with fast and robust bootstrap. J. Amer. Statist. Assoc., 101(475):1198–1211, 2006.
[25] K. S. Tatsuoka and D. E. Tyler. On the uniqueness of $S$-functionals and $M$-functionals under nonelliptical distributions. Ann. Statist., 28(4):1219–1243, 2000.
[26] D. E. Tyler. Radial estimates and the test for sphericity. Biometrika, 69(2):429–436, 1982.
[27] D. E. Tyler. Robustness and efficiency properties of scatter matrices. Biometrika, 70(2):411–420, 1983.
[28] S. Van Aelst and G. Willems. Multivariate regression $S$-estimators for robust estimation and inference. Statist. Sinica, 15(4):981–1001, 2005.
