Asymptotics of estimators for structured covariance matrices
Hendrik Paul Lopuhaä
Delft University of Technology
(July 2, 2024)
Abstract
We show that the limiting variance of a sequence of estimators for a structured covariance matrix
has a general form that appears as the variance of a scaled projection of a random matrix that is of radial type
and a similar result is obtained for the corresponding sequence of estimators for the vector of variance components.
These results are illustrated by the limiting behavior of estimators for a linear covariance structure
in a variety of multivariate statistical models.
We also derive a characterization for the influence function of corresponding functionals.
Furthermore, we derive the limiting distribution and influence function of scale invariant mappings
of such estimators and their corresponding functionals.
As a consequence, the asymptotic relative efficiency of different estimators for the shape component
of a structured covariance matrix can be compared by means of a single scalar
and the gross error sensitivity of the corresponding influence functions
can be compared by means of a single index.
Similar results are obtained for estimators of the normalized vector of variance components.
We apply our results to investigate how the efficiency, gross error sensitivity, and breakdown point
of S-estimators for the normalized variance components
are affected simultaneously by varying their cutoff value.
1 Introduction
Covariance matrices describe the relationships and variability between different variables in a dataset.
When there is a known structure or pattern in these relationships,
structured covariance matrices can be estimated to capture and represent that structure.
The use of structured covariance matrices is a valuable tool
for modeling the underlying patterns and dependencies in multivariate data.
It provides a more nuanced understanding of the relationships between variables,
especially in scenarios where variables exhibit specific structures or patterns of correlation.
Structured covariance matrices are commonly used in the analysis of repeated measures,
longitudinal data, and multivariate data with a known underlying structure.
They are particularly useful when there are dependencies or correlations among different measurements or variables
and are widely used in various fields, including biology, medicine, psychology, and social sciences.
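When a covariance matrix $\Sigma$ is unstructured and can be any positive definite symmetric $k\times k$ matrix,
then the limiting behavior of covariance estimators $\Sigma_n$ for $\Sigma$ is well understood.
For example, if $\Sigma_n$ is based on a sample from a distribution with
an elliptically contoured density $f_{\mu,\Sigma}(y)=\det(\Sigma)^{-1/2}h\big((y-\mu)^T\Sigma^{-1}(y-\mu)\big)$,
then typically $\sqrt{n}(\Sigma_n-\Sigma)$ converges in distribution to a random matrix $M$
that has a multivariate normal distribution with mean zero and variance
$$\mathrm{var}(\mathrm{vec}(M)) = \sigma_1\left(I_{k^2}+K_{k,k}\right)(\Sigma\otimes\Sigma)+\sigma_2\,\mathrm{vec}(\Sigma)\mathrm{vec}(\Sigma)^T, \qquad (1.1)$$
for some $\sigma_1\geq0$ and $\sigma_2\geq-2\sigma_1/k$,
where $\otimes$ denotes the Kronecker product, $K_{k,k}$ is the commutation matrix,
and vec is the operator that stacks the columns of a matrix.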
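To make (1.1) concrete, here is a minimal NumPy sketch that builds the commutation matrix $K_{k,k}$ and evaluates the variance (1.1) for given $\Sigma$, $\sigma_1$ and $\sigma_2$. The helper names `vec`, `commutation_matrix` and `radial_variance` are our own and purely illustrative. With $\sigma_1=1$ and $\sigma_2=0$ it reproduces the familiar limiting variance of the sample covariance matrix at the normal distribution, e.g. $2\sigma_{11}^2$ for the $(1,1)$ entry.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.ravel(order="F")

def commutation_matrix(k):
    """K_{k,k}: the k^2 x k^2 permutation matrix with K vec(A) = vec(A^T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            # entry (j, i) of A^T sits at position j + i*k of vec(A^T);
            # it equals entry (i, j) of A, at position i + j*k of vec(A)
            K[j + i * k, i + j * k] = 1.0
    return K

def radial_variance(Sigma, sigma1, sigma2):
    """Variance (1.1): sigma1 (I + K)(Sigma kron Sigma) + sigma2 vec(Sigma) vec(Sigma)^T."""
    k = Sigma.shape[0]
    v = vec(Sigma)
    return (sigma1 * (np.eye(k * k) + commutation_matrix(k)) @ np.kron(Sigma, Sigma)
            + sigma2 * np.outer(v, v))

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
V11 = radial_variance(Sigma, sigma1=1.0, sigma2=0.0)
print(V11[0, 0])  # 8.0, i.e. 2 * Sigma[0, 0]**2
```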
This form of limiting variance appears for many covariance estimators.
Tyler [26] gives several examples, including the sample covariance matrix,
and nicely explains that this general form will always appear
when $M$ is of radial type with respect to $\Sigma$.
The situation is different when estimating a structured covariance matrix
$\Sigma=V(\theta)$, where $V$ is a known covariance structure depending on
a vector $\theta\in\mathbb{R}^l$ of unknown variance components.
Asymptotic results for the maximum likelihood estimator of variance components
in linear models with Gaussian errors having a structured covariance matrix $V(\theta)$,
can be found in Hartley and Rao [8],
Miller [22], and Mardia and Marshall [20].
When scaled appropriately, the maximum likelihood estimator is shown to be asymptotically normal
with mean and variance , where
,
for ,
with and .
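By employing the vec-notation, the limiting covariance of the (appropriately scaled) estimator $\theta_n$ can be expressed as
$$2\left(L^T(V^{-1}\otimes V^{-1})L\right)^{-1},$$
where $V=V(\theta)$ and $L$ is the $k^2\times l$ matrix with columns $\mathrm{vec}(\partial V/\partial\theta_j)$, $j=1,\dots,l$.
According to the delta method the limiting covariance of $\mathrm{vec}(V(\theta_n))$ is then given by
$$2L\left(L^T(V^{-1}\otimes V^{-1})L\right)^{-1}L^T.$$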
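The vec form can be checked numerically against the familiar trace form of the Gaussian Fisher information, $\mathcal{I}_{ij}=\tfrac12\mathrm{tr}(V^{-1}L_iV^{-1}L_j)$ with $L_j=\partial V/\partial\theta_j$, whose inverse is the limiting covariance of the maximum likelihood estimator. The sketch below assumes a hypothetical two-component structure $V(\theta)=\theta_1I_k+\theta_2J_k$ ($J_k$ the matrix of ones) and confirms that $\mathcal{I}^{-1}=2(L^T(V^{-1}\otimes V^{-1})L)^{-1}$; it is an illustration only, not part of the cited derivations.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.ravel(order="F")

k = 4
L_list = [np.eye(k), np.ones((k, k))]      # hypothetical dV/dtheta_1, dV/dtheta_2
theta = np.array([2.0, 0.7])
V = sum(t * Lj for t, Lj in zip(theta, L_list))
Vinv = np.linalg.inv(V)

# Trace form of the Gaussian Fisher information for theta
fisher = np.array([[0.5 * np.trace(Vinv @ Li @ Vinv @ Lj) for Lj in L_list]
                   for Li in L_list])

# vec form: L has columns vec(L_j)
L = np.column_stack([vec(Lj) for Lj in L_list])
vec_form = 2 * np.linalg.inv(L.T @ np.kron(Vinv, Vinv) @ L)

print(np.allclose(np.linalg.inv(fisher), vec_form))  # True
```

The check works because $\mathrm{vec}(L_i)^T(V^{-1}\otimes V^{-1})\mathrm{vec}(L_j)=\mathrm{tr}(V^{-1}L_iV^{-1}L_j)$ for symmetric $L_i$, so $L^T(V^{-1}\otimes V^{-1})L=2\mathcal{I}$.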
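Similar results have been obtained in Lopuhaä et al [16] for the class of S-estimators
based on observations that follow a linear model with a structured covariance $V(\theta)$, where $V$ is
a linear function of $\theta$.
Under appropriate conditions, with $L=\partial\,\mathrm{vec}(V(\theta))/\partial\theta^T$, it holds that
$\sqrt{n}(\theta_n-\theta)$ is asymptotically normal with mean zero and variance
$$2\sigma_1\left(L^T(V^{-1}\otimes V^{-1})L\right)^{-1}+\sigma_2\,\theta\theta^T, \qquad (1.2)$$
and $\sqrt{n}(V(\theta_n)-V(\theta))$ converges in distribution to a random matrix $M_V$,
that has a multivariate normal distribution with mean zero and variance
$$2\sigma_1L\left(L^T(V^{-1}\otimes V^{-1})L\right)^{-1}L^T+\sigma_2\,\mathrm{vec}(V)\mathrm{vec}(V)^T. \qquad (1.3)$$
One of the objectives of this paper is to show that this general form
will always appear when $M_V$ is a scaled projection, on the column space of $L$,
of a random matrix $M$ that is of radial type with respect to $V$.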
Moreover, we provide several examples of covariance estimators that exhibit this kind of limiting behavior.
Another objective concerns the asymptotic behavior of estimators for scale invariant mappings
of positive definite symmetric matrices.
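For affine equivariant covariance estimators $\Sigma_n$ with asymptotic variance (1.1),
Tyler [27] shows that, for a mapping $H$ that is homogeneous of order zero, $\sqrt{n}(H(\Sigma_n)-H(\Sigma))$ has an asymptotic variance that only depends on the scalar $\sigma_1$.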
When dealing with a structured covariance matrix, the covariance estimators are typically not affine equivariant
and have asymptotic variance (1.3).
The second objective of this paper is to show that Tyler’s result for affine equivariant covariance estimators,
remains true for estimators of a structured covariance matrix.
Moreover, we will establish a similar result for scale invariant mappings
of estimators for the vector of variance components.
An example of a scale invariant mapping is the shape component .
A consequence of our results is that the asymptotic relative efficiency of estimators of the shape of
a structured covariance can be compared simply by comparing the
corresponding values of $\sigma_1$.
For affine equivariant covariance estimators,
this was already observed by Kent and Tyler [11] and Salibián-Barrera et al [24].
Similar properties will be shown to hold for the direction component corresponding to
the vector of variance components.
A final objective of this paper concerns the influence function of structured covariance functionals.
For affine equivariant covariance functionals, Croux and Haesbroeck [5] show that
the influence function at the multivariate normal is characterized by two real-valued functions.
Structured covariance functionals, however, are not necessarily affine equivariant.
We will show that such a characterization remains valid for structured covariance functionals
at any elliptically contoured distribution,
and similarly for the variance components functional.
A nice consequence is that the influence function of scale invariant mappings of a
structured covariance functional, or of the variance components functional itself,
is characterized by a single real-valued function.
As a result, the gross-error sensitivity (GES) is proportional to a single index,
which can be used to compare the GES of different shape functionals or different direction functionals.
Kent and Tyler [11] already observed such a property for
the shape component of affine equivariant covariance functionals,
see also Salibián-Barrera et al [24].
Apart from being of interest in their own right, our results also enable the construction of MM-estimators
with auxiliary scale in linear mixed effects models and other linear models with structured covariances.
These estimators inherit the robustness of S-estimators considered in Lopuhaä et al [16] and,
in contrast to the simpler version considered in Lopuhaä [15],
improve both the efficiency of the estimator of the fixed effects as well as the efficiency of the
estimator of the covariance shape component and of the direction of the vector of variance components.
Investigation of this version of MM-estimators is postponed to a future manuscript, in which we will extend similar results that are already available
for unstructured covariances in the multivariate location-scale model, see Tatsuoka and Tyler [25]
or Salibián-Barrera et al [24],
and in the multivariate regression model, see Kudraszow and Maronna [12].
The paper is organized as follows.
In Section 2 we show that the general forms of (1.3)
and (1.2) can be derived solely using a scaled projection of a random matrix that is of radial type.
In Section 3 we investigate the limiting behavior of
estimators of a linear covariance structure in a variety of multivariate models.
We establish that these estimators asymptotically behave the same as a scaled projection
of a sequence of affine equivariant covariance estimators that are asymptotically of radial type.
In Section 4 we derive the limiting distribution of
scale invariant mappings of estimators of a linear covariance structure that are asymptotically normal,
and similarly for scale invariant mappings of estimators of the vector of variance components.
In Section 5 we derive a characterization for the influence function
of linearly structured covariance functionals and the corresponding functional
of variance components, and of scale invariant mappings thereof.
In Section 6 we apply our results to investigate how
the efficiency, GES, and breakdown point of S-estimators
of the variance components are affected simultaneously,
when we vary the cut-off value of the rho-function that defines the S-estimator.
All proofs are postponed to an appendix at the end of the paper.
2 Projection of a random matrix of radial type
A random $k\times k$ matrix $M$ is said to be of radial type,
if for any orthogonal matrix $Q$,
the distribution of $QMQ^T$ is the same as that of $M$.
The covariance structure of random matrices with a radial distribution
was first given by Mallows [19] in index form.
Tyler [26] gave the covariance structure in matrix form
and provided necessary conditions on its parameters.
A random matrix $M$ is said to be of radial type with respect to the positive definite symmetric matrix $\Sigma$,
if $\Sigma^{-1/2}M\Sigma^{-1/2}$ has a radial distribution.
If the first two moments of $M$ exist, then according to Corollary 1 in Tyler [26],
the variance of $\mathrm{vec}(M)$ is given by (1.1).
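Consider a structured covariance matrix $\Sigma=V(\theta)$,
where $V$ is a known covariance structure that is a linear function of $\theta\in\mathbb{R}^l$,
a vector of unknown variance components.
Define the $k^2\times l$ matrix
$$L=\frac{\partial\,\mathrm{vec}(V(\theta))}{\partial\theta^T}=\left[\mathrm{vec}(L_1)\ \cdots\ \mathrm{vec}(L_l)\right],\qquad L_j=\frac{\partial V(\theta)}{\partial\theta_j}. \qquad (2.1)$$
Note that since $V$ is linear, we can write
$V(\theta)=\theta_1L_1+\cdots+\theta_lL_l$ and $\mathrm{vec}(V(\theta))=L\theta$.
Furthermore, let $P_L$ be the projection of a vector on the
column space of $L$, re-scaled by $\Sigma^{-1}\otimes\Sigma^{-1}$,
that is
$$P_L=L\left(L^T(\Sigma^{-1}\otimes\Sigma^{-1})L\right)^{-1}L^T(\Sigma^{-1}\otimes\Sigma^{-1}). \qquad (2.2)$$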
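As a small illustration of (2.1) and (2.2), the NumPy sketch below assumes a hypothetical compound-symmetry structure $V(\theta)=\theta_1I_k+\theta_2J_k$, with $J_k$ the $k\times k$ matrix of ones, and forms $P_L=L(L^TWL)^{-1}L^TW$ with $W=\Sigma^{-1}\otimes\Sigma^{-1}$. It checks three properties one expects of this weighted projection: $P_L$ is idempotent, it leaves $\mathrm{vec}(\Sigma)=L\theta$ unchanged, and it is self-adjoint with respect to the inner product weighted by $W$. The helper names are ours.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.ravel(order="F")

def projection(L, Sigma):
    """P_L = L (L^T W L)^{-1} L^T W with W = Sigma^{-1} kron Sigma^{-1}."""
    Sinv = np.linalg.inv(Sigma)
    W = np.kron(Sinv, Sinv)
    return L @ np.linalg.solve(L.T @ W @ L, L.T @ W)

k = 3
L1, L2 = np.eye(k), np.ones((k, k))          # hypothetical dV/dtheta_1, dV/dtheta_2
L = np.column_stack([vec(L1), vec(L2)])      # (2.1): k^2 x l matrix
theta = np.array([1.0, 0.5])
Sigma = theta[0] * L1 + theta[1] * L2        # V(theta) = theta_1 L_1 + theta_2 L_2
P = projection(L, Sigma)
Sinv = np.linalg.inv(Sigma)
W = np.kron(Sinv, Sinv)

print(np.allclose(P @ P, P))                     # idempotent
print(np.allclose(P @ vec(Sigma), vec(Sigma)))   # vec(Sigma) = L theta is fixed
print(np.allclose(W @ P, P.T @ W))               # self-adjoint in the W inner product
```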
We then have the following theorem.
Theorem 1.
Let $M$ be a random $k\times k$ matrix that is of radial type with respect to
a positive definite symmetric matrix $\Sigma$.
Suppose that $\Sigma=V(\theta)$, for some $\theta\in\mathbb{R}^l$, and that $V$ is linear
such that $L$, as defined in (2.1), is of full column rank.
Let $P_L$ be the projection defined in (2.2)
and define the random matrix $M_V$ by $\mathrm{vec}(M_V)=P_L\,\mathrm{vec}(M)$.
(i)
If the first two moments of exist,
then there exist constants , and with and , such that
and
is given by (1.3).
(ii)
If is the random vector, such that , then
and
is given by (1.2).
Note that the constants , and have nothing to do with the projection ,
but are inherited from the
variance (1.1) of the radial random matrix .
Their existence is guaranteed by Corollary 1 in Tyler [26].
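Theorem 1 can be sanity-checked numerically: propagate the variance (1.1) through $P_L$ and through $A=(L^TWL)^{-1}L^TW$ (so that $M_\theta=A\,\mathrm{vec}(M)$ and $\mathrm{vec}(M_V)=LM_\theta$), and compare with the closed forms $2\sigma_1L(L^TWL)^{-1}L^T+\sigma_2\mathrm{vec}(\Sigma)\mathrm{vec}(\Sigma)^T$ and $2\sigma_1(L^TWL)^{-1}+\sigma_2\theta\theta^T$, where $W=\Sigma^{-1}\otimes\Sigma^{-1}$. The structure and the values of $\sigma_1$, $\sigma_2$ below are arbitrary choices for illustration, and a single numerical check is no substitute for the proof.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.ravel(order="F")

def commutation_matrix(k):
    """K_{k,k} with K vec(A) = vec(A^T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[j + i * k, i + j * k] = 1.0
    return K

k, sigma1, sigma2 = 3, 1.3, 0.4               # arbitrary illustrative values
L1, L2 = np.eye(k), np.ones((k, k))           # hypothetical structure theta_1 I + theta_2 J
L = np.column_stack([vec(L1), vec(L2)])
theta = np.array([1.0, 0.5])
Sigma = theta[0] * L1 + theta[1] * L2
Sinv = np.linalg.inv(Sigma)
W = np.kron(Sinv, Sinv)
v = vec(Sigma)

# Variance (1.1) of vec(M) for a matrix M of radial type with respect to Sigma
var_M = (sigma1 * (np.eye(k * k) + commutation_matrix(k)) @ np.kron(Sigma, Sigma)
         + sigma2 * np.outer(v, v))

# M_theta = A vec(M) and vec(M_V) = L M_theta = P_L vec(M)
A = np.linalg.solve(L.T @ W @ L, L.T @ W)
P = L @ A
var_MV = P @ var_M @ P.T
var_Mtheta = A @ var_M @ A.T

# Closed forms of (1.3) and (1.2)
G = np.linalg.inv(L.T @ W @ L)
form_13 = 2 * sigma1 * L @ G @ L.T + sigma2 * np.outer(v, v)
form_12 = 2 * sigma1 * G + sigma2 * np.outer(theta, theta)
print(np.allclose(var_MV, form_13), np.allclose(var_Mtheta, form_12))  # True True
```

The key step is that the columns of $L$ are vectorized symmetric matrices, so $(I_{k^2}+K_{k,k})L=2L$; this is what produces the factor $2\sigma_1$.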
Examples of multivariate statistical models with a linear covariance structure
are linear mixed effects models.
But also linear models with errors generated by some autoregressive time series
may correspond to a linear covariance structure.
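When $\Sigma$ is unstructured and can be any positive definite symmetric covariance matrix,
it can also be seen as a linear covariance structure $V(\theta)=\Sigma$, where
$\theta=\mathrm{vech}(\Sigma)$, with
$$\mathrm{vech}(A)=(a_{11},\dots,a_{k1},a_{22},\dots,a_{k2},\dots,a_{kk})^T, \qquad (2.3)$$
the unique $k(k+1)/2$-vector that stacks the columns of the lower triangle elements of a symmetric matrix $A=(a_{ij})$.
The matrix $L$ is then equal to the so-called duplication matrix $D_k$,
which is the unique $k^2\times k(k+1)/2$ matrix with the properties
$D_k\,\mathrm{vech}(A)=\mathrm{vec}(A)$ and $(D_k^TD_k)^{-1}D_k^T\mathrm{vec}(A)=\mathrm{vech}(A)$, for symmetric $A$.
Moreover, from the properties of $D_k$ (e.g., see
Magnus and Neudecker [18, Ch. 3, Sec. 8]), it follows that
$$D_k\left(D_k^T(\Sigma^{-1}\otimes\Sigma^{-1})D_k\right)^{-1}D_k^T=\tfrac12\left(I_{k^2}+K_{k,k}\right)(\Sigma\otimes\Sigma). \qquad (2.4)$$
In this case, the expression (1.3)
with $L=D_k$
coincides with the expression (1.1).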
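The identities around (2.3) and (2.4) are easy to verify numerically. The sketch below builds $\mathrm{vech}$ and $D_k$ directly from their defining properties (the function names are ours) and checks that $D_k(D_k^T(\Sigma^{-1}\otimes\Sigma^{-1})D_k)^{-1}D_k^T=\tfrac12(I_{k^2}+K_{k,k})(\Sigma\otimes\Sigma)$ for a randomly generated positive definite $\Sigma$; this identity is what turns (1.3) with $L=D_k$ into (1.1).

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.ravel(order="F")

def vech(A):
    """Stack the columns of the lower triangle of A, as in (2.3)."""
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def duplication_matrix(k):
    """D_k: the k^2 x k(k+1)/2 matrix with D_k vech(A) = vec(A) for symmetric A."""
    D = np.zeros((k * k, k * (k + 1) // 2))
    col = 0
    for j in range(k):
        for i in range(j, k):
            D[i + j * k, col] = 1.0   # a_ij in vec(A)
            D[j + i * k, col] = 1.0   # a_ji in vec(A); same row when i == j
            col += 1
    return D

def commutation_matrix(k):
    """K_{k,k} with K vec(A) = vec(A^T)."""
    K = np.zeros((k * k, k * k))
    for i in range(k):
        for j in range(k):
            K[j + i * k, i + j * k] = 1.0
    return K

k = 3
rng = np.random.default_rng(0)
B = rng.normal(size=(k, k))
Sigma = B @ B.T + k * np.eye(k)               # an arbitrary positive definite matrix
D = duplication_matrix(k)
Sinv = np.linalg.inv(Sigma)

lhs = D @ np.linalg.inv(D.T @ np.kron(Sinv, Sinv) @ D) @ D.T
rhs = 0.5 * (np.eye(k * k) + commutation_matrix(k)) @ np.kron(Sigma, Sigma)

print(np.allclose(D @ vech(Sigma), vec(Sigma)))                              # True
print(np.allclose(np.linalg.solve(D.T @ D, D.T @ vec(Sigma)), vech(Sigma)))  # True
print(np.allclose(lhs, rhs))                                                 # True
```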
3 Projections of estimators of radial type
A sequence of symmetric estimators $\Sigma_n$ for $\Sigma$ is said to be
asymptotically of radial type
if there exists a sequence $a_n$ of real numbers increasing to infinity,
such that $a_n(\Sigma_n-\Sigma)\to M$ in distribution,
with $M$ being of radial type with respect to $\Sigma$,
see Tyler [26].
In a large class of multivariate statistical models,
for estimators of a linearly structured covariance matrix,
it turns out that the limiting behavior of is the same as that of the projection
of a random matrix that is asymptotically of radial type with respect to ,
where is defined in (2.2).
We illustrate this behavior in the following linear model with a structured covariance.
Consider independent observations
with distribution ,
where , ,
for which we assume the following model
(3.1)
where ,
is an unknown parameter vector,
is a known design matrix, and
are unobservable independent mean zero random vectors with
covariance matrix ,
the class of positive definite symmetric matrices.
Suppose that the distribution of the random variable is such that has an elliptically contoured density
(3.2)
where and ,
for some vector of variance components.
This setup includes several multivariate statistical models of interest.
One possibility is the linear mixed effects model, in which the random effects together with the measurement error
yield a specific covariance structure.
Other covariance structures may arise, for example if the are the outcome of a time series.
Note that this setup also allows models with an unstructured covariance matrix,
such as the multivariate location-scale model or the multivariate regression model.
See e.g., Jennrich and Schluchter [10] or
Fitzmaurice et al [6], for different possible covariance structures,
and Lopuhaä et al [16], who provide a uniform treatment of S-estimators in these models.
Estimators for are typically solutions of estimating equations of the following type
(3.3)
where denotes the empirical measure corresponding to
, and where , with
(3.4)
where ,
and where we write for .
We give some examples below.
Furthermore, typically will then converge to a solution of
the corresponding population equation
(3.5)
Let .
From the estimating equations (3.3) for ,
we will establish that is asymptotically equivalent to ,
for some that is asymptotically of radial type
and defined in (2.2).
To this end, we require the following conditions:
(C1)
is of bounded variation and continuously differentiable, for ;
(C2)
, , and are bounded;
(C3)
and
,
where denotes the expectation with respect to
density (3.2) with parameters .
Condition (C3) ensures the existence of the scalars and in Theorem 2.
Maronna [21] and Tyler [26] consider M-estimators for multivariate
location and covariance.
Estimating equations for these estimators would correspond to without
the factor (see Example 2 below) and .
Moreover, they assume that is non-negative, which obviously implies (C3).
Theorem 2.
Let be a distribution for the random variable , such that
has an elliptically contoured density (3.2),
with parameters and , for a linear covariance structure .
Let and be solutions of (3.3)
and (3.5), respectively, and suppose that in probability.
Suppose that and that has full rank
with probability one.
If , , and satisfy (C1)-(C3), then
there exists a sequence of random matrices, such that
where is defined in (2.2).
Moreover, in distribution, where is
a random matrix that has a multivariate normal distribution with mean zero and
variance (1.1), with
Remark 3.1.
From the proof of Theorem 2 one can obtain the following explicit expression for :
where ,
and
where and are defined in (A.6).
Note that and are precisely the quantities that appear in condition (C3).
The random matrix in Theorem 2 is of radial type with respect to .
This follows from the fact that is multivariate normal with mean zero and variance
This immediately gives that for any orthogonal matrix ,
the matrix is multivariate normal with mean zero and the same variance.
From Theorem 2 it follows that is
asymptotically normal with mean zero and a variance that is the same as the variance of .
According to Theorem 1 this variance is of the type given by (1.3).
Furthermore, if we write , then
in distribution,
where is multivariate normal with mean zero and variance given by (1.2).
3.1 Examples
We discuss some examples of multivariate statistical models that are covered by the setup in (3.1),
in which the estimators are solutions of estimating equation (3.3)
for particular functions , , and .
In the Appendix we provide a detailed derivation of $\sigma_1$ and $\sigma_2$ for specific special cases
and show that their expressions coincide with the ones in Tyler [26] and Lopuhaä et al [16].
Example 1(Maximum likelihood for multivariate normal).
Suppose that are independent,
such that .
The loglikelihood is then given by
Setting the partial derivatives
and equal to zero gives the following estimating equations
(3.6)
for ,
where we write for .
By using the vec-notation and as defined in (2.1),
we can combine the partial derivatives with respect to in the second line of (3.6)
as follows
(3.7)
It follows that the maximum likelihood estimator
satisfies (3.3) and satisfies (3.5), where
is defined in (3.4) with .
Theorem 2 applies and one finds $\sigma_1=1$ and $\sigma_2=0$.
When each , for , then the model (3.1) reduces to the multivariate
location-scale model.
If is unstructured, then ,
with and is equal to the duplication matrix .
In this case, we can remove the factor from (3.7),
and is simply the sample covariance of .
This example then coincides with Example 1 in Tyler [26].
Example 2(M-estimators).
As mentioned in Example 1,
when each , for , and is unstructured,
then the model (3.1) reduces to the multivariate location-scale model
and we can remove the factor from in (3.3).
In that case, estimating equations (3.3) are equivalent to equations (1.1)-(1.2) in
Maronna [21]
or equations (4.11)-(4.12) in Huber [9] for M-estimators of multivariate location and covariance.
In view of this, solutions of
estimating equations (3.3) are called M-estimators for .
The expressions for and in Theorem 2 then
coincide with the ones in Example 3 in Tyler [26].
As a special case, this includes the estimating equations that correspond to
maximum likelihood estimators based on independent observations
from an elliptical density (3.2).
The maximum likelihood estimators then satisfy
estimating equations (3.3), for
and .
The expressions for and in Theorem 2 then
coincide with the ones in Example 2 in Tyler [26].
Example 3(S-estimators).
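S-estimators for $(\beta,\theta)$ are defined by means of a function $\rho$,
as the solution to minimizing $\det(V(\theta))$,
subject to
$$\frac1n\sum_{i=1}^n\rho\left(\sqrt{(y_i-X_i\beta)^TV(\theta)^{-1}(y_i-X_i\beta)}\right)=b_0,$$
where the minimum is taken over all $\beta\in\mathbb{R}^q$ and $\theta\in\mathbb{R}^l$,
such that $V(\theta)$ is positive definite.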
These estimators have been studied for linear mixed effects models in
Copt and Victoria-Feser [4], Chervoneva and Vishnyakov [1, 2]
and for general linear models with a structured covariance in
Lopuhaä et al [16].
According to Section 7.2 in [16], S-estimators satisfy
estimating equations (3.3), with ,
and .
The expressions for and in Theorem 2
coincide with the ones in Corollary 9.2 in
Lopuhaä et al [16].
4 Homogeneous mappings of order zero
Let be a mapping from to that
is homogeneous of order zero, that is
(4.1)
These mappings have several applications to affine equivariant covariance estimators
that have limiting variance (1.1).
Tyler [27] uses such a mapping to show that the likelihood ratio criterion
is asymptotically robust over the class of elliptical distributions.
Kent and Tyler [11] consider the shape component of covariance CM-estimators
and show that the limiting variance of CM-estimators of shape depends only on $\sigma_1$,
which may then serve as an index for the asymptotic relative efficiency.
Salibián-Barrera et al [24]
derive the influence function of the shape component of covariance MM-functionals and use this
to obtain that the limiting variance of MM-estimators of shape only depends on a single scalar.
This property of the shape component is a special case of a general result in Tyler [27]
for multivariate functionals of affine equivariant covariance estimators that are asymptotically normal with
limiting variance (1.1).
Estimators for a structured covariance matrix are typically not affine equivariant
and have limiting variance (1.3) instead of (1.1),
so that the previous results do not directly apply.
The objective of this section is to extend Theorem 1 in Tyler [27]
to estimators for a linearly structured covariance,
and discuss its consequences for corresponding estimators of shape and scale.
Moreover, we establish a similar result for estimators of the vector of variance components
and apply this to its normalized version.
We then have the following theorem.
Theorem 3.
Consider ,
for some vector
and linear variance structure .
Let be a sequence of estimators for
and let be a sequence of estimators for the vector of variance components.
(i)
For , let be continuously differentiable satisfying (4.1).
When converges in distribution to a random matrix ,
that has a multivariate normal distribution with mean zero and variance given by (1.3),
then is asymptotically normal with mean zero
and variance
(ii)
Suppose that is asymptotically normal with mean zero and variance (1.2).
Then, for any mapping that satisfies (4.1),
it holds that is asymptotically normal with mean zero
and variance
When is unstructured, then ,
with , as defined in (2.3), and
is the duplication matrix .
Because , for symmetric ,
from (2.4) it follows that Theorem 3(i)
with recovers Theorem 1 in Tyler [27].
From Theorem 3 it follows immediately that
the asymptotic relative efficiency of different estimators for
can be compared by simply comparing the values of the corresponding scalar $\sigma_1$.
Similarly, the scalar $\sigma_1$ can also be used as an index for the asymptotic relative efficiency
of different estimators for .
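The mechanism behind Theorem 3 is Euler's relation for mappings that are homogeneous of order zero: differentiating $\psi(\lambda x)=\psi(x)$ with respect to $\lambda$ at $\lambda=1$ gives $\psi'(x)\,x=0$. Since $\mathrm{vec}(\Sigma)$ is exactly the direction multiplying $\sigma_2$ in (1.3), the $\sigma_2$ term vanishes under the delta method. The sketch below illustrates this for the determinant-normalised shape map $\Sigma\mapsto\Sigma/\det(\Sigma)^{1/k}$, chosen here only for illustration, using a finite-difference Jacobian and a hypothetical compound-symmetry structure: the delta-method variance is the same for two different values of $\sigma_2$.

```python
import numpy as np

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.ravel(order="F")

def shape(S):
    """Determinant-normalised shape S / det(S)^(1/k); homogeneous of order zero."""
    return S / np.linalg.det(S) ** (1.0 / S.shape[0])

def jacobian(f, S, eps=1e-6):
    """Central-difference Jacobian of vec(f(S)) with respect to vec(S)."""
    n = S.size
    J = np.zeros((n, n))
    for idx in range(n):
        dS = np.zeros(n)
        dS[idx] = eps
        dS = dS.reshape(S.shape, order="F")
        J[:, idx] = vec(f(S + dS) - f(S - dS)) / (2 * eps)
    return J

k = 3
L1, L2 = np.eye(k), np.ones((k, k))           # hypothetical linear structure
L = np.column_stack([vec(L1), vec(L2)])
Sigma = 1.0 * L1 + 0.5 * L2
Sinv = np.linalg.inv(Sigma)
G = np.linalg.inv(L.T @ np.kron(Sinv, Sinv) @ L)

def var_13(sigma1, sigma2):
    """Variance (1.3) of vec(M_V) for this structure."""
    v = vec(Sigma)
    return 2 * sigma1 * L @ G @ L.T + sigma2 * np.outer(v, v)

J = jacobian(shape, Sigma)
print(np.allclose(J @ vec(Sigma), 0, atol=1e-6))   # Euler: psi'(Sigma) vec(Sigma) = 0
var_a = J @ var_13(1.0, 0.0) @ J.T
var_b = J @ var_13(1.0, 5.0) @ J.T
print(np.allclose(var_a, var_b, atol=1e-6))         # sigma2 drops out
```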
We discuss some examples below.
Example 4(Shape and scale of a structured covariance).
Suppose that is asymptotically normal
with mean zero and variance given by (1.3).
Consider the shape component ,
where .
We have that
(4.2)
Then, according to Theorem 3(i), for the shape component it follows that
is asymptotically normal with mean zero
and variance (see Appendix for details)
(4.3)
When is unstructured, then ,
with and
is the duplication matrix .
In that case, from (2.4) it follows that (4.3)
with
reduces to
This coincides with expression (9) found in [24].
For completeness, consider the scale component .
It can be seen that
(4.4)
Application of the delta method then yields that
is asymptotically normal with mean zero and variance
Example 5(Direction of the vector of variance components).
In order to create a single scalar as an index of the asymptotic efficiency for estimators
for the vector of variance components, it is helpful to separate into its direction and length.
The direction component satisfies (4.1).
Its derivative is given by
(4.5)
Then, according to Theorem 3(ii), for the direction estimator it follows that
is asymptotically normal with mean zero
and variance
It does not seem possible to simplify this expression any further,
but it illustrates that one can use the scalar $\sigma_1$ as an index for the asymptotic relative efficiency
of estimators
for .
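For the direction map $\theta\mapsto\theta/\|\theta\|$ the derivative has the standard closed form $(I_l-uu^T)/\|\theta\|$ with $u=\theta/\|\theta\|$; we assume this is the expression referred to as (4.5). The sketch below compares it with finite differences and confirms that it annihilates $\theta$, so that in Theorem 3(ii) only the $\sigma_1$ part of (1.2) survives. The matrix `G` is an arbitrary positive definite stand-in for $(L^T(V^{-1}\otimes V^{-1})L)^{-1}$.

```python
import numpy as np

def direction(theta):
    """Direction theta / ||theta||; homogeneous of order zero."""
    return theta / np.linalg.norm(theta)

def direction_derivative(theta):
    """(I - u u^T) / ||theta|| with u = theta / ||theta||."""
    r = np.linalg.norm(theta)
    u = theta / r
    return (np.eye(theta.size) - np.outer(u, u)) / r

theta = np.array([2.0, 0.7, 1.5])
D = direction_derivative(theta)

# Finite-difference check of the closed-form derivative (column j = d/d theta_j)
eps = 1e-6
D_fd = np.column_stack([(direction(theta + eps * e) - direction(theta - eps * e)) / (2 * eps)
                        for e in np.eye(theta.size)])
print(np.allclose(D, D_fd, atol=1e-8))   # True
print(np.allclose(D @ theta, 0))         # True: Euler relation

# Delta-method variance of the direction under (1.2); G is a hypothetical stand-in
G = np.array([[1.0, 0.2, 0.1], [0.2, 2.0, 0.3], [0.1, 0.3, 1.5]])
sigma1 = 1.0
var0 = D @ (2 * sigma1 * G) @ D.T
var3 = D @ (2 * sigma1 * G + 3.0 * np.outer(theta, theta)) @ D.T
print(np.allclose(var0, var3))           # True: the value of sigma2 does not matter
```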
An alternative is the mapping .
Since is linear, this also satisfies (4.1).
For , it holds that ,
so that
From Example 4, it follows that is asymptotically normal
with mean zero and variance
This component leads to a simpler expression for the limiting variance
and the scalar $\sigma_1$ can again be used as an index for the asymptotic relative efficiency of estimators
for .
5 Influence function of structured covariance functionals
The influence function measures the local robustness of an estimator.
It describes the effect of an infinitesimal contamination at a single point on the
corresponding functional
(see Hampel [7]).
A bounded influence function is therefore an indication of good local robustness.
It is defined as follows.
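Let $P$ be a distribution on $\mathbb{R}^k$.
For $h\in[0,1]$ and $x\in\mathbb{R}^k$ fixed, define the perturbed probability measure
$P_{h,x}=(1-h)P+h\delta_x$,
where $\delta_x$ denotes the Dirac measure at $x$.
The influence function of a covariance functional $C$ at probability measure $P$
is defined as
$$\mathrm{IF}(x;C,P)=\lim_{h\downarrow0}\frac{C(P_{h,x})-C(P)}{h}, \qquad (5.1)$$
if this limit exists.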
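Let $P$ be a distribution on $\mathbb{R}^k$ with density
$f_{\mu,\Sigma}(x)=\det(\Sigma)^{-1/2}h\big((x-\mu)^T\Sigma^{-1}(x-\mu)\big)$, where $\mu\in\mathbb{R}^k$ and $\Sigma$ is positive definite symmetric,
and let $C$ be Fisher consistent for $\Sigma$, that is $C(P)=\Sigma$, and affine equivariant,
meaning $C(P_{AX+b})=A\,C(P_X)A^T$,
for any nonsingular $k\times k$ matrix $A$ and $b\in\mathbb{R}^k$, where $P_X$ denotes the distribution of a random vector $X$.
Croux and Haesbroeck [5] show that the influence function of such covariance functionals
at the distribution $P$ is given by
$$\mathrm{IF}(x;C,P)=\alpha(d(x))\frac{(x-\mu)(x-\mu)^T}{d(x)^2}-\beta(d(x))\,\Sigma, \qquad (5.2)$$
for some real valued functions $\alpha$ and $\beta$ and where
$d(x)^2=(x-\mu)^T\Sigma^{-1}(x-\mu)$.
For more details on $\alpha$ and $\beta$ for different covariance functionals,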
see Croux and Haesbroeck [5].
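Definition (5.1) can be illustrated with the classical covariance functional $C(P)=\mathbb{E}_P(x-\mu)(x-\mu)^T$. For this functional one finds $C(P_{h,x})=(1-h)\Sigma+h(1-h)(x-\mu)(x-\mu)^T$, so that $\mathrm{IF}(x;C,P)=(x-\mu)(x-\mu)^T-\Sigma$, which has the form (5.2) with $\alpha(d)=d^2$ and $\beta(d)=1$. The NumPy sketch below, in which an empirical distribution of simulated points stands in for $P$, approximates the limit in (5.1) with a small $h$.

```python
import numpy as np

def cov_functional(points, weights):
    """Classical covariance functional of the discrete distribution sum_i w_i delta_{points[i]}."""
    mu = weights @ points
    centred = points - mu
    return (centred * weights[:, None]).T @ centred

rng = np.random.default_rng(1)
points = rng.normal(size=(50, 2))
weights = np.full(50, 1 / 50)                 # P: empirical distribution of the points
x = np.array([3.0, -1.0])                     # contamination point

def perturbed(h):
    """P_{h,x} = (1 - h) P + h delta_x, represented as a weighted point set."""
    return np.vstack([points, x]), np.append((1 - h) * weights, h)

h = 1e-6
Sigma = cov_functional(points, weights)
if_numeric = (cov_functional(*perturbed(h)) - Sigma) / h
mu = weights @ points
if_exact = np.outer(x - mu, x - mu) - Sigma
print(np.allclose(if_numeric, if_exact, atol=1e-4))   # True
```

The influence function is unbounded in $x$ here, reflecting the well-known non-robustness of the sample covariance matrix.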
Structured covariance functionals are not necessarily affine equivariant,
so that the above characterizations do not directly apply.
However, Lopuhaä et al [16] find similar expressions
for the influence function of the covariance S-functionals and
in a linear model with a linearly structured covariance ,
see Corollary 8.4 in [16].
The next lemma shows that these expressions will always appear at elliptical distributions
for covariance functionals that are a projection of some affine equivariant covariance functional.
Lemma 1.
Let be a distribution on with density
, where and .
Let be an affine equivariant covariance functional
which possesses an influence function and is Fisher consistent for .
Suppose that , for some , and that is linear
such that , as defined in (2.1), is of full column rank.
Let be the projection matrix defined in (2.2)
and define the covariance functional by .
Then the following holds.
(i)
The functional is Fisher consistent for and there exist functions , such that
is given by
where .
(ii)
If is the functional, such that , then
is Fisher consistent for and
is given by
Note that the functions and have nothing to do with the projection ,
but are inherited from the influence function (5.2) of the affine equivariant
covariance functional .
At a distribution that has an elliptical density (3.2) with a linearly structured covariance,
Lopuhaä et al [16] find expressions
similar to the ones in Lemma 1 for the covariance S-functionals.
If the S-functional is defined by some function and constant (see Example 3),
then
(5.3)
where
(5.4)
These and are the same as the ones
that appear in the expression for the influence function of the affine equivariant covariance S-functional in the multivariate location-scale model,
see Lopuhaä [13] or Salibián-Barrera et al [24],
or in the multivariate regression model, see Van Aelst and Willems [28].
Indeed, the influence function of the structured covariance functional
in Lopuhaä et al [16]
is precisely the projection of as obtained in [13, 24, 28].
When is unstructured,
then with
and is the duplication matrix .
In that case, from (2.4) it follows that
the expression for in Lemma 1(i) with
reduces to
(5.5)
This coincides with the expression found in Lemma 1 in
Croux and Haesbroeck [5].
Mappings that satisfy (4.1) also have useful applications
to influence functions of affine equivariant covariance functionals and to their gross-error sensitivity (GES).
Kent and Tyler [11] consider functionals
and to obtain that the GES of different CM-functionals
is proportional to a single scalar.
Salibián-Barrera et al [24]
derive the influence function of the shape component of covariance MM-functionals and
show that it is proportional to a single function ,
which no longer depends on the scale-functional used in the first step.
In fact, these properties hold more generally for functionals satisfying (4.1)
applied to affine equivariant covariance functionals.
The next lemma establishes similar results for linearly structured covariance functionals.
Lemma 2.
Let be a distribution on with an elliptically contoured density (3.2).
Suppose that , for some , and that is linear
such that , as defined in (2.1), is of full column rank.
(i)
Let be a covariance functional
that is Fisher consistent for and
which possesses an influence function given by
Lemma 1(i).
Let be continuously differentiable in a neighborhood of
satisfying (4.1).
Then is given by
where .
(ii)
Let be a functional that is Fisher consistent for
and which possesses an influence function
given by Lemma 1(ii).
Let be continuously differentiable in a neighborhood of
satisfying (4.1).
Then is given by
Consider the GES defined by , for some norm .
From Lemma 2 it follows immediately that regardless of the choice of the norm, the value
for different functionals
is proportional to and similarly for functionals .
We discuss some examples below.
Example 6(Shape and scale of a structured covariance).
For the shape functional ,
from Lemma 2(i) together with (4.2)
we find
See also Salibián-Barrera et al [24].
In particular, at a distribution with an elliptically contoured density with parameters and
one finds that is given by
(5.6)
where .
It follows that
will be proportional
to .
When is unstructured, then ,
where , as defined in (2.3), and
is the duplication matrix .
In that case, from (2.4) it follows that (5.6)
with
reduces to
which coincides with formula (3) in [24].
For completeness, consider the scale component .
From (4.4), it follows that
Example 7(Direction of the vector of variance components).
For the direction functional ,
from Lemma 2(ii) together with (4.5) we find that,
at a distribution with an elliptically contoured density with parameters and ,
is given by
It follows that
will be proportional
to .
An alternative is the mapping .
Since is linear, satisfies (4.1).
For , it holds that ,
so that
using that .
Again we find that is proportional to
.
6 Application
We apply our results to S-estimators and S-functionals in the linear model (3.1).
Let be the distribution for the random variable , which is such that
has an elliptically contoured density (3.2) with parameters
and .
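Consider the S-estimator for $(\beta,\theta)$
defined as the solution to minimizing $\det(V(\theta))$,
subject to
$$\frac1n\sum_{i=1}^n\rho\left(\sqrt{(y_i-X_i\beta)^TV(\theta)^{-1}(y_i-X_i\beta)}\right)=b_0,$$
where the minimum is taken over all $\beta\in\mathbb{R}^q$ and $\theta\in\mathbb{R}^l$,
such that $V(\theta)$ is positive definite.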
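For the function $\rho$ we take Tukey's bi-weight
$$\rho(d)=\begin{cases}\dfrac{d^2}{2}-\dfrac{d^4}{2c^2}+\dfrac{d^6}{6c^4}, & |d|\leq c,\\[4pt] \dfrac{c^2}{6}, & |d|>c,\end{cases} \qquad (6.1)$$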
and .
From Theorem 6.1 in Lopuhaä et al [16]
it is known that the breakdown point of the S-estimator depends on the cut-off constant
and is at least , or asymptotically .
Table 1: Cut-off values of for different breakdown points and dimensions.
Table 1 gives the cut-off values of for given asymptotic lower bounds
on the breakdown point
in dimensions .
This table partly overlaps with Table 3 in Rousseeuw and Yohai [23].
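Cut-off values of this kind can be computed with a few lines of code. The sketch below assumes, as is standard for S-estimators, that $b_0=\mathbb{E}_{0,I_k}\rho(\|z\|)$ and that the asymptotic breakdown point equals $r=b_0/\sup\rho=6b_0/c^2$ for $r\leq0.5$. Under these assumptions it solves $6b_0(c)/c^2=r$ for $c$ by bisection, using Simpson's rule for the expectation over the chi distribution of $\|z\|$. For $k=1$ and $r=0.5$ it returns $c\approx1.548$, the well-known cut-off of the 50% breakdown bi-weight S-estimator (Rousseeuw and Yohai [23] report it as 1.547). Function names are illustrative.

```python
import math

def rho_biweight(d, c):
    """Tukey's bi-weight rho (6.1), with sup rho = c^2/6."""
    if abs(d) >= c:
        return c * c / 6.0
    return d**2 / 2 - d**4 / (2 * c**2) + d**6 / (6 * c**4)

def chi_density(d, k):
    """Density of ||z|| for z ~ N(0, I_k) (chi distribution with k degrees of freedom)."""
    return d ** (k - 1) * math.exp(-d * d / 2) / (2 ** (k / 2 - 1) * math.gamma(k / 2))

def b0(c, k, n=2000):
    """E rho(||z||) = c^2/6 + integral_0^c (rho(d) - c^2/6) f(d) dd, via Simpson's rule."""
    h = c / n
    total = 0.0
    for i in range(n + 1):
        d = i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * (rho_biweight(d, c) - c * c / 6.0) * chi_density(d, k)
    return c * c / 6.0 + total * h / 3.0

def cutoff_for_breakdown(r, k, lo=0.05, hi=30.0, tol=1e-8):
    """Solve 6 b0(c) / c^2 = r by bisection; the ratio is assumed to decrease in c."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 6.0 * b0(mid, k) / mid**2 > r:
            lo = mid      # ratio still too large: a larger cut-off is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(cutoff_for_breakdown(0.5, 1))   # approximately 1.5476
```

Lowering the target breakdown point $r$ yields a larger cut-off $c$, in line with the trade-off between efficiency and breakdown point discussed in this section.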
According to Corollary 9.2 in Lopuhaä et al [16],
the scalar
represents the asymptotic efficiency of the regression S-estimator relative to the least squares estimator
(for which ,
where
(6.2)
From Examples 4 and 5, together with Theorem 2 and Example 3,
it follows that the scalar
where is defined in (5.4),
serves as an index for the asymptotic efficiency of both the S-estimator
of shape as well as the S-estimator for the direction of the vector of variance components,
relative to the least squares estimators of shape and direction, respectively (for which ).
Finally, from Example 4, together with Example 3,
it follows that
where is defined in (5.4),
serves as an index for the asymptotic efficiency of the S-estimator of scale
relative to the least squares estimator (for which ).
As a consequence, the cutoff constant of can be tuned in such a way that the asymptotic efficiency
relative to the least squares estimator is high at the normal distribution and similarly for
and .
Since also determines the breakdown point,
this forces a trade-off between efficiency and breakdown point.
Typically, large values of correspond to high efficiency and low breakdown point,
and vice versa for moderate values of .
We further investigate how this trade-off relates to the gross error sensitivity (GES) of the corresponding S-functionals.
For simplicity we only consider perturbations in and leave unchanged.
From Corollary 8.4 in Lopuhaä et al [16],
for the regression S-functional it then follows that
is proportional to ,
where is defined in (6.2) and .
Therefore, we propose the scalar
as an index for the GES of regression S-functionals.
This coincides with the GES index for location CM-functionals in Kent and Tyler [11].
From Examples 6 and 7, together with Lemma 2 and (5.3),
for both the shape and direction S-functional, it follows that
is proportional to , where
is defined in (5.4).
We propose
the scalar
as an index for the GES of shape and direction S-functionals.
In this way, coincides with the GES index for CM-functionals of shape in Kent and Tyler [11].
Finally, from Example 6 and (5.3), it follows that
for the scale functional is proportional to
, where
is defined in (5.4).
We propose
as an index for the GES of the S-functional of scale.
We investigate how the asymptotic efficiency at the normal distribution of the S-estimators,
and the GES of the corresponding S-functionals behave as we vary the breakdown point of the
S-estimator between 0 and 0.5.
Given a value of the breakdown point, we determine the corresponding cut-off constant
by solving .
With this value of , we compute the values of , and
and the GES indices , and .
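The computation of the cutoff constant for a prescribed breakdown point can be sketched as follows. The sketch assumes Tukey's biweight and the usual S-estimator relation ε* = E ρ_c(‖X‖)/sup ρ_c at X ~ N(0, I_k), for breakdown points ε* ≤ 0.5. Bisection is justified because ρ_c(s)/sup ρ_c is nonincreasing in c for every fixed s, so the ratio decreases in c. Function names and bracketing interval are illustrative.

```python
import math


def chi_density(r: float, k: int) -> float:
    """Density of ||X|| for X ~ N(0, I_k) (chi distribution with k degrees of freedom)."""
    return r ** (k - 1) * math.exp(-0.5 * r * r) / (2 ** (0.5 * k - 1) * math.gamma(0.5 * k))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def biweight_rho(s: float, c: float) -> float:
    """Tukey biweight rho; constant and equal to its supremum c^2/6 for s >= c."""
    if s >= c:
        return c * c / 6
    return s ** 2 / 2 - s ** 4 / (2 * c ** 2) + s ** 6 / (6 * c ** 4)


def breakdown_ratio(c: float, k: int) -> float:
    """E[rho_c(||X||)] / sup rho_c at N(0, I_k)."""
    sup_rho = c * c / 6
    inside = simpson(lambda r: biweight_rho(r, c) * chi_density(r, k), 0.0, c)
    tail_prob = 1.0 - simpson(lambda r: chi_density(r, k), 0.0, c)  # P(||X|| > c)
    return (inside + sup_rho * tail_prob) / sup_rho


def cutoff_for_breakdown(eps: float, k: int, lo: float = 0.1, hi: float = 50.0,
                         iters: int = 60) -> float:
    """Bisection for c solving breakdown_ratio(c, k) = eps (the ratio decreases in c)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if breakdown_ratio(mid, k) > eps:
            lo = mid  # ratio still too large: the cutoff must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For k = 1 and ε* = 0.5 this should return c ≈ 1.547, the biweight cutoff used in [23]; the resulting c can then be passed to the efficiency computations.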
In Figure 1, on the top row we have plotted the asymptotic relative efficiencies , and
as a function of the breakdown point for dimensions ,
and the bottom row contains plots of the GES indices , and for the same dimensions.
As expected, the efficiency decreases with increasing breakdown point, but
the loss of efficiency is less severe for the S-estimator of scale compared to
the S-estimator for regression and the S-estimators for shape and direction.
In dimension (solid lines), the 50% breakdown S-estimators have asymptotic efficiencies
, , and .
However, one can both gain efficiency and lower the GES at the cost of a lower breakdown point.
For example, the GES index of the regression functional attains its minimal value at breakdown point 28%,
which corresponds to cut-off value .
For this cut-off value the GES index of the shape and direction functional is , which is not far off from its minimal value 1.344,
and the GES index for scale is .
Furthermore, the asymptotic efficiencies then become , ,
and , for the regression estimator, the estimators of shape and direction, and the scale estimator,
respectively.
Similarly, the GES index of the shape and direction functionals attains its minimal value for .
This would yield , , , , and breakdown point 33%.
The GES index of the scale functional attains its minimum value at 50% breakdown point,
so no simultaneous gain in efficiency and smaller GES values and can be achieved at the cost of a smaller breakdown point.
In dimension (dashed lines), the 50% breakdown S-estimators have asymptotic efficiencies
, , and .
The GES index of the regression functional attains its minimal value at breakdown point 37%.
The corresponding GES index for shape and direction functionals is and for the scale functional.
Corresponding to this smaller regression GES index we observe a gain in the asymptotic efficiencies:
, ,
and , for the regression estimator, the estimators of shape and direction, and the scale estimator,
respectively.
The GES index of the shape and direction functionals attains its minimal value at breakdown point 47%, so
the simultaneous gain in efficiency and reduction of this GES index is negligible.
The situation for the GES index for scale is the same as in dimension ,
where no simultaneous gain in efficiency and smaller GES values and can be achieved at the cost of a smaller breakdown point.
Finally, in dimension (dotted lines), the 50% breakdown S-estimators have asymptotic efficiencies
, , and .
The GES index of the regression functional attains its minimal value at breakdown point 42%.
The corresponding GES index for shape and direction functionals is and for the scale functional.
Corresponding to this smaller regression GES index we observe a gain in the asymptotic efficiencies:
, ,
and , for the regression estimator, the estimators of shape and direction, and the scale estimator,
respectively.
Both GES indices and attain their minimal values at 50% breakdown,
so no simultaneous gain in efficiency and smaller GES value can be achieved at the cost of a smaller breakdown point.
We conclude that, at a moderate loss of breakdown point, from 50% to about 30%-40%, one can gain efficiency
of the S-estimators and at the same time reduce the GES of the regression S-functional.
The improvements become smaller as the dimension increases.
The proof follows the line of reasoning used in the proofs of Theorem 9.1 and
Corollary 9.2 in Lopuhaä et al [16] for S-estimators.
These proofs are based on estimating equations (3.3) with
, and ,
and require conditions (R1)-(R5) in [16]
on the function .
For the proof of Theorem 2 these conditions have been reformulated
into similar conditions (C1)-(C3) for general , , and .
Furthermore, in order to incorporate the case of Example 1,
we have slightly adapted some of the boundedness conditions and use that
This will ensure that is bounded by a multiple of on a neighborhood of .
In order to apply dominated convergence, we then require
in Theorem 2 instead of , which was sufficient for
Corollary 9.2 in [16].
Proof.
Define
(A.4)
From the properties of elliptically contoured densities, one has that ,
so that .
Conditions (C1)-(C3) yield that is continuously differentiable
and by application of empirical process theory (see e.g., Lemma 11.8 in [17] for the special case of S-estimators)
one finds
(A.5)
Similarly to Lemma 8.3 in Lopuhaä et al. [16],
we find that is a block-diagonal matrix.
This implies that and are
asymptotically independent
and from (A.5) we obtain
To prove the second statement,
note that from ,
together with (A.7) and (A.9),
it follows that
where is defined in (A.8).
Then, from the properties of elliptically contoured densities,
together with (A.11), one finds .
This means that is asymptotically normal with mean zero
and variance
The inner expectation on the right-hand side is the conditional expectation of ,
which has the same distribution as ,
where has a spherical density .
This implies that
be the matrix of partial derivatives.
According to the delta method,
is asymptotically normal with mean zero
and variance
.
Because is continuously differentiable and satisfies (4.1), it follows that
(A.15)
This means that .
Then, after inserting (1.3) for ,
this finishes the proof of part (i).
For part (ii), let , and let
(A.16)
be the matrix of partial derivatives.
According to the delta method,
is asymptotically normal with mean zero
and variance
Because satisfies (4.1) and (A.15),
it follows immediately that .
This finishes the proof of part (ii).
∎
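The delta-method step used in parts (i) and (ii) can be illustrated numerically. Purely as a hypothetical example of a scale-invariant map, the sketch takes the direction g(θ) = θ/‖θ‖ of a vector of variance components; the normalization actually used in the paper is not reproduced in this section and may differ. Its Jacobian (I − uuᵀ)/‖θ‖, with u = θ/‖θ‖, annihilates θ, so variability of the estimator purely along θ (a change of scale) vanishes from the limiting covariance J V Jᵀ.

```python
import math


def direction(theta):
    """Hypothetical scale-invariant map theta -> theta / ||theta||."""
    norm = math.sqrt(sum(t * t for t in theta))
    return [t / norm for t in theta]


def direction_jacobian(theta):
    """Analytic Jacobian (I - u u^T) / ||theta|| of the direction map, u = theta / ||theta||."""
    norm = math.sqrt(sum(t * t for t in theta))
    u = [t / norm for t in theta]
    m = len(theta)
    return [[((1.0 if i == j else 0.0) - u[i] * u[j]) / norm for j in range(m)]
            for i in range(m)]


def delta_method_cov(jac, cov):
    """Limiting covariance J V J^T of g(theta_hat) when theta_hat has limiting covariance V."""
    m, n = len(jac), len(cov)
    jv = [[sum(jac[i][l] * cov[l][j] for l in range(n)) for j in range(n)] for i in range(m)]
    return [[sum(jv[i][l] * jac[j][l] for l in range(n)) for j in range(m)] for i in range(m)]
```

For instance, with θ = (2, 1, 0.5) and V proportional to θθᵀ, the resulting covariance is numerically zero, while V = I gives J Jᵀ = (I − uuᵀ)/‖θ‖², a scaled projection.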
We apply Lemma 1 in [5].
Although the lemma is established for the distribution,
the proof holds for any distribution with an elliptically contoured density.
According to [5], there exist two functions ,
such that
(A.17)
We have that
Since is linear, it holds that and because
is the projection onto the column space of , it follows that .
When we insert the expression (A.1) for , together with (A.17)
and the fact that according to (A.9), this finishes the proof of part (i).
Since has full column rank, , which yields
Part (i), together with (A.3) finishes the proof of part (ii).
∎
Let with derivative defined in (A.14).
From the definition of the influence function, it follows that
(A.18)
Since , after inserting the expression in Lemma 1,
together with ,
for ,
this proves part (i).
Next, let with derivative defined by (A.16).
It follows that
(A.19)
After inserting the expression in Lemma 1(ii) for ,
together with ,
this proves part (ii).
∎
Appendix B Derivation of and
We compare the expressions for and derived in Theorem 2
with the ones obtained for specific cases in Tyler [26] and
Lopuhaä et al [16].
First consider the special case of maximum likelihood, with
and .
Note that
(B.1)
see, e.g., Lemma 1 in Lopuhaä [14].
When ,
then by means of integration by parts we get
It follows that
(B.2)
which coincides with the expression found in Example 2 in Tyler [26],
who expresses expectations in terms of the random variable .
To compute , first note that by means of integration by parts it follows
that .
When we insert this in the expression for ,
this gives
which coincides with the expression found in Example 2 in Tyler [26].
Next, consider the general case of M-estimators, with .
First note that Tyler [26] uses a function , which relates to our
function as .
Then, since satisfies (3.5),
we find that
where ,
so that .
It then follows that
where and are defined in Example 3 in Tyler [26].
Then from the expressions provided in Theorem 2 we find
The expression for coincides with the one in Example 3 in Tyler [26].
After inserting this in , one can verify that
the expression for also coincides with the one in Example 3 in Tyler [26].
the first term on the right-hand side of (C.1) reduces to
.
Similarly, the second and third terms on the right-hand side
of (C.1) are equal to
.
Putting everything together, we find that the limiting covariance of
is given by (4.3).
From Example 4 and the delta method, it follows that the limiting variance
of is given by
using that .
References
[1]
I. Chervoneva and M. Vishnyakov.
Constrained S-estimators for linear mixed effects models with
covariance components.
Stat. Med., 30(14):1735–1750, 2011.
[2]
I. Chervoneva and M. Vishnyakov.
Generalized S-estimators for linear mixed effects models.
Statist. Sinica, 24(3):1257–1276, 2014.
[3]
S. Copt and S. Heritier.
Robust alternatives to the F-test in mixed linear models based on
MM-estimates.
Biometrics, 63(4):1045–1052, 2007.
[4]
S. Copt and M. P. Victoria-Feser.
High-breakdown inference for mixed linear models.
J. Amer. Statist. Assoc., 101(473):292–300, 2006.
[5]
C. Croux and G. Haesbroeck.
Principal component analysis based on robust estimators of the
covariance or correlation matrix: influence functions and efficiencies.
Biometrika, 87(3):603–618, 2000.
[6]
G. M. Fitzmaurice, N. M. Laird, and J. H. Ware.
Applied longitudinal analysis.
Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.,
Hoboken, NJ, second edition, 2011.
[7]
F. R. Hampel.
The influence curve and its role in robust estimation.
J. Amer. Statist. Assoc., 69:383–393, 1974.
[8]
H. O. Hartley and J. N. K. Rao.
Maximum-likelihood estimation for the mixed analysis of variance
model.
Biometrika, 54:93–108, 1967.
[9]
P. J. Huber.
Robust statistics.
Wiley Series in Probability and Mathematical Statistics. John Wiley
& Sons, Inc., New York, 1981.
[10]
R. I. Jennrich and M. D. Schluchter.
Unbalanced repeated-measures models with structured covariance
matrices.
Biometrics, 42(4):805–820, 1986.
[11]
J. T. Kent and D. E. Tyler.
Constrained M-estimation for multivariate location and scatter.
Ann. Statist., 24(3):1346–1370, 1996.
[12]
N. L. Kudraszow and R. A. Maronna.
Estimates of MM type for the multivariate linear model.
J. Multivariate Anal., 102(9):1280–1292, 2011.
[13]
H. P. Lopuhaä.
On the relation between S-estimators and M-estimators of
multivariate location and covariance.
Ann. Statist., 17(4):1662–1683, 1989.
[14]
H. P. Lopuhaä.
Asymptotic expansion of S-estimators of location and covariance.
Statist. Neerlandica, 51(2):220–237, 1997.
[15]
H. P. Lopuhaä.
Highly efficient estimators with high breakdown point for linear
models with structured covariance matrices.
Econometrics and Statistics, 2023.
[16]
H. P. Lopuhaä, V. Gares, and A. Ruiz-Gazen.
S-estimation in linear models with structured covariance matrices.
Ann. Statist., 51(6):2415–2439, 2023.
[17]
H. P. Lopuhaä, V. Gares, and A. Ruiz-Gazen.
Supplement to “S-estimation in linear models with structured
covariance matrices”.
2023.
[18]
J. R. Magnus and H. Neudecker.
Matrix differential calculus with applications in statistics and
econometrics.
Wiley Series in Probability and Mathematical Statistics: Applied
Probability and Statistics. John Wiley & Sons, Ltd., Chichester, 1988.
[19]
C. L. Mallows.
Latent vectors of random symmetric matrices.
Biometrika, 48:133–149, 1961.
[20]
K. V. Mardia and R. J. Marshall.
Maximum likelihood estimation of models for residual covariance in
spatial regression.
Biometrika, 71(1):135–146, 1984.
[21]
R. A. Maronna.
Robust M-estimators of multivariate location and scatter.
Ann. Statist., 4(1):51–67, 1976.
[22]
J. J. Miller.
Asymptotic properties of maximum likelihood estimates in the mixed
model of the analysis of variance.
Ann. Statist., 5(4):746–762, 1977.
[23]
P. Rousseeuw and V. Yohai.
Robust regression by means of S-estimators.
In Robust and nonlinear time series analysis (Heidelberg,
1983), volume 26 of Lect. Notes Stat., pages 256–272. Springer, New
York, 1984.
[24]
M. Salibián-Barrera, S. Van Aelst, and G. Willems.
Principal components analysis based on multivariate MM estimators
with fast and robust bootstrap.
J. Amer. Statist. Assoc., 101(475):1198–1211, 2006.
[25]
K. S. Tatsuoka and D. E. Tyler.
On the uniqueness of S-functionals and M-functionals under
nonelliptical distributions.
Ann. Statist., 28(4):1219–1243, 2000.
[26]
D. E. Tyler.
Radial estimates and the test for sphericity.
Biometrika, 69(2):429–436, 1982.
[27]
D. E. Tyler.
Robustness and efficiency properties of scatter matrices.
Biometrika, 70(2):411–420, 1983.
[28]
S. Van Aelst and G. Willems.
Multivariate regression S-estimators for robust estimation and
inference.
Statist. Sinica, 15(4):981–1001, 2005.