License: CC BY-NC-SA 4.0
arXiv:2403.04361v1 [math.ST] 07 Mar 2024

SUBSAMPLING FOR BIG DATA LINEAR MODELS

WITH MEASUREMENT ERRORS

Jiangshan Ju, Mingqiu Wang and Shengli Zhao

School of Statistics and Data Science, Qufu Normal University

Abstract: Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical applications, the observed covariates may suffer from inaccuracies due to measurement errors. To address the challenge of large datasets with measurement errors, this study explores two subsampling algorithms based on the corrected likelihood approach: the optimal subsampling algorithm utilizing inverse probability weighting and the perturbation subsampling algorithm employing random weighting assuming a perfectly known distribution. Theoretical properties for both algorithms are provided. Numerical simulations and two real-world examples demonstrate the effectiveness of these proposed methods compared to other uncorrected algorithms.
Key words and phrases: Corrected likelihood method, Measurement error, Subsampling algorithm.

1 Introduction

To address the ever-increasing volume of data brought about by technological advancements, it is imperative to adopt refined techniques such as divide and conquer, online updating of streaming data, and subsampling-based methods. These techniques offer effective solutions to computational challenges posed by large datasets. However, the existing literature often assumes direct and accurate observation of covariates, which may not always be feasible in practical data collection scenarios. Consequently, statistical models that do not account for measurement errors can lead to biased estimates. Therefore, it is essential to investigate subsampling algorithms for linear models that consider measurement errors in covariates.

For the subsampling algorithm of linear models, Ma, Mahoney and Yu (2015) proposed a leverage sampling algorithm based on leverage scores and their linear transformations. A deterministic subsampling method named information-based optimal subdata selection (IBOSS) was proposed by Wang, Yang and Stufken (2019), and extended by Wang (2019b), aiming to find subsamples with the maximum information matrix under the D-optimality criterion, which performs well in finding corners. Cheng, Wang and Yang (2020) and Yu, Liu and Wang (2023) extended the IBOSS algorithm to logistic and nonlinear models, respectively. Wang et al. (2021) proposed an orthogonal subsampling approach for big data linear regression. Yi and Zhou (2023) and Zhang et al. (2024) explored using space-filling or uniform designs to obtain the subsample so that a wide range of models could be considered.

Wang, Zhu and Ma (2018) proposed the optimal subsampling method based on the A-optimality criterion. This method was further developed by Wang (2019a), Ai et al. (2021a), Ai et al. (2021b), Wang and Ma (2021), and Yu et al. (2022). In addition to the widely used method based on inverse probability weighting, Wang and Kim (2022) introduced the maximum sample conditional likelihood estimation, and enhanced the estimator for selected subsamples. This approach overcomes the limitations of inverse probability weighting and makes more efficient use of sample information. Yao and ** (2024) introduced a perturbation subsampling method that employs repeated random weighting of known distributions to address the limitations of inverse probability weighting. This method has been successfully applied to linear models, longitudinal data, and high-dimensional data with promising performance.

For the measurement error model, Fuller (1987) systematically introduced a comprehensive statistical inference of linear regression models with measurement errors. Nakamura (1990) proposed the corrected score method for the generalized linear model with measurement errors. Carroll et al. (2006) systematically studied the theory of nonlinear regression models with measurement errors. Liang, Hardle and Carroll (1999) offered the parameter estimation of a semi-parametric partially linear model with measurement errors. Liang and Li (2009) examined variable selection in partially linear models with measurement errors, and proposed that when the variance of the measurement error is unknown, it can be estimated through repeated observations. Lee, Wang and Schifano (2020) introduced an online update method for correcting measurement errors in big data streams.

This study focuses on the subsampling problem in linear models with measurement errors. The presence of measurement errors in covariates can introduce inaccuracies in parameter estimation, thereby diminishing the statistical power. We employ the corrected likelihood approach proposed by Nakamura (1990) to estimate the parameters with subsamples. We introduce an optimal subsampling method based on the corrected likelihood approach, and the optimal subsampling probabilities are determined by minimizing the trace of the variance. The consistency and asymptotic normality of estimators obtained by this approach are established. Furthermore, we propose a perturbation subsampling method based on the corrected likelihood approach that approximates the objective function of the full data using a perturbation with independently generated stochastic weights. The effectiveness of our method is also confirmed through numerical analysis. By accounting for measurement errors in covariates, more precise and reliable results can be obtained in the analysis of massive datasets. Our approaches not only alleviate the computational burden associated with parameter estimation in big data but also enhance computational efficiency and improve prediction accuracy.

The rest of the paper is outlined as follows. Section 2 offers a comprehensive introduction to the model setup and parameter estimation of the measurement error model. Sections 3 and 4 introduce the linear model subsampling algorithms with measurement errors and establish the corresponding theoretical properties. Section 5 comprises numerical simulations. Section 6 presents case studies aimed at validating the effectiveness of the algorithms. Finally, Section 7 summarizes this paper. The detailed proofs are provided in the supplementary materials.

2 Linear model with measurement errors

In this section, we present an overview of the model and parameter estimation.

2.1 Model

Here, we consider the linear model with measurement errors

$$
\left\{\begin{aligned} y_{i}&=\mathbf{X}_{i}^{T}\boldsymbol{\beta}+\epsilon_{i}, \\ \mathbf{W}_{i}&=\mathbf{X}_{i}+\mathbf{U}_{i}, \end{aligned}\right. \quad i=1,2,\ldots,n, \tag{2.1}
$$

where $\mathbf{X}_{i}=(x_{i1},\ldots,x_{ip})^{T}$ is a $p$-dimensional covariate vector, $y_{i}$ is the corresponding response variable, $\boldsymbol{\beta}=(\beta_{1},\beta_{2},\ldots,\beta_{p})^{T}$ is an unknown parameter vector, and $\epsilon_{i}$ is a random error term with mean zero and variance $\sigma^{2}$. However, in applications, the exact value of $\mathbf{X}_{i}$ is often difficult to obtain. Let $\mathbf{U}_{i}=(u_{i1},\ldots,u_{ip})^{T}$ be a random error vector with mean zero and variance-covariance matrix $\boldsymbol{\Sigma}_{uu}$, and let $\mathbf{W}_{i}$ be the actually observed random vector. We assume that $\boldsymbol{\Sigma}_{uu}$ is known and that the measurement error $\mathbf{U}_{i}$ is independent of $\epsilon_{i}$ and $\mathbf{X}_{i}$.

According to Fuller (1987), the ordinary least squares method cannot be directly applied to estimate linear models with measurement errors, as the resulting estimators are biased and inconsistent. For model (2.1), it follows that

$$
y_{i}=\mathbf{W}_{i}^{T}\boldsymbol{\beta}+\epsilon_{i}-\mathbf{U}_{i}^{T}\boldsymbol{\beta}\triangleq\mathbf{W}_{i}^{T}\boldsymbol{\beta}+\delta_{i},\quad i=1,2,\ldots,n,
$$

where $\delta_{i}=\epsilon_{i}-\mathbf{U}_{i}^{T}\boldsymbol{\beta}$. Note that

$$
Cov(\mathbf{W}_{i},\delta_{i})=Cov(\mathbf{X}_{i}+\mathbf{U}_{i},\,\epsilon_{i}-\mathbf{U}_{i}^{T}\boldsymbol{\beta})=-\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}\neq 0.
$$
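The middle step can be spelled out term by term. The sketch below additionally assumes that $\epsilon_{i}$ is uncorrelated with $\mathbf{X}_{i}$ (a standard condition that is not stated explicitly above) and uses the bilinearity of the covariance; three of the four terms vanish:

$$
\begin{aligned}
Cov(\mathbf{X}_{i}+\mathbf{U}_{i},\,\epsilon_{i}-\mathbf{U}_{i}^{T}\boldsymbol{\beta})
&=Cov(\mathbf{X}_{i},\epsilon_{i})-Cov(\mathbf{X}_{i},\mathbf{U}_{i})\boldsymbol{\beta}+Cov(\mathbf{U}_{i},\epsilon_{i})-Cov(\mathbf{U}_{i},\mathbf{U}_{i})\boldsymbol{\beta} \\
&=\mathbf{0}-\mathbf{0}+\mathbf{0}-\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}.
\end{aligned}
$$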

Since $\mathbf{W}_{i}$ is correlated with the error term $\delta_{i}$, the assumption underlying ordinary least squares is violated and the method cannot be applied directly. This also suggests that when there are no measurement errors, i.e., $\boldsymbol{\Sigma}_{uu}=0$, the estimator is unbiased. Example 1 illustrates the influence of measurement errors.

Example 1.

The responses $y_{i}$ are generated from $y_{i}=0.5+0.5x_{i}+\varepsilon_{i}$, with $x_{i}\sim N(0,1)$ and $\varepsilon_{i}\sim N(0,1)$, $i=1,\ldots,1000$. We introduce measurement error in the covariate by replacing $x_{i}$ with $w_{i}=x_{i}+u_{i}$, $u_{i}\sim N(0,0.5^{2})$. The black solid line represents the true model. The blue dotted line represents the regression line fitted to the full dataset, while the red dashed line represents the regression line fitted to a subsample of size 50 selected using L-optimal subsampling. From Figure 1, we can see that measurement errors in covariates have a great effect on the resulting subsample estimator if they are ignored.

Figure 1: The plot on the left is the fitting result of the data without measurement errors. The plot on the right is the fitting result of the data with measurement errors, while ignoring measurement errors.
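The data-generating setup of Example 1 can be imitated with a small simulation. The following is only a sketch: it fits ordinary least squares to the full simulated data rather than to an L-optimal subsample, and the function name `simulate_example1`, the seed, and the use of NumPy are assumptions made for illustration. In this setup the naive slope is expected to shrink from 0.5 toward $0.5/(1+0.5^{2})=0.4$, the classical attenuation effect.

```python
import numpy as np

def simulate_example1(n=1000, sigma_u=0.5, seed=0):
    """Return OLS (intercept, slope) fitted on the true x and on the noisy w."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)            # true covariate
    eps = rng.normal(0.0, 1.0, n)          # model error
    y = 0.5 + 0.5 * x + eps                # responses from the true model
    w = x + rng.normal(0.0, sigma_u, n)    # observed covariate with measurement error

    def ols(z):
        # Least squares fit of y on an intercept and the covariate z
        Z = np.column_stack([np.ones(n), z])
        return np.linalg.lstsq(Z, y, rcond=None)[0]

    return ols(x), ols(w)

fit_clean, fit_naive = simulate_example1()
print("fit on x (intercept, slope):", fit_clean)
print("fit on w (intercept, slope):", fit_naive)
```

Increasing `n` makes the gap between the two slopes easier to see, since sampling noise no longer masks the attenuation.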

2.2 Parameter estimation

To correct measurement errors, we apply the corrected likelihood method proposed by Nakamura (1990). When $\boldsymbol{\Sigma}_{uu}$ is known, using the corrected likelihood method, we obtain

$$
\ell(\boldsymbol{\beta})=\frac{1}{2n}\sum_{i=1}^{n}(y_{i}-\mathbf{W}_{i}^{T}\boldsymbol{\beta})^{2}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}.
$$

Minimizing $\ell(\boldsymbol{\beta})$, we have

$$
\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}}\ell(\boldsymbol{\beta})=\left(\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-n\boldsymbol{\Sigma}_{uu}\right)^{-1}\sum_{i=1}^{n}\mathbf{W}_{i}y_{i}.
$$
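The closed form above translates directly into a few lines of linear algebra. The sketch below is illustrative only: the function name `corrected_ls` and the toy data (two standard normal covariates, $\boldsymbol{\Sigma}_{uu}=0.25\,\mathbf{I}$, no intercept) are assumptions, not settings taken from the paper. It compares the corrected estimator with the naive least squares fit that ignores measurement errors.

```python
import numpy as np

def corrected_ls(W, y, Sigma_uu):
    """Corrected least squares: (sum W_i W_i^T - n Sigma_uu)^{-1} sum W_i y_i."""
    n = W.shape[0]
    A = W.T @ W - n * Sigma_uu     # corrected Gram matrix
    b = W.T @ y
    return np.linalg.solve(A, b)

# Toy data (assumed setup for illustration)
rng = np.random.default_rng(0)
n, beta = 50_000, np.array([1.0, -0.5])
Sigma_uu = np.diag([0.25, 0.25])
X = rng.normal(size=(n, 2))                                       # unobserved covariates
W = X + rng.multivariate_normal(np.zeros(2), Sigma_uu, size=n)    # observed covariates
y = X @ beta + rng.normal(size=n)

beta_naive = np.linalg.solve(W.T @ W, W.T @ y)    # ignores measurement errors
beta_corr = corrected_ls(W, y, Sigma_uu)
print("naive:    ", beta_naive)
print("corrected:", beta_corr)
```

In this toy setting the naive estimate is pulled toward $0.8\boldsymbol{\beta}$, because the second-moment matrix of $\mathbf{W}_{i}$ is $\mathbf{I}+0.25\,\mathbf{I}$, whereas subtracting $n\boldsymbol{\Sigma}_{uu}$ removes that inflation.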

According to Liang, Hardle and Carroll (1999), under certain assumptions, the parameter estimators possess both consistency and asymptotic normality.

Lemma 1.

Suppose that there exists an $s>2$ such that $E\|\mathbf{X}_{i}\|^{2s}<\infty$, $E\|\mathbf{U}_{i}\|^{2s}<\infty$ and $E|\epsilon_{i}|^{2s}<\infty$, where $\|\cdot\|$ denotes the Euclidean norm. Additionally, $\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{T}$ converges to the covariance matrix $\mathcal{H}$, where $\mathcal{H}$ is a non-random positive definite matrix. Then it follows that
(i) $\|\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\|=O_{P}(n^{-\frac{1}{2}})$ as $n\rightarrow\infty$.
(ii) $\hat{\boldsymbol{\beta}}$ follows an asymptotic normal distribution, i.e.,

$$
\sqrt{n}(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})\xrightarrow{d}N_{p}(\mathbf{0},\mathcal{H}^{-1}\Gamma\mathcal{H}^{-1}),
$$

as $n\rightarrow\infty$, where $\Gamma=\sigma^{2}\mathcal{H}+\mathcal{H}\boldsymbol{\beta}^{T}\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}+E[(\mathbf{U}\mathbf{U}^{T}-\boldsymbol{\Sigma}_{uu})\boldsymbol{\beta}]^{\otimes 2}+\sigma^{2}\boldsymbol{\Sigma}_{uu}$, and $\boldsymbol{v}^{\otimes 2}=\boldsymbol{v}\boldsymbol{v}^{T}$ for any vector $\boldsymbol{v}$.

Remark 1.

In applications, $\boldsymbol{\Sigma}_{uu}$ is often unknown. Liang and Li (2009) proposed that $\hat{\boldsymbol{\Sigma}}_{uu}$ can be obtained from repeated observations corresponding to each $\mathbf{X}_{i}$. For repeated observations $\mathbf{W}_{i,j}=\mathbf{X}_{i}+\mathbf{U}_{i,j}$, $j=1,\ldots,J_{i}$, $i=1,\ldots,n$, we have

$$
\widehat{\boldsymbol{\Sigma}}_{uu}=\frac{\sum_{i=1}^{n}\sum_{j=1}^{J_{i}}(\mathbf{W}_{i,j}-\overline{\mathbf{W}}_{i})(\mathbf{W}_{i,j}-\overline{\mathbf{W}}_{i})^{T}}{\sum_{i=1}^{n}(J_{i}-1)},
$$

where $\overline{\mathbf{W}}_{i}=J_{i}^{-1}\sum_{j=1}^{J_{i}}\mathbf{W}_{i,j}$. Substituting $\overline{\mathbf{W}}_{i}$ for $\mathbf{W}_{i}$, the estimator $(\sum_{i=1}^{n}\overline{\mathbf{W}}_{i}\overline{\mathbf{W}}_{i}^{T}-nJ_{i}^{-1}\widehat{\boldsymbol{\Sigma}}_{uu})^{-1}\sum_{i=1}^{n}\overline{\mathbf{W}}_{i}y_{i}$ is unbiased, and the aforementioned properties remain valid.
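The replicate-based estimator of $\boldsymbol{\Sigma}_{uu}$ in Remark 1 can be sketched as follows. For simplicity the sketch assumes the same number of replicates $J_{i}=J$ for every subject, so that the data fit in one array of shape $(n, J, p)$; the function name `estimate_sigma_uu` and the toy values are hypothetical.

```python
import numpy as np

def estimate_sigma_uu(W_rep):
    """Pooled within-subject covariance from replicated measurements.

    W_rep has shape (n, J, p): n subjects, J replicates each (equal J is an
    assumption of this sketch), p covariates.
    """
    n, J, p = W_rep.shape
    W_bar = W_rep.mean(axis=1, keepdims=True)    # per-subject means, shape (n, 1, p)
    R = (W_rep - W_bar).reshape(n * J, p)        # deviations from the subject means
    return (R.T @ R) / (n * (J - 1))             # denominator is sum_i (J_i - 1)

# Toy replicated data (assumed setup for illustration)
rng = np.random.default_rng(0)
n, J, p = 20_000, 3, 2
Sigma_uu = np.array([[0.25, 0.05], [0.05, 0.16]])
X = rng.normal(size=(n, p))
U = rng.multivariate_normal(np.zeros(p), Sigma_uu, size=(n, J))   # shape (n, J, p)
W_rep = X[:, None, :] + U
Sigma_hat = estimate_sigma_uu(W_rep)
print(Sigma_hat)
```

Because each deviation $\mathbf{W}_{i,j}-\overline{\mathbf{W}}_{i}$ no longer contains $\mathbf{X}_{i}$, the estimate depends only on the measurement errors.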

3 Optimal subsampling based on corrected likelihood

In this section, we present a general subsampling algorithm that can be used to obtain estimators for measurement error models. We establish the consistency and asymptotic normality of these estimators and derive the optimal subsampling probabilities using the A- (or L-) optimality criterion.

3.1 General subsampling algorithm

We first introduce some notation. We take a random subsample from the full data with replacement according to the subsampling probabilities $\pi_{i}$, $i=1,\ldots,n$, where $\sum_{i=1}^{n}\pi_{i}=1$. We denote the subsample by $\mathcal{F}_{r}=\{(\mathbf{W}_{i}^{*},y_{i}^{*})\}_{i=1}^{r}$, with $r$ being the subsample size. The corresponding subsampling probabilities are denoted by $\pi_{i}^{*}$, $i=1,\ldots,r$. We can formulate the weighted loss function as follows:

$$
\ell^{*}(\boldsymbol{\beta})=\frac{1}{2n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\boldsymbol{\beta})^{2}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}.
$$
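A short calculation indicates why the weights $1/(r\pi_{i}^{*})$ are natural. Conditional on the full data, written here as $\mathcal{F}_{n}$ (a notation assumed for this sketch), each draw selects observation $k$ with probability $\pi_{k}$, so that

$$
\begin{aligned}
E\left[\frac{1}{r\pi_{i}^{*}}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\boldsymbol{\beta})^{2}\,\Big|\,\mathcal{F}_{n}\right]
&=\sum_{k=1}^{n}\pi_{k}\,\frac{1}{r\pi_{k}}(y_{k}-\mathbf{W}_{k}^{T}\boldsymbol{\beta})^{2}
=\frac{1}{r}\sum_{k=1}^{n}(y_{k}-\mathbf{W}_{k}^{T}\boldsymbol{\beta})^{2}, \\
E\left[\ell^{*}(\boldsymbol{\beta})\mid\mathcal{F}_{n}\right]
&=\frac{1}{2n}\sum_{k=1}^{n}(y_{k}-\mathbf{W}_{k}^{T}\boldsymbol{\beta})^{2}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}
=\ell(\boldsymbol{\beta}).
\end{aligned}
$$

That is, for any fixed $\boldsymbol{\beta}$ the weighted subsample loss is conditionally unbiased for the full-data corrected loss, whatever the choice of positive probabilities $\pi_{i}$.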
Algorithm 1 General subsampling algorithm based on corrected likelihood

  1. Subsample: Extract a random subsample $\mathcal{F}_{r}=\{(\mathbf{W}_{i}^{*},y_{i}^{*})\}_{i=1}^{r}$ of size $r$ ($r\ll n$) from the full dataset with replacement based on the subsampling probabilities $\{\pi_{i}\}_{i=1}^{n}$. The subsampling probabilities corresponding to the subsample are denoted as $\{\pi_{i}^{*}\}_{i=1}^{r}$.

  2. Estimate: Minimize the objective function $\ell^{*}(\boldsymbol{\beta})$ based on the subsample $\mathcal{F}_{r}$ to obtain the parameter estimator, that is,

     $$
     \tilde{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}}\ell^{*}(\boldsymbol{\beta}).
     $$

Algorithm 1 is a general subsampling algorithm that relies on the corrected likelihood to address big data subsampling problems with measurement errors.
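The two steps of Algorithm 1 can be sketched in code. Setting the gradient of $\ell^{*}(\boldsymbol{\beta})$ to zero gives the closed form $\tilde{\boldsymbol{\beta}}=(\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}\mathbf{W}_{i}^{*}\mathbf{W}_{i}^{*T}-n\boldsymbol{\Sigma}_{uu})^{-1}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}\mathbf{W}_{i}^{*}y_{i}^{*}$, which the sketch uses. The function name `subsample_corrected_ls` and the toy data are hypothetical, and uniform probabilities $\pi_{i}=1/n$ serve only as a placeholder for whatever probabilities are supplied.

```python
import numpy as np

def subsample_corrected_ls(W, y, Sigma_uu, probs, r, rng):
    """Sketch of Algorithm 1: weighted corrected least squares on a subsample."""
    n = W.shape[0]
    # Step 1: draw r indices with replacement according to probs
    idx = rng.choice(n, size=r, replace=True, p=probs)
    Ws, ys = W[idx], y[idx]
    wts = 1.0 / (r * probs[idx])                     # inverse probability weights
    # Step 2: closed-form minimizer of the weighted corrected loss
    A = (Ws * wts[:, None]).T @ Ws - n * Sigma_uu
    b = Ws.T @ (wts * ys)
    return np.linalg.solve(A, b)

# Toy data (assumed setup for illustration)
rng = np.random.default_rng(0)
n, r, beta = 200_000, 10_000, np.array([1.0, -0.5])
Sigma_uu = np.diag([0.25, 0.25])
X = rng.normal(size=(n, 2))
W = X + rng.multivariate_normal(np.zeros(2), Sigma_uu, size=n)
y = X @ beta + rng.normal(size=n)

probs = np.full(n, 1.0 / n)                          # uniform placeholder probabilities
beta_tilde = subsample_corrected_ls(W, y, Sigma_uu, probs, r, rng)
print("subsample estimate:", beta_tilde)
```

Only the $r$ sampled rows enter the linear solve, so the cost of the estimation step depends on $r$ and $p$ rather than on $n$.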

3.2 Asymptotic properties

In order to obtain the asymptotic properties of $\tilde{\boldsymbol{\beta}}$, the following assumptions are given.

Assumption 1.

The information matrix $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$ converges to a positive definite matrix in probability.

Assumption 2.

$\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{W}_{i}\|^{k}=O_{P}(1)$ and $\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{k}=O_{P}(1)$, $k=2,4$.

Assumption 3.

$\max\limits_{i=1,\ldots,n}(n\pi_{i})^{-1}=O_{P}(1)$.

Assumption 4.

There exists $\delta>0$ such that $\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2+\delta}\|\mathbf{W}_{i}\|^{2+\delta}=O_{P}(1)$.

Assumption 1 indicates that $E(\mathbf{X}\mathbf{X}^{T})$ is positive definite, which is consistent with Assumption 1 in Wang, Zhu and Ma (2018). Assumption 2 imposes finite-moment conditions that ensure the consistency of the parameter estimator, where $\|\mathbf{W}_{i}\|^{k}=(\sum_{j=1}^{p}w_{ij}^{2})^{k/2}$. Assumption 3 restricts the weights in $\ell^{*}(\boldsymbol{\beta})$ so that the estimating equation is not dominated by data points with very small subsampling probabilities. Assumption 4 imposes a slightly stronger moment condition, which ensures the asymptotic normality of the parameter estimator.
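As a quick numerical illustration of Assumption 1, the following sketch simulates error-prone covariates and checks that the corrected matrix $\mathcal{H}_{W}$ is close to $E(\mathbf{X}\mathbf{X}^{T})$, whereas the uncorrected moment matrix is not. The additive error model $\mathbf{W}=\mathbf{X}+\mathbf{U}$ with known $\boldsymbol{\Sigma}_{uu}$, the standard normal design, and the function name `corrected_information` are assumptions made for this illustration only.

```python
import numpy as np

def corrected_information(W, sigma_uu):
    """H_W = (1/n) * sum_i W_i W_i^T - Sigma_uu."""
    n = W.shape[0]
    return W.T @ W / n - sigma_uu

rng = np.random.default_rng(0)
n, p = 200_000, 3
X = rng.normal(size=(n, p))               # true covariates, E(X X^T) = I (assumed design)
sigma_uu = 0.25 * np.eye(p)               # measurement-error covariance, assumed known
U = 0.5 * rng.normal(size=(n, p))         # errors with covariance 0.25 * I
W = X + U                                 # observed covariates (assumed additive model)

H_W = corrected_information(W, sigma_uu)
naive = W.T @ W / n                       # uncorrected second-moment matrix

print(np.round(H_W, 2))                   # close to E(X X^T), here the identity
print(np.round(naive, 2))                 # close to E(X X^T) + Sigma_uu
print(np.linalg.eigvalsh(H_W).min() > 0)  # positive definiteness check
```

Subtracting $\boldsymbol{\Sigma}_{uu}$ is what removes the inflation caused by the measurement errors; without it the diagonal is overstated by the error variance.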

Theorem 1.

Under Assumptions 1-3, as $r\rightarrow\infty$ and $n\rightarrow\infty$, $\tilde{\boldsymbol{\beta}}$ converges to $\hat{\boldsymbol{\beta}}$ in conditional probability given $\mathcal{F}_{n}$ and the convergence rate is $\sqrt{r}$. That is, with probability approaching one, for any $\varepsilon>0$, there exist constants $\Delta_{\varepsilon}$ and $r_{\varepsilon}$ such that

$$P(\|\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}\|\geq r^{-1/2}\Delta_{\varepsilon}\,|\,\mathcal{F}_{n})<\varepsilon$$

for all $r>r_{\varepsilon}$.

Remark 2.

If a sequence of random variables is bounded in conditional probability, it is also bounded in unconditional probability. Therefore, Theorem 1 implies that $\|\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}\|=O_{P}(r^{-1/2})$.
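The $r^{-1/2}$ rate can be checked empirically. The sketch below is a small simulation under stated assumptions, not the authors' experiment: it takes Algorithm 1 to be sampling with replacement followed by a weighted corrected least squares fit with weights $1/(rn\pi_{i})$, uses uniform probabilities $\pi_{i}=1/n$, and compares the root mean squared distance to the full-data corrected estimate at two subsample sizes. The helper names `corrected_fit` and `rmse_uniform` and the simulated model are hypothetical.

```python
import numpy as np

def corrected_fit(W, y, sigma_uu, weights):
    """Solve the weighted corrected normal equations.

    `weights` are assumed to sum to about one, e.g. 1/n on the full data
    or 1/(r * n * pi_i) on a subsample drawn with probabilities pi_i.
    """
    Ww = W * weights[:, None]
    return np.linalg.solve(Ww.T @ W - sigma_uu, Ww.T @ y)

def rmse_uniform(W, y, sigma_uu, beta_full, r, reps, rng):
    """Root mean squared distance between uniform-subsample and full-data fits."""
    n = W.shape[0]
    sq_err = 0.0
    for _ in range(reps):
        idx = rng.integers(0, n, size=r)      # uniform sampling with replacement
        w = np.full(r, 1.0 / r)               # 1/(r * n * pi_i) with pi_i = 1/n
        beta_sub = corrected_fit(W[idx], y[idx], sigma_uu, w)
        sq_err += np.sum((beta_sub - beta_full) ** 2)
    return np.sqrt(sq_err / reps)

rng = np.random.default_rng(1)
n, p = 50_000, 3
beta_true = np.array([1.0, -1.0, 0.5])        # illustrative parameter values
sigma_uu = 0.25 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + 0.5 * rng.normal(size=(n, p))         # assumed additive measurement error
y = X @ beta_true + rng.normal(size=n)

beta_full = corrected_fit(W, y, sigma_uu, np.full(n, 1.0 / n))
rmse_small = rmse_uniform(W, y, sigma_uu, beta_full, r=500, reps=200, rng=rng)
rmse_large = rmse_uniform(W, y, sigma_uu, beta_full, r=2000, reps=200, rng=rng)
print(rmse_small / rmse_large)                # roughly 2 if the error scales like r**-0.5
```

Quadrupling $r$ should roughly halve the error, which is what an $O_{P}(r^{-1/2})$ bound predicts.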

Theorem 2.

Under Assumptions 1-4, if $r=o(n)$, then as $r\rightarrow\infty$ and $n\rightarrow\infty$, $\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}$ converges to a normal distribution in conditional probability given $\mathcal{F}_{n}$, that is,

$$V^{-1/2}(\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}})\xrightarrow{d}N_{p}(\mathbf{0},I),$$

where $V=\mathcal{H}_{W}^{-1}V_{c}\mathcal{H}_{W}^{-1}$, $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$, and

$$V_{c}=\frac{1}{rn^{2}}\sum_{i=1}^{n}\frac{(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}\mathbf{W}_{i}\mathbf{W}_{i}^{T}}{\pi_{i}}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})^{\otimes 2}.$$

Remark 3.

Theorem 2 shows that, given $\mathcal{F}_{n}$, the second term of $V_{c}$ does not depend on the sampling probabilities. Hence it can be disregarded when we compute the optimal subsampling probabilities.
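To make the sandwich form concrete, here is a minimal sketch that evaluates $V=\mathcal{H}_{W}^{-1}V_{c}\mathcal{H}_{W}^{-1}$ on simulated full data for a given set of sampling probabilities. It assumes that $\hat{\boldsymbol{\beta}}$ is the full-data corrected least squares estimate solving $\mathcal{H}_{W}\boldsymbol{\beta}=\frac{1}{n}\sum_{i}\mathbf{W}_{i}y_{i}$ and reads $v^{\otimes 2}$ as $vv^{T}$; the function name `sandwich_variance` and the simulated model are hypothetical.

```python
import numpy as np

def sandwich_variance(W, y, sigma_uu, pi, r):
    """Return (V, V_c, H_W) as in Theorem 2, evaluated on the full data."""
    n = W.shape[0]
    H_W = W.T @ W / n - sigma_uu
    beta_hat = np.linalg.solve(H_W, W.T @ y / n)   # assumed full-data corrected estimate
    resid = y - W @ beta_hat
    # First term: (1 / (r n^2)) * sum_i resid_i^2 W_i W_i^T / pi_i
    first = (W * (resid**2 / pi)[:, None]).T @ W / (r * n**2)
    # Second term: (1 / r) * (Sigma_uu beta_hat)(Sigma_uu beta_hat)^T; pi does not enter
    v = sigma_uu @ beta_hat
    second = np.outer(v, v) / r
    V_c = first - second
    H_inv = np.linalg.inv(H_W)
    return H_inv @ V_c @ H_inv, V_c, H_W

rng = np.random.default_rng(2)
n, p, r = 20_000, 3, 1_000
beta_true = np.array([1.0, -1.0, 0.5])             # illustrative parameter values
sigma_uu = 0.25 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + 0.5 * rng.normal(size=(n, p))              # assumed additive measurement error
y = X @ beta_true + rng.normal(size=n)

pi_uniform = np.full(n, 1.0 / n)
V, V_c, H_W = sandwich_variance(W, y, sigma_uu, pi_uniform, r)
print(np.sqrt(np.diag(V)))                         # approximate standard errors of the subsample estimate
```

Only `first` changes when `pi` changes, which is the point of Remark 3: the optimisation over sampling probabilities can ignore `second`.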

3.3 Optimal subsampling algorithm

To determine the optimal subsampling probabilities, we use the A- (or L-) optimality criterion from the optimal design of experiments. This criterion minimizes the asymptotic mean squared error of $\tilde{\boldsymbol{\beta}}$ (or $\mathcal{H}_{W}\tilde{\boldsymbol{\beta}}$). Since $\tilde{\boldsymbol{\beta}}$ (or $\mathcal{H}_{W}\tilde{\boldsymbol{\beta}}$) is asymptotically unbiased, it suffices to minimize the trace of the asymptotic variance $V$ (or $V_{c}$).

Theorem 3.

(i) In Algorithm 1, if the subsampling probabilities $\pi_{i}$, $i=1,\ldots,n$, are selected as

$$\pi_{i}^{mV}=\frac{|y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}|\,\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|}{\sum_{j=1}^{n}|y_{j}-\mathbf{W}_{j}^{T}\hat{\boldsymbol{\beta}}|\,\|\mathcal{H}_{W}^{-1}\mathbf{W}_{j}\|},\quad i=1,\ldots,n, \tag{3.2}$$

then the asymptotic variance $\mathrm{tr}(V)$ of $\tilde{\boldsymbol{\beta}}$ attains its minimum.

(ii) In Algorithm 1, if the subsampling probabilities $\pi_{i}$, $i=1,\ldots,n$, are selected as

$$\pi_{i}^{mV_{c}}=\frac{|y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}|\,\|\mathbf{W}_{i}\|}{\sum_{j=1}^{n}|y_{j}-\mathbf{W}_{j}^{T}\hat{\boldsymbol{\beta}}|\,\|\mathbf{W}_{j}\|},\quad i=1,\ldots,n, \tag{3.3}$$

then the asymptotic variance $\mathrm{tr}(V_{c})$ of $\mathcal{H}_{W}\tilde{\boldsymbol{\beta}}$ attains its minimum.
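The form of these probabilities can be traced to the Cauchy-Schwarz inequality. The derivation below is a sketch added for clarity rather than the authors' proof. It uses the shorthand $e_{i}=y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}$ and $a_{i}=|e_{i}|\,\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|$ (both introduced here), reads $v^{\otimes 2}$ as $vv^{T}$, and assumes $a_{i}>0$ for all $i$.

```latex
\begin{align*}
\mathrm{tr}(V)
  &= \mathrm{tr}\!\left(\mathcal{H}_{W}^{-1} V_{c}\, \mathcal{H}_{W}^{-1}\right)
   = \frac{1}{r n^{2}} \sum_{i=1}^{n} \frac{a_{i}^{2}}{\pi_{i}}
     - \frac{1}{r}\,\bigl\|\mathcal{H}_{W}^{-1}\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\bigr\|^{2}, \\
\sum_{i=1}^{n} \frac{a_{i}^{2}}{\pi_{i}}
  &= \Bigl(\sum_{i=1}^{n} \frac{a_{i}^{2}}{\pi_{i}}\Bigr)\Bigl(\sum_{i=1}^{n} \pi_{i}\Bigr)
   \;\geq\; \Bigl(\sum_{i=1}^{n} a_{i}\Bigr)^{2}.
\end{align*}
```

Equality holds exactly when $\pi_{i}\propto a_{i}$, which gives the A-optimal probabilities. Replacing $a_{i}$ with $|e_{i}|\,\|\mathbf{W}_{i}\|$ and $V$ with $V_{c}$ gives the L-optimal probabilities in the same way.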

In (3.2) and (3.3), we observe that the optimal subsampling probabilities with measurement errors are similar to those of the generalized linear model developed by Ai et al. (2021b). However, our A-optimal subsampling probability includes a correction term $\boldsymbol{\Sigma}_{uu}$ for $\mathcal{H}_{W}$. Notably, computing $\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|$ for all data points takes $O(np^{2})$ time, whereas the L-optimal subsampling method requires only $O(np)$ time to compute $\|\mathbf{W}_{i}\|$. Because the optimal subsampling probabilities depend on $\hat{\boldsymbol{\beta}}$, which is unavailable without fitting the full data, we adopt a two-step algorithm, summarized in Algorithm 2.
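The two sets of probabilities can be computed as follows. This is a minimal sketch on simulated data, with the full-data corrected estimate standing in for $\hat{\boldsymbol{\beta}}$; the function name `subsampling_scores` and the simulated model are hypothetical. The final lines check numerically that the A-optimal probabilities give the smallest value of the part of $\mathrm{tr}(V)$ that depends on the probabilities.

```python
import numpy as np

def subsampling_scores(W, y, beta, sigma_uu, criterion):
    """Unnormalised scores |y_i - W_i^T beta| * norm_i for criterion 'A' or 'L'."""
    n = W.shape[0]
    abs_resid = np.abs(y - W @ beta)
    if criterion == "A":
        H_W = W.T @ W / n - sigma_uu                 # O(n p^2) to form
        scaled = np.linalg.solve(H_W, W.T).T         # rows are (H_W^{-1} W_i)^T
        norms = np.linalg.norm(scaled, axis=1)
    elif criterion == "L":
        norms = np.linalg.norm(W, axis=1)            # O(n p)
    else:
        raise ValueError("criterion must be 'A' or 'L'")
    return abs_resid * norms

rng = np.random.default_rng(3)
n, p = 20_000, 3
beta_true = np.array([1.0, -1.0, 0.5])               # illustrative parameter values
sigma_uu = 0.25 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + 0.5 * rng.normal(size=(n, p))                # assumed additive measurement error
y = X @ beta_true + rng.normal(size=n)

H_W = W.T @ W / n - sigma_uu
beta_hat = np.linalg.solve(H_W, W.T @ y / n)         # assumed full-data corrected estimate

a_A = subsampling_scores(W, y, beta_hat, sigma_uu, "A")
a_L = subsampling_scores(W, y, beta_hat, sigma_uu, "L")
pi_A = a_A / a_A.sum()                               # probabilities (3.2)
pi_L = a_L / a_L.sum()                               # probabilities (3.3)
pi_unif = np.full(n, 1.0 / n)

def objective(pi):
    """Part of tr(V) that depends on pi, up to the factor 1 / (r n^2)."""
    return np.sum(a_A**2 / pi)

print(objective(pi_A) <= objective(pi_L), objective(pi_A) <= objective(pi_unif))
```

In practice $\hat{\boldsymbol{\beta}}$ is replaced by a pilot estimate, so the computed probabilities are approximations of the optimal ones.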

Algorithm 2 Optimal subsampling algorithm based on corrected likelihood
  • Step 1. Apply the uniform subsampling probability $1/n$ in Algorithm 1 with subsample size $r_{0}$ to obtain a pilot estimate $\tilde{\boldsymbol{\beta}}_{0}$ of $\boldsymbol{\beta}$. Replace $\hat{\boldsymbol{\beta}}$ with $\tilde{\boldsymbol{\beta}}_{0}$ in (3.2) or (3.3) to obtain the optimal subsampling probabilities $\{\pi^{mV}_{i}\}_{i=1}^{n}$ or $\{\pi^{mV_{c}}_{i}\}_{i=1}^{n}$.

  • Step 2. Use the optimal subsampling probabilities in Algorithm 1 with subsample size $r$. Combine the $r_{0}+r$ samples obtained from the two steps, and obtain $\breve{\boldsymbol{\beta}}$.
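The two steps can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: Algorithm 1 is taken to be sampling with replacement followed by a corrected least squares fit with weights $1/(rn\pi_{i})$, the pooled fit in Step 2 uses weights $1/((r_{0}+r)n\pi_{i}^{*})$ with $\pi_{i}^{*}=1/n$ for the pilot draws, and the L-optimal probabilities (3.3) are used because they need no matrix inverse. The names `corrected_fit` and `two_step_estimate` and the simulated model are hypothetical.

```python
import numpy as np

def corrected_fit(W, y, sigma_uu, weights):
    """Weighted corrected least squares; weights should sum to about one."""
    Ww = W * weights[:, None]
    return np.linalg.solve(Ww.T @ W - sigma_uu, Ww.T @ y)

def two_step_estimate(W, y, sigma_uu, r0, r, rng):
    n = W.shape[0]
    # Step 1: uniform pilot subsample of size r0, drawn with replacement.
    idx0 = rng.integers(0, n, size=r0)
    beta_pilot = corrected_fit(W[idx0], y[idx0], sigma_uu, np.full(r0, 1.0 / r0))
    # L-optimal probabilities (3.3) with the pilot estimate plugged in.
    scores = np.abs(y - W @ beta_pilot) * np.linalg.norm(W, axis=1)
    pi = scores / scores.sum()
    # Step 2: draw r points with probabilities pi, then pool both subsamples.
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    idx = np.concatenate([idx0, idx1])
    pi_star = np.concatenate([np.full(r0, 1.0 / n), pi[idx1]])
    weights = 1.0 / ((r0 + r) * n * pi_star)       # inverse probability weights
    return corrected_fit(W[idx], y[idx], sigma_uu, weights), beta_pilot

rng = np.random.default_rng(4)
n, p = 100_000, 3
beta_true = np.array([1.0, -1.0, 0.5])             # illustrative parameter values
sigma_uu = 0.25 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + 0.5 * rng.normal(size=(n, p))              # assumed additive measurement error
y = X @ beta_true + rng.normal(size=n)

beta_full = corrected_fit(W, y, sigma_uu, np.full(n, 1.0 / n))
beta_breve, beta_pilot = two_step_estimate(W, y, sigma_uu, r0=500, r=2000, rng=rng)
print(np.linalg.norm(beta_breve - beta_full))      # distance to the full-data estimate
```

Only the score computation touches all $n$ rows; both fits operate on at most $r_{0}+r$ rows.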

Theorem 4.

Under Assumptions 1-3, as $r_{0}r^{-1/2}\rightarrow 0$, $r\rightarrow\infty$ and $n\rightarrow\infty$, if $\tilde{\boldsymbol{\beta}}_{0}$ exists, then $\breve{\boldsymbol{\beta}}$ converges to $\hat{\boldsymbol{\beta}}$ in conditional probability given $\mathcal{F}_{n}$ and the convergence rate is $\sqrt{r}$. That is, with probability approaching one, for any $\varepsilon>0$, there exist constants $\Delta_{\varepsilon}$ and $r_{\varepsilon}$ such that

$$P(\|\breve{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}\|\geq r^{-1/2}\Delta_{\varepsilon}\,|\,\mathcal{F}_{n})<\varepsilon$$

for all $r>r_{\varepsilon}$, where $\breve{\boldsymbol{\beta}}$ is obtained by Algorithm 2.

Theorem 5.

Under Assumptions 1-4, if $r=o(n)$, then as $r_{0}r^{-1/2}\rightarrow 0$ and $r\rightarrow\infty$, $\breve{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}$ converges to a normal distribution in conditional probability given $\mathcal{F}_{n}$, that is,

$$(V^{\tilde{\boldsymbol{\beta}}_{0}})^{-1/2}(\breve{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}})\xrightarrow{d}N_{p}(\mathbf{0},I),$$

where $\breve{\boldsymbol{\beta}}$ is obtained by Algorithm 2, $V^{\tilde{\boldsymbol{\beta}}_{0}}=\mathcal{H}_{W}^{-1}V_{c}^{\tilde{\boldsymbol{\beta}}_{0}}\mathcal{H}_{W}^{-1}$, $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$, and

$$V_{c}^{\tilde{\boldsymbol{\beta}}_{0}}=\frac{1}{rn^{2}}\sum_{i=1}^{n}\frac{(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}\mathbf{W}_{i}\mathbf{W}_{i}^{T}}{\pi_{i}(\tilde{\boldsymbol{\beta}}_{0})}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})^{\otimes 2}.$$

The standard error of an estimator is essential for statistical inference, such as hypothesis testing and the construction of confidence intervals. However, computing the asymptotic covariance matrix requires the full data, which is computationally expensive when the sample size is large. To reduce the computational cost, we use the subsample to approximate the covariance matrix of $\breve{\boldsymbol{\beta}}$. This approximation is denoted as

$$\breve{V}=\breve{\mathcal{H}}_{W}^{-1}\breve{V}_{c}\breve{\mathcal{H}}_{W}^{-1},$$

where $\breve{\mathcal{H}}_{W}=\frac{1}{n(r_{0}+r)}\sum_{i=1}^{r+r_{0}}\mathbf{W}_{i}^{*}\mathbf{W}_{i}^{*T}-\boldsymbol{\Sigma}_{uu}$, and

$$\breve{V}_{c}=\frac{1}{(r+r_{0})^{2}n^{2}}\sum_{i=1}^{r+r_{0}}\frac{(y_{i}^{*}-\mathbf{W}_{i}^{*T}\breve{\boldsymbol{\beta}})^{2}\mathbf{W}_{i}^{*}\mathbf{W}_{i}^{*T}}{\pi_{i}^{*}}-\frac{1}{r+r_{0}}(\boldsymbol{\Sigma}_{uu}\breve{\boldsymbol{\beta}})^{\otimes 2}.$$

Here, $\breve{\mathcal{H}}_{W}$ and $\breve{V}_{c}$ are moment estimators of $\mathcal{H}_{W}$ and $V_{c}$, respectively. If $\breve{\boldsymbol{\beta}}$ is replaced by $\hat{\boldsymbol{\beta}}$, these estimators become unbiased.

4 Perturbation subsampling based on corrected likelihood

The optimal subsampling algorithm in Section 3 requires computing unequal sampling probabilities for the full data all at once. As the sample size increases, this becomes increasingly memory-intensive. This section therefore introduces the perturbation subsampling algorithm to address this issue.

4.1 Perturbation subsampling algorithm

Suppose that $\{\mu_{i}\}_{i=1}^{n}$ are generated from a Bernoulli distribution with probability $q_{n}=r/n$, and that $\{\nu_{i}\}_{i=1}^{n}$ are generated from a known probability distribution with mean $1/q_{n}$. The weighted loss function can be written as

$$L^{*}(\boldsymbol{\beta})=\frac{1}{2n}\sum_{i=1}^{n}\psi_{i}(y_{i}-\mathbf{W}_{i}^{T}\boldsymbol{\beta})^{2}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta},$$

where $\psi_{i}=\mu_{i}\nu_{i}$.
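Because the weighted loss is quadratic in $\boldsymbol{\beta}$, its minimizer has a closed form. The short derivation below is a sketch added for clarity; it assumes that the matrix multiplying $\boldsymbol{\beta}$ is positive definite, so that the stationary point is the minimizer, and that $\mu_{i}$ and $\nu_{i}$ are independent.

```latex
\begin{align*}
\nabla L^{*}(\boldsymbol{\beta})
  &= -\frac{1}{n}\sum_{i=1}^{n}\psi_{i}\mathbf{W}_{i}\,(y_{i}-\mathbf{W}_{i}^{T}\boldsymbol{\beta})
     - \boldsymbol{\Sigma}_{uu}\boldsymbol{\beta} = \mathbf{0} \\
\Longrightarrow\quad
\Bigl(\frac{1}{n}\sum_{i=1}^{n}\psi_{i}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}\Bigr)\boldsymbol{\beta}
  &= \frac{1}{n}\sum_{i=1}^{n}\psi_{i}\mathbf{W}_{i}\,y_{i}, \\
E(\psi_{i}) = E(\mu_{i})\,E(\nu_{i}) &= q_{n}\cdot\frac{1}{q_{n}} = 1 .
\end{align*}
```

Since $E(\psi_{i})=1$, the expectation of $L^{*}(\boldsymbol{\beta})$ given the data is the same expression with every $\psi_{i}$ set to one. Only the observations with $\mu_{i}=1$, about $r$ of them on average, enter the sums.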

Algorithm 3 Perturbation subsampling algorithm based on corrected likelihood
  • Step 1. Sampling: Generate $n$ i.i.d. random variables $\{\mu_{k,i}\}_{i=1}^{n}$ such that $\mu_{k,i}\sim Bernoulli(q_{n})$, where $q_{n}=r/n$.

  • Step 2. Random weighting: Generate $n$ i.i.d. random variables $\{\nu_{k,i}\}_{i=1}^{n}$ from a completely known distribution with $E(\nu_{k,i})=1/q_{n}$ and variance $b_{n}^{2}$.

  • Step 3. Estimation: Minimize $L^{*}(\boldsymbol{\beta})$ to obtain the parameter estimator $\check{\boldsymbol{\beta}}_{k}=\arg\underset{\boldsymbol{\beta}}{\min}\,L^{*}(\boldsymbol{\beta})$.

  • Step 4. Combination: Repeat Steps 1-3 $m$ times and then combine the resulting estimates, namely $\check{\boldsymbol{\beta}}^{(m)}=\frac{1}{m}\sum_{k=1}^{m}\check{\boldsymbol{\beta}}_{k}$.

The conditional variance of $\check{\boldsymbol{\beta}}^{(m)}$ can be estimated as

$$\widehat{Var}(\check{\boldsymbol{\beta}}^{(m)}|\mathcal{F}_{n})=\frac{1}{m(m-1)}\sum_{k=1}^{m}(\check{\boldsymbol{\beta}}_{k}-\check{\boldsymbol{\beta}}^{(m)})(\check{\boldsymbol{\beta}}_{k}-\check{\boldsymbol{\beta}}^{(m)})^{T}.$$
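The steps above, together with the variance estimator, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling step is assumed to keep each observation independently with probability $q_n=r/n$, the weighted corrected objective $L^{*}$ is assumed to have the closed-form minimizer coded in `corrected_wls`, the weights are taken to be exponential with mean $1/q_n$ (the choice made in Section 5), and all function names are hypothetical.

```python
import numpy as np


def corrected_wls(W, y, weights, Sigma_uu):
    """Weighted corrected least squares (assumed form of the minimizer of L*).

    Solves (sum_i w_i W_i W_i^T - (sum_i w_i) Sigma_uu) beta = sum_i w_i W_i y_i.
    """
    A = (W * weights[:, None]).T @ W - weights.sum() * Sigma_uu
    b = W.T @ (weights * y)
    return np.linalg.solve(A, b)


def perturbation_subsampling(W, y, Sigma_uu, r, m, rng):
    """Repeat sampling, random weighting and estimation m times, then average."""
    n, p = W.shape
    q_n = r / n  # assumption: inclusion probability of the sampling step
    estimates = np.empty((m, p))
    for k in range(m):
        # Assumed sampling step: independent Bernoulli(q_n) inclusion.
        idx = np.flatnonzero(rng.random(n) < q_n)
        # Random weights with mean 1/q_n; only the weights of the kept
        # observations matter, so only those are drawn here.
        nu = rng.exponential(scale=1.0 / q_n, size=idx.size)
        estimates[k] = corrected_wls(W[idx], y[idx], nu, Sigma_uu)
    beta_m = estimates.mean(axis=0)
    centered = estimates - beta_m
    # Variance estimator of the combined estimate; requires m >= 2.
    var_hat = centered.T @ centered / (m * (m - 1))
    return beta_m, var_hat
```

Each repetition touches only about $r$ observations, so the cost of one pass is driven by $r$ rather than $n$, apart from drawing the inclusion indicators.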

4.2 Asymptotic properties

To obtain the asymptotic properties of $\check{\boldsymbol{\beta}}^{(m)}$, the following assumption is given.

Assumption 5.

$\limsup\limits_{n\rightarrow\infty}q_{n}E(\psi^{2})<\infty$, and there exists $\alpha>0$ such that $\limsup\limits_{n\rightarrow\infty}q_{n}^{2+\alpha}E\nu^{2+\alpha}<\infty$.

Assumption 5 imposes finite second-order and higher-order moment conditions. It is equivalent to the existence of $\alpha>0$ such that $\limsup\limits_{n\rightarrow\infty}q_{n}^{1+\alpha}E\psi^{2+\alpha}<\infty$.
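As a worked check of the weight condition, consider exponential weights with mean $1/q_n$, the choice used in the simulations of Section 5. The calculation below is an illustration added for the reader and uses only the standard moment formula of the exponential distribution.

```latex
\nu\sim\mathrm{Exp}(q_n)\ \Longrightarrow\
E\nu^{k}=\frac{\Gamma(k+1)}{q_n^{k}}\quad(k>0),
\qquad\text{hence}\qquad
q_n^{2+\alpha}E\nu^{2+\alpha}=\Gamma(3+\alpha)<\infty .
```

The bound does not depend on $n$, so the moment condition on $\nu$ holds for every $\alpha>0$, however fast $q_n$ tends to zero.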

Theorem 6.

Under Assumptions 1, 2, 5, as $r\rightarrow\infty$ and $n\rightarrow\infty$, $\check{\boldsymbol{\beta}}^{(m)}$ converges to $\hat{\boldsymbol{\beta}}$ in conditional probability given $\mathcal{F}_{n}$ and the convergence rate is $\sqrt{mr}$. That is, with probability approaching one, for any $\varepsilon>0$, there exist constants $\Delta_{\varepsilon}$ and $r_{\varepsilon}$ such that

$$P(\|\check{\boldsymbol{\beta}}^{(m)}-\hat{\boldsymbol{\beta}}\|\geq(mr)^{-1/2}\Delta_{\varepsilon}|\mathcal{F}_{n})<\varepsilon$$

for all $r>r_{\varepsilon}$.

Remark 4.

According to Theorem 6, $\check{\boldsymbol{\beta}}^{(m)}$ is a consistent estimator of $\hat{\boldsymbol{\beta}}$. When $rm<n$, the convergence rate is $\sqrt{rm}$; otherwise it is $\sqrt{n}$. Hence, estimation based on the full data is still more efficient than estimation based on repeated perturbation subsampling.

Theorem 7.

Under Assumptions 1, 2, 4, 5, if $r=o(n)$, then as $r\rightarrow\infty$ and $n\rightarrow\infty$, $\check{\boldsymbol{\beta}}^{(m)}-\hat{\boldsymbol{\beta}}$ converges to a normal distribution in conditional probability given $\mathcal{F}_{n}$, that is,

$$\Sigma^{-1/2}\sqrt{mr/a_{n}}(\check{\boldsymbol{\beta}}^{(m)}-\hat{\boldsymbol{\beta}})\xrightarrow{d}N_{p}(\mathbf{0},I),\quad r\rightarrow\infty,\ n\rightarrow\infty,$$

where $a_{n}=1-q_{n}+b_{n}^{2}q_{n}^{2}$, $\Sigma=\mathcal{H}_{W}^{-1}\Sigma_{c}\mathcal{H}_{W}^{-1}$, $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$, and $\Sigma_{c}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}$.
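The quantities in Theorem 7 can be assembled numerically as in the sketch below, which returns the plug-in approximation $\frac{a_n}{mr}\Sigma$ to the conditional covariance of $\check{\boldsymbol{\beta}}^{(m)}-\hat{\boldsymbol{\beta}}$. It is an illustration only: $\hat{\boldsymbol{\beta}}$ is assumed to be the full-data corrected least-squares estimator, $q_n$ is assumed to equal $r/n$, and the function names are hypothetical. Because it uses all $n$ observations, it is meant for checking results on moderate $n$ rather than as part of a subsampling pipeline.

```python
import numpy as np


def full_data_corrected_ls(W, y, Sigma_uu):
    """Full-data corrected least squares (assumed definition of beta-hat)."""
    n = W.shape[0]
    H_W = W.T @ W / n - Sigma_uu
    beta_hat = np.linalg.solve(H_W, W.T @ y / n)
    return beta_hat, H_W


def asymptotic_covariance(W, y, Sigma_uu, r, m, b_n_sq):
    """Plug-in value of (a_n / (m r)) * H_W^{-1} Sigma_c H_W^{-1}."""
    n = W.shape[0]
    q_n = r / n  # assumption: q_n is the expected sampling fraction
    beta_hat, H_W = full_data_corrected_ls(W, y, Sigma_uu)
    resid = y - W @ beta_hat
    # Sigma_c = (1/n) sum_i W_i W_i^T (y_i - W_i^T beta_hat)^2
    Sigma_c = (W * (resid ** 2)[:, None]).T @ W / n
    H_inv = np.linalg.inv(H_W)
    Sigma = H_inv @ Sigma_c @ H_inv
    a_n = 1.0 - q_n + b_n_sq * q_n ** 2
    return a_n / (m * r) * Sigma
```

For exponential weights with mean $1/q_n$, passing `b_n_sq = 1 / q_n**2` gives $a_n=2-q_n$.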

5 Simulation studies

We generate the full data from model (2.1) with $n=10000$, $\boldsymbol{\beta}=(1,1,1,1,1)^{T}$ and $\epsilon_{i}\sim N(0,\sigma_{\epsilon}^{2})$. Let $\mathbf{X}_{i}\sim N_{5}(\mathbf{0},\boldsymbol{\Sigma})$, where $\boldsymbol{\Sigma}_{j,k}=0.5^{|j-k|}$ for $j,k=1,2,\ldots,5$, and $\mathbf{U}_{i}\sim N_{5}(\mathbf{0},\sigma_{u}^{2}I)$; then $\mathbf{W}_{i}=\mathbf{X}_{i}+\mathbf{U}_{i}$. We consider the following three values for $\sigma_{u}^{2}$ and $\sigma_{\epsilon}^{2}$, respectively: $\sigma_{u}^{2}=0.6,0.4,0.2$; $\sigma_{\epsilon}^{2}=1.44,1,0.64$.
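The data-generating design just described can be reproduced with a few lines of NumPy. The sketch assumes that model (2.1) is the linear model $y_i=\mathbf{X}_i^{T}\boldsymbol{\beta}+\epsilon_i$ with only $\mathbf{W}_i$ and $y_i$ observed; the function name and its arguments are hypothetical.

```python
import numpy as np


def generate_full_data(n, p, sigma_u_sq, sigma_eps_sq, rng):
    """Simulate (W, y) from a linear model with additive measurement error."""
    idx = np.arange(p)
    # Covariance of the true covariates: Sigma[j, k] = 0.5 ** |j - k|.
    Sigma_x = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), Sigma_x, size=n)
    U = rng.normal(scale=np.sqrt(sigma_u_sq), size=(n, p))
    W = X + U  # error-prone covariates actually observed
    beta = np.ones(p)
    # Assumed form of model (2.1): the response depends on the true X.
    y = X @ beta + rng.normal(scale=np.sqrt(sigma_eps_sq), size=n)
    return W, y, beta


rng = np.random.default_rng(2024)
W, y, beta = generate_full_data(n=10000, p=5, sigma_u_sq=0.4,
                                sigma_eps_sq=1.0, rng=rng)
```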

In Algorithm 3, let $m=10$, and take the known distribution of the random weights to be exponential with mean $1/q_{n}$, i.e., $\nu_{i}\sim\mathrm{Exp}(q_{n})$. Correspondingly, $b_{n}^{2}=1/q_{n}^{2}$ and $a_{n}=1-q_{n}+b_{n}^{2}q_{n}^{2}=2-q_{n}$. We choose $r_{0}=100$ and $r=200,400,600,800,1000$. For each value of $r$, we perform $N=1000$ repetitions to calculate the mean squared error (MSE): $\frac{1}{N}\sum_{i=1}^{N}\|\hat{\boldsymbol{\beta}}_{i}-\boldsymbol{\beta}\|^{2}$. For comparison, we consider six subsampling methods: perturbation subsampling based on the corrected likelihood in Algorithm 3 (CLEPS), A- and L-optimal subsampling based on the corrected likelihood in Algorithm 2 (A-opt and L-opt), uniform subsampling (UNIF), leverage subsampling (BLEV), and D-optimal subsampling (IBOSS). Measurement errors are not considered for UNIF, BLEV and IBOSS. To ensure a fair comparison, all methods except A-opt and L-opt use $r_{0}+r$ subsamples for parameter estimation.

Figure 2: The MSEs based on different $\sigma_{u}^{2}$ and $\sigma_{\epsilon}^{2}$ for different $r$.

The results presented in Figure 2 demonstrate the superior performance of CLEPS compared to the other methods, closely followed by A-opt and L-opt. Furthermore, as the subsample size increases, the MSEs of our proposed methods approach zero, while the approaches that do not account for measurement errors show relatively stable MSEs. These results confirm the inconsistency of the ordinary least squares estimator for linear models with measurement errors. Additionally, decreasing $\sigma_{u}^{2}$ and $\sigma_{\epsilon}^{2}$ reduces the MSEs of all six methods. Notably, even when the variance of the random error is large, our methods maintain good performance.
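The stable MSEs of the uncorrected methods can be related to the classical attenuation result for errors-in-variables regression (see, e.g., Fuller, 1987): when $\mathbf{U}$ has mean zero and is independent of $\mathbf{X}$ and $\epsilon$, ordinary least squares of $y$ on $\mathbf{W}$ converges to $(\boldsymbol{\Sigma}+\sigma_u^2 I)^{-1}\boldsymbol{\Sigma}\boldsymbol{\beta}$ rather than to $\boldsymbol{\beta}$. The sketch below evaluates this limit and the implied squared bias for the simulation design. It is an illustration and not a computation from the paper; the limit applies to full-data or uniformly subsampled least squares and is only indicative for BLEV and IBOSS, which select points non-uniformly. The function name is hypothetical.

```python
import numpy as np

p = 5
idx = np.arange(p)
Sigma_x = 0.5 ** np.abs(idx[:, None] - idx[None, :])
beta = np.ones(p)


def naive_ols_limit(Sigma_x, sigma_u_sq, beta):
    """Probability limit of uncorrected OLS under classical additive error."""
    p = Sigma_x.shape[0]
    return np.linalg.solve(Sigma_x + sigma_u_sq * np.eye(p), Sigma_x @ beta)


for sigma_u_sq in (0.6, 0.4, 0.2):
    limit = naive_ols_limit(Sigma_x, sigma_u_sq, beta)
    # Squared bias: a floor that the MSE of naive OLS cannot go below,
    # however large the subsample is.
    print(sigma_u_sq, np.sum((limit - beta) ** 2))
```

The squared bias does not depend on $r$, which is consistent with MSE curves that flatten instead of decaying to zero.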

To further investigate the impact of other parameters on the sampling methods, Figure 3 shows how the MSE varies across values of $r_{0}$, $n$, $p$, and $m$ while keeping $\sigma_{\epsilon}^{2}=1$ and $\sigma_{u}^{2}=0.4$. Additionally, Table 1 presents the results for different $m$ for CLEPS.

Figure 3: The top left, top right, and bottom left plots present the MSEs for different $r_{0}$, $n$, and $p$, respectively. The bottom right plot presents the MSEs for different $r$ with $m=1$.

In Figure 3, the top left plot shows that, for fixed $r_{0}+r=1000$, the performance of A-opt and L-opt first improves and then deteriorates as $r_{0}$ increases. The reason is that the first-step estimate is inaccurate when $r_{0}$ is too small. The top right plot indicates that as the full sample size $n$ increases, CLEPS becomes more efficient, while the other methods remain almost unchanged. The bottom left plot shows that as the dimension $p$ increases, the MSEs of all methods rise. When $p>15$, A-opt and L-opt underperform the other methods, whereas CLEPS consistently performs best. The bottom right plot shows that when $m=1$, the MSE of CLEPS is slightly larger than those of A-opt and L-opt.

Table 1: The MSEs for different $m$ (entries are $\log_{10}\mathrm{MSE}$).

$r$      $m=1$     $m=10$    $m=20$    $m=30$    $m=50$
200      -0.775    -1.669    -1.880    -1.978    -2.085
400      -1.003    -1.864    -2.038    -2.128    -2.198
600      -1.156    -1.958    -2.111    -2.173    -2.244
800      -1.236    -2.034    -2.158    -2.212    -2.269
1000     -1.336    -2.075    -2.197    -2.244    -2.285

In Table 1, the MSE decreases as $m$ increases, but the rate of reduction progressively diminishes. This suggests that $m$ should be significantly smaller than $r$ in order to achieve effective inference. It is advisable to set $m<r/10$, in accordance with the findings in Shang and Cheng (2017), Wang (2019b), and Wang and Ma (2021), which suggest that the number of partitions should be significantly smaller than the sample size within each data partition.
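One rough way to read the diminishing returns in Table 1 is to suppose that the MSE splits into a part that shrinks like $1/m$ (as the $\sqrt{mr}$ rate in Theorem 7 suggests for the conditional variance) and a part that does not depend on $m$. This decomposition is an assumption made here for illustration, not a result stated in the paper. The sketch fits the two constants from the $m=1$ and $m=10$ entries of the $r=200$ row and compares the implied values with the remaining entries of that row.

```python
import numpy as np

# r = 200 row of Table 1 (log10 MSE of CLEPS).
m_values = np.array([1, 10, 20, 30, 50])
log10_mse_r200 = np.array([-0.775, -1.669, -1.880, -1.978, -2.085])
mse = 10.0 ** log10_mse_r200

# Assumed descriptive model: MSE(m) = c / m + f, fitted from m = 1 and m = 10.
c = (mse[0] - mse[1]) / (1.0 - 1.0 / 10.0)
f = mse[0] - c

predicted_log10 = np.log10(c / m_values + f)
for m, observed, predicted in zip(m_values, log10_mse_r200, predicted_log10):
    print(m, observed, round(float(predicted), 3))
```

Under this reading, increasing $m$ only reduces the $c/m$ term, so the gain from further repetitions fades once that term falls below the floor $f$.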

6 Real examples

6.1 Diamond price dataset

The diamond price dataset is an integrated dataset containing prices and other characteristics of approximately 54,000 diamonds. This dataset can be accessed at https://www.kaggle.com/datasets/shivam2503/diamonds. Our aim is to explore the relationship between diamond prices ($y$) and three covariates: carat ($x_{1}$), depth ($x_{2}$) and table ($x_{3}$). All variables are standardized and the linear model is

$$y=x_{1}\beta_{1}+x_{2}\beta_{2}+x_{3}\beta_{3}+\epsilon.$$

Following the literature, the measurement error model is $\mathbf{W}=\mathbf{X}+\mathbf{U}$, where $\mathbf{X}=(x_{1},x_{2},x_{3})^{T}$ and $\mathbf{U}$ is the measurement error vector with mean zero and covariance matrix $\sigma_{u}^{2}I_{3}$, with $\sigma_{u}^{2}=0.6,0.4,0.2$. Figure 4 depicts the MSEs for different $\sigma_{u}^{2}$ and $r$ values when $m=10$, $r_{0}=200$, and $N=500$. As the subsample size $r$ increases, the MSEs of the proposed methods decrease and become smaller than those of other methods. Moreover, with an increase in the variance of measurement error, there is a corresponding decrease in the MSEs of the proposed methods.

Figure 4: The MSEs for different $\sigma_{u}^{2}$ and $r$.

6.2 Airline delay dataset

The airline delay dataset has nearly 120 million records and can be found at https://community.amstat.org/jointscsg-section/dataexpo/dataexpo2009. It includes detailed information on the arrival and departure of all commercial flights within the United States from 1987 to 2008. This study focuses solely on the 2008 data, yielding a total of 2389217 samples and 29 variables. After cleaning, 715731 observations remain. We select Arrival Delay as the response variable, with Departure Delay, Distance, Air Time, and Elapsed Time as the covariates. Note that the flight elapsed time is recorded in two forms: Actual Elapsed Time and CRS Elapsed Time, where the CRS Elapsed Time refers to the original elapsed time. Therefore, the Flight Elapsed Time is regarded as a variable measured with error, with two replicates, while the other variables have no measurement errors. The estimate of the covariance matrix of the measurement error is

$$\begin{bmatrix}0&&&\\ &0&&\\ &&0&\\ &&&0.0195\end{bmatrix}.$$

In our proposed methods, the Flight Elapsed Time is calculated as the mean of the two variables, whereas in other methods, it is determined as the Actual Elapsed Time.
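When a covariate is observed twice, a standard replicate-based estimator of the measurement error variance (see, e.g., Carroll et al., 2006) uses the within-pair differences. The sketch below shows that estimator under the classical model $W_{ij}=X_i+U_{ij}$ with independent, mean-zero errors of common variance. The text does not state how the value 0.0195 was obtained, so this should be read as one plausible route and not as the authors' procedure; the function name is hypothetical.

```python
import numpy as np


def replicate_error_variance(w1, w2):
    """Estimate the error variance from two replicate measurements per unit.

    Under W_ij = X_i + U_ij, E[(W_i1 - W_i2)^2] = 2 * sigma_u^2.
    Returns the variance of a single replicate's error and the variance of
    the error in the replicate mean (W_i1 + W_i2) / 2.
    """
    d = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    sigma_u_sq = np.mean(d ** 2) / 2.0
    return sigma_u_sq, sigma_u_sq / 2.0
```

If the replicate mean is used as the error-prone covariate, the second returned value is the one that would enter the corresponding diagonal entry of $\boldsymbol{\Sigma}_{uu}$.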

Figure 5 displays the impact of varying $r$ on the MSE when $m=10$, $r_{0}=200$, and $N=500$. Notably, the MSEs of the proposed methods consistently decrease and are smaller than those of the other methods, indicating that our methods also perform well in applications when $\boldsymbol{\Sigma}_{uu}$ is unknown.

Figure 5: The MSEs for different $r$.

7 Summary

For large-scale data with measurement errors in the covariates of linear models, this paper studies two subsampling methods based on the corrected likelihood. Theoretical results and numerical studies demonstrate that the two proposed algorithms outperform other existing sampling methods when measurement errors are present. However, this work only considers linear models. Future research directions could include investigating nonlinear models, high-dimensional data, or distributed data.

Supplementary Materials

Theorems 1-7 are proved in supplementary materials.

References

  • Ai et al. (2021a) Ai, M., Wang, F., Yu, J. and Zhang, H. (2021a). Optimal subsampling for large-scale quantile regression. Journal of Complexity 62, 101512.
  • Ai et al. (2021b) Ai, M., Yu, J., Zhang, H. and Wang, H. (2021b). Optimal subsampling algorithms for big data regressions. Statistica Sinica 31(2), 749–772.
  • Carroll et al. (2006) Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. 2nd Edition. Chapman and Hall/CRC, New York.
  • Cheng, Wang and Yang (2020) Cheng, Q., Wang, H. and Yang, M. (2020). Information-based optimal subdata selection for big data logistic regression. Journal of Statistical Planning and Inference 209, 112–122.
  • Fuller (1987) Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.
  • Lee, Wang and Schifano (2020) Lee, J., Wang, H. and Schifano, E. D. (2020). Online updating method to correct for measurement error in big data streams. Computational Statistics & Data Analysis 149, 106976.
  • Liang, Hardle and Carroll (1999) Liang, H., Hardle, W. and Carroll, R. J. (1999). Estimation in a semiparametric partially linear errors-in-variables model. The Annals of Statistics 27(5), 1519–1535.
  • Liang and Li (2009) Liang, H. and Li, R. (2009). Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association 104(485), 234–248.
  • Ma, Mahoney and Yu (2015) Ma, P., Mahoney, M. W. and Yu, B. (2015). A statistical perspective on algorithmic leveraging. Journal of Machine Learning Research 16, 861–911.
  • Nakamura (1990) Nakamura, T. (1990). Corrected score function for errors-in-variables models: methodology and application to generalized linear models. Biometrika 77, 127–137.
  • Shang and Cheng (2017) Shang, Z. and Cheng, G. (2017). Computational limits of a distributed algorithm for smoothing spline. Journal of Machine Learning Research 18(108), 1–37.
  • Wang, Zhu and Ma (2018) Wang, H., Zhu, R. and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association 113(522), 829–844.
  • Wang (2019a) Wang, H. (2019a). More efficient estimation for logistic regression with optimal subsamples. Journal of Machine Learning Research 20, 1–59.
  • Wang (2019b) Wang, H. (2019b). Divide-and-conquer information-based optimal subdata selection algorithm. Journal of Statistical Theory and Practice 13(3), 46.
  • Wang, Yang and Stufken (2019) Wang, H., Yang, M. and Stufken, J. (2019). Information-based optimal subdata selection for big data linear regression. Journal of the American Statistical Association 114(525), 393–405.
  • Wang and Ma (2021) Wang, H. and Ma, Y. (2021). Optimal subsampling for quantile regression in big data. Biometrika 108(1), 99–112.
  • Wang and Kim (2022) Wang, H. and Kim, J. K. (2022). Maximum sampled conditional likelihood for informative subsampling. Journal of Machine Learning Research 23(1), 14937–14986.
  • Wang et al. (2021) Wang, L., Elmstedt, J., Wong, W. and Xu, H. (2021). Orthogonal subsampling for big data linear regression. Annals of Applied Statistics 15(3), 1273–1290.
  • Yao and Jin (2024) Yao, Y. and Jin, Z. (2024). A perturbation subsampling for large scale data. Statistica Sinica, DOI:10.5705/ss.202022.0020.
  • Yi and Zhou (2023) Yi, S. and Zhou, Y. (2023). Model-free global likelihood subsampling for massive data. Statistics and Computing 33, 9.
  • Yu and Wang (2022) Yu, J. and Wang, H. (2022). Subdata selection algorithm for linear model discrimination. Statistical Papers 63(6), 1883–1906.
  • Yu et al. (2022) Yu, J., Wang, H., Ai, M. and Zhang, H. (2022). Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. Journal of the American Statistical Association 117(537), 265–276.
  • Yu, Liu and Wang (2023) Yu, J., Liu, J. and Wang, H. (2023). Information-based optimal subdata selection for non-linear models. Statistical Papers 64, 1069–1093.
  • Zhang et al. (2024) Zhang, M., Zhou, Y., Zhou, Z. and Zhang, A. (2024). Model-free subsampling method based on uniform designs. IEEE Transactions on Knowledge and Data Engineering 36(3), 1210–1220.

School of Statistics and Data Science, Qufu Normal University

Supplementary Material

The supplementary material contains proofs for Theorems 1-7.

S1 Lemma

We first state a technical lemma.

Lemma 2 ($C_{r}$ inequality).

Let $\{X_{i},i\geq 1\}$ be a sequence of independent random variables. Then the $k$-th absolute moment of the sum is not greater than $C_{r}$ times the sum of the $k$-th absolute moments, i.e.,

$$E\Big|\sum_{i=1}^{n}X_{i}\Big|^{k}\leq C_{r}\sum_{i=1}^{n}E|X_{i}|^{k},$$

where
$$C_{r}=\begin{cases}1,&0<k\leq 1,\\ n^{k-1},&k>1.\end{cases}$$
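As a short illustration of the lemma (added here for the reader, using only elementary inequalities), take $n=2$ and $k=2$, so that $C_r=2$. The general case $k>1$ follows in the same pointwise way from the convexity of $x\mapsto|x|^{k}$.

```latex
|X_1+X_2|^{2}=X_1^{2}+2X_1X_2+X_2^{2}\leq 2\left(X_1^{2}+X_2^{2}\right)
\quad\Longrightarrow\quad
E|X_1+X_2|^{2}\leq 2\left(E|X_1|^{2}+E|X_2|^{2}\right),
\\[4pt]
\Big|\frac{1}{n}\sum_{i=1}^{n}x_i\Big|^{k}\leq\frac{1}{n}\sum_{i=1}^{n}|x_i|^{k}
\quad\Longrightarrow\quad
\Big|\sum_{i=1}^{n}x_i\Big|^{k}\leq n^{k-1}\sum_{i=1}^{n}|x_i|^{k}\qquad(k>1).
```

The first line uses $2X_1X_2\leq X_1^{2}+X_2^{2}$; taking expectations of the pointwise bound in the second line gives the stated inequality with $C_r=n^{k-1}$.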

S2 Proof of Theorem 1

Note that

$$\begin{aligned}
\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}
&=\Big(\frac{1}{n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}\mathbf{W}_{i}^{*}\mathbf{W}_{i}^{*T}-\boldsymbol{\Sigma}_{uu}\Big)^{-1}\cdot\Big[\frac{1}{n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}\mathbf{W}_{i}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})+\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\Big]\\
&=-(\tilde{\mathcal{H}}_{W})^{-1}\dot{\ell}^{*}(\hat{\boldsymbol{\beta}}),
\end{aligned}\tag{S2.1}$$

where $\tilde{\mathcal{H}}_{W}=\frac{1}{r}\sum_{i=1}^{r}\left[\frac{1}{n\pi_{i}^{*}}\mathbf{W}_{i}^{*}\mathbf{W}_{i}^{*T}-\boldsymbol{\Sigma}_{uu}\right]$ and $\dot{\ell}^{*}(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}[-\mathbf{W}_{i}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\boldsymbol{\beta})]-\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}$. Therefore, we only need to prove

\[
\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})=O_{P|\mathcal{F}_{n}}(r^{-1/2}), \tag{S2.2}
\]

and

\[
\tilde{\mathcal{H}}_{W}-\mathcal{H}_{W}=O_{P|\mathcal{F}_{n}}(r^{-1/2}), \tag{S2.3}
\]

where $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$.
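The identity (S2.1) is purely algebraic, so it can be checked numerically on simulated data. The sketch below is illustrative only: it assumes that $\hat{\boldsymbol{\beta}}$ is the full-data corrected estimator solving $\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\boldsymbol{\beta})+\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}=\mathbf{0}$, that $\tilde{\boldsymbol{\beta}}$ is its inverse-probability-weighted subsample analogue, and that sampling is uniform with replacement. The variable names (`beta_hat`, `beta_tilde`, `sigma_uu`, and so on) and the simulated design are hypothetical and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 2000, 200, 3
sigma_uu = 0.25 * np.eye(p)            # assumed known measurement-error covariance
beta_true = np.array([1.0, -0.5, 2.0])

X = rng.normal(size=(n, p))                      # unobserved true covariates
W = X + rng.normal(scale=0.5, size=(n, p))       # observed covariates, error sd 0.5
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Full-data corrected estimator (assumed form of beta-hat).
H_W = W.T @ W / n - sigma_uu
beta_hat = np.linalg.solve(H_W, W.T @ y / n)

# Draw r indices with replacement; uniform pi_i is an illustrative choice.
pi = np.full(n, 1.0 / n)
idx = rng.choice(n, size=r, replace=True, p=pi)
Ws, ys = W[idx], y[idx]
wts = 1.0 / (n * r * pi[idx])                    # weights 1 / (n r pi_i^*)

# Subsample Hessian-type matrix, estimator, and score evaluated at beta_hat.
H_tilde = (Ws * wts[:, None]).T @ Ws - sigma_uu
beta_tilde = np.linalg.solve(H_tilde, (Ws * wts[:, None]).T @ ys)
score = -(Ws * wts[:, None]).T @ (ys - Ws @ beta_hat) - sigma_uu @ beta_hat

lhs = beta_tilde - beta_hat                      # left side of (S2.1)
rhs = -np.linalg.solve(H_tilde, score)           # right side of (S2.1)
print("max abs difference:", np.max(np.abs(lhs - rhs)))
```

The two sides agree up to floating-point error for any choice of sampling probabilities, which is why the proof can reduce to bounding the score and the matrix $\tilde{\mathcal{H}}_{W}$ separately.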

To prove (S2.2), we directly calculate

\begin{align*}
E(\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})|\mathcal{F}_{n})
&=E\left\{\frac{1}{n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}[-\mathbf{W}_{i}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\bigg|\mathcal{F}_{n}\right\}\\
&=\frac{1}{nr}\sum_{i=1}^{r}E\left\{\frac{1}{\pi_{i}^{*}}[-\mathbf{W}_{i}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]\Big|\mathcal{F}_{n}\right\}-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\\
&=\frac{1}{n}\sum_{i=1}^{n}\pi_{i}\cdot\frac{1}{\pi_{i}}[-\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\\
&=\frac{1}{n}\sum_{i=1}^{n}[-\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\\
&=\mathbf{0}.
\end{align*}
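The conditional expectation above can be evaluated exactly by enumerating the $n$ possible outcomes of a single draw. The following sketch does this for non-uniform probabilities; it assumes $\hat{\boldsymbol{\beta}}$ is the full-data corrected estimator, so that the full-data corrected score vanishes at $\hat{\boldsymbol{\beta}}$. The data-generating setup, the particular choice of `pi`, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 2
sigma_uu = 0.25 * np.eye(p)            # assumed known measurement-error covariance

X = rng.normal(size=(n, p))
W = X + rng.normal(scale=0.5, size=(n, p))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Full-data corrected estimator (assumed form of beta-hat).
beta_hat = np.linalg.solve(W.T @ W / n - sigma_uu, W.T @ y / n)
resid = y - W @ beta_hat

# Arbitrary non-uniform sampling probabilities, purely for illustration.
pi = np.linalg.norm(W, axis=1) + 0.1
pi /= pi.sum()

# Contribution of one draw when index i is selected (probability pi_i):
#   g_i = -(1 / (n pi_i)) W_i (y_i - W_i' beta_hat) - Sigma_uu beta_hat
g = -(W * resid[:, None]) / (n * pi[:, None]) - sigma_uu @ beta_hat

# Exact conditional expectation of the subsample score: sum_i pi_i g_i.
expected_score = pi @ g
print(expected_score)
```

The result is zero up to rounding error whatever the probabilities are, since the factors $\pi_{i}$ and $1/\pi_{i}$ cancel and the remaining sum is the full-data corrected score at $\hat{\boldsymbol{\beta}}$.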

For the $j$-th element $\dot{\ell}_{j}^{*}(\hat{\boldsymbol{\beta}})$ of $\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})$, where $1\leq j\leq p$,

\begin{align*}
Var(\dot{\ell}_{j}^{*}(\hat{\boldsymbol{\beta}})|\mathcal{F}_{n})
&=E\left\{\frac{1}{n}\sum_{i=1}^{r}\frac{1}{r\pi_{i}^{*}}[-w_{ij}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}\bigg|\mathcal{F}_{n}\right\}^{2}\\
&=E\left\{\frac{1}{r}\sum_{i=1}^{r}\left\{\frac{1}{n\pi_{i}^{*}}[-w_{ij}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}\right\}\bigg|\mathcal{F}_{n}\right\}^{2}\\
&=\frac{1}{r^{2}}\sum_{i=1}^{r}E\left\{\frac{1}{n\pi_{i}^{*}}[-w_{ij}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}\Big|\mathcal{F}_{n}\right\}^{2}\\
&=\frac{1}{r}\Bigg\{E\left\{\frac{1}{n\pi_{i}^{*}}[-w_{ij}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})]\Big|\mathcal{F}_{n}\right\}^{2}+E[(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}|\mathcal{F}_{n}]^{2}\\
&\qquad-2E\left[\frac{1}{n\pi_{i}^{*}}[-w_{ij}^{*}(y_{i}^{*}-\mathbf{W}_{i}^{*T}\hat{\boldsymbol{\beta}})](\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}\Big|\mathcal{F}_{n}\right]\Bigg\}\\
&=\frac{1}{n^{2}r}\sum_{i=1}^{n}\pi_{i}\cdot\left\{\frac{1}{\pi_{i}}[-w_{ij}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]\right\}^{2}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}^{2}\\
&=\frac{1}{n^{2}r}\sum_{i=1}^{n}\frac{w_{ij}^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}}{\pi_{i}}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}^{2}\\
&\leq\frac{1}{r}\max_{i=1,\ldots,n}(n\pi_{i})^{-1}\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}}{n}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}^{2}.
\end{align*}

By Assumption 2 and Hölder's inequality, we obtain

\[
\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}}{n}\leq\left(\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{4}}{n}\right)^{1/2}\left(\sum_{i=1}^{n}\frac{(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{4}}{n}\right)^{1/2}=O_{P}(1). \tag{S2.4}
\]

According to Assumption 3, we can infer that $Var(\dot{\ell}_{j}^{*}(\hat{\boldsymbol{\beta}})|\mathcal{F}_{n})=\frac{1}{r}O_{P}(1)O_{P}(1)-O_{P}(r^{-1})=O_{P}(r^{-1})$. By Chebyshev's inequality, for a sufficiently large $M$, we have

\begin{align*}
P(\|\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})\|\geq r^{-1/2}M|\mathcal{F}_{n})
&\leq\frac{rE(\|\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})\|^{2}|\mathcal{F}_{n})}{M^{2}}\\
&=\frac{r\sum_{j=1}^{p}E(\dot{\ell}_{j}^{*}(\hat{\boldsymbol{\beta}})^{2}|\mathcal{F}_{n})}{M^{2}}\\
&=\frac{O_{P}(1)}{M^{2}}\rightarrow 0,\quad n,r\rightarrow\infty.
\end{align*}

Thus, (S2.2) holds.
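The steps used to establish (S2.2) can also be checked numerically. The sketch below computes the conditional variance of $\dot{\ell}_{j}^{*}(\hat{\boldsymbol{\beta}})$ exactly, by enumerating a single draw and dividing by $r$ (the $r$ draws are taken to be i.i.d. given $\mathcal{F}_{n}$), and compares it with the closed form, the upper bound, and the Hölder bound (S2.4). It assumes $\hat{\boldsymbol{\beta}}$ is the full-data corrected estimator; the simulated data, the choice of `pi`, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, p, j = 400, 50, 2, 0
sigma_uu = 0.25 * np.eye(p)            # assumed known measurement-error covariance

X = rng.normal(size=(n, p))
W = X + rng.normal(scale=0.5, size=(n, p))
y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Full-data corrected estimator (assumed form of beta-hat) and residuals.
beta_hat = np.linalg.solve(W.T @ W / n - sigma_uu, W.T @ y / n)
resid = y - W @ beta_hat
b_j = (sigma_uu @ beta_hat)[j]

# Arbitrary non-uniform sampling probabilities, purely for illustration.
pi = np.linalg.norm(W, axis=1) + 0.1
pi /= pi.sum()

# Exact variance of the mean of r i.i.d. draws: enumerate one draw, divide by r.
a = -W[:, j] * resid / (n * pi) - b_j
var_exact = (pi @ a**2 - (pi @ a) ** 2) / r

# Closed form from the derivation.
var_formula = np.sum(W[:, j] ** 2 * resid**2 / pi) / (n**2 * r) - b_j**2 / r

# Upper bound via max_i (n pi_i)^{-1}, and the Hoelder bound (S2.4).
norm2 = np.sum(W**2, axis=1)
m = np.mean(norm2 * resid**2)
bound = np.max(1.0 / (n * pi)) * m / r - b_j**2 / r
holder = np.sqrt(np.mean(norm2**2)) * np.sqrt(np.mean(resid**4))

print(var_exact, var_formula, bound)
print(m, holder)
```

In this setup the exact variance matches the closed form, and both inequalities hold deterministically; the probabilistic content of the argument lies only in the $O_{P}(1)$ statements supplied by Assumptions 2 and 3.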

In order to prove (S2.3), we directly calculate

\[
E(\tilde{\mathcal{H}}_{W}|\mathcal{F}_{n})=\mathcal{H}_{W}.
\]
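As with the score, this unbiasedness can be verified by enumerating a single draw, because the inverse-probability weight cancels $\pi_{i}$. The sketch below is a minimal illustration with arbitrary simulated $\mathbf{W}_{i}$ and arbitrary probabilities; the variable names and the value of `sigma_uu` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 3
sigma_uu = 0.2 * np.eye(p)             # assumed known measurement-error covariance
W = rng.normal(size=(n, p))

# Arbitrary sampling probabilities, purely for illustration.
pi = rng.uniform(0.5, 1.5, size=n)
pi /= pi.sum()

# Full-data matrix H_W = (1/n) sum_i W_i W_i' - Sigma_uu.
H_W = W.T @ W / n - sigma_uu

# Matrix contributed by one draw when index i is selected:
#   (1 / (n pi_i)) W_i W_i' - Sigma_uu
outer = np.einsum("ij,ik->ijk", W, W)
terms = outer / (n * pi)[:, None, None] - sigma_uu

# Exact conditional expectation: sum_i pi_i * terms_i.
E_H_tilde = np.einsum("i,ijk->jk", pi, terms)
print(np.max(np.abs(E_H_tilde - H_W)))
```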

For any element $\tilde{\mathcal{H}}_{W}^{j_{1}j_{2}}$, $1\leq j_{1},j_{2}\leq p$, of $\tilde{\mathcal{H}}_{W}$, using Assumptions 2 and 3 and the Cauchy-Schwarz inequality, it can be concluded that

\begin{align*}
Var(\tilde{\mathcal{H}}_{W}^{j_{1}j_{2}}|\mathcal{F}_{n})
&=E\{\tilde{\mathcal{H}}_{W}^{j_{1}j_{2}}-\mathcal{H}_{W}^{j_{1}j_{2}}|\mathcal{F}_{n}\}^{2}\\
&=E\left\{\frac{1}{nr}\sum_{i=1}^{r}\frac{1}{\pi_{i}^{*}}w_{ij_{1}}^{*}w_{ij_{2}}^{*}-\frac{1}{n}\sum_{i=1}^{n}w_{ij_{1}}w_{ij_{2}}\bigg|\mathcal{F}_{n}\right\}^{2}\\
&=E\left(\frac{1}{nr}\sum_{i=1}^{r}\frac{1}{\pi_{i}^{*}}w_{ij_{1}}^{*}w_{ij_{2}}^{*}\Big|\mathcal{F}_{n}\right)^{2}-\left(\frac{1}{n}\sum_{i=1}^{n}w_{ij_{1}}w_{ij_{2}}\right)^{2}\\
&=\frac{1}{n^{2}r}\sum_{i=1}^{n}\pi_{i}\cdot\left(\frac{w_{ij_{1}}w_{ij_{2}}}{\pi_{i}}\right)^{2}-\frac{1}{n^{2}}\sum_{i=1}^{n}(w_{ij_{1}}w_{ij_{2}})^{2}\\
&=\frac{1}{n^{2}r}\sum_{i=1}^{n}\frac{(w_{ij_{1}}w_{ij_{2}})^{2}}{\pi_{i}}-\frac{1}{n^{2}}\sum_{i=1}^{n}(w_{ij_{1}}w_{ij_{2}})^{2}\\
&\leq\frac{1}{r}\max_{i=1,\ldots,n}(n\pi_{i})^{-1}\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{4}}{n}
\end{align*}
=\displaystyle== 1rOP(1)OP(1)=OP(r1).1𝑟subscript𝑂𝑃1subscript𝑂𝑃1subscript𝑂𝑃superscript𝑟1\displaystyle\frac{1}{r}O_{P}(1)O_{P}(1)=O_{P}(r^{-1}).divide start_ARG 1 end_ARG start_ARG italic_r end_ARG italic_O start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ( 1 ) italic_O start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ( 1 ) = italic_O start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT ( italic_r start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) .

Using the Chebyshev inequality, for sufficiently large $M$, we have

\[
\begin{aligned}
P(\|\tilde{\mathcal{H}}_W-\mathcal{H}_W\|\geq r^{-1/2}M|\mathcal{F}_n)
&\leq\frac{rE(\|\tilde{\mathcal{H}}_W\|^{2}|\mathcal{F}_n)}{M^{2}}\\
&=\frac{r\sum_{j_1=1}^{p}\sum_{j_2=1}^{p}E(\tilde{\mathcal{H}}_W^{j_1j_2}|\mathcal{F}_n)^{2}}{M^{2}}\\
&=\frac{O_P(1)}{M^{2}}\rightarrow 0,\quad n,r\rightarrow\infty.
\end{aligned}
\]

Thus, the equation (S2.3) is proved.
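The moment bound behind this step can be checked numerically. The sketch below is illustrative only: the toy data, the random sampling probabilities `pi`, and the function names `leading_term` and `moment_bound` are assumptions made for this example and do not come from the paper. It evaluates the leading term $\frac{1}{n^{2}r}\sum_{i}(w_{ij_1}w_{ij_2})^{2}/\pi_i$ and the bound $\frac{1}{r}\max_i(n\pi_i)^{-1}\frac{1}{n}\sum_i\|\mathbf{W}_i\|^{4}$ for several subsample sizes $r$.

```python
import numpy as np

def leading_term(W, pi, j1, j2, r):
    """(1 / (n^2 r)) * sum_i (w_{i j1} w_{i j2})^2 / pi_i."""
    n = W.shape[0]
    prod = W[:, j1] * W[:, j2]
    return np.sum(prod ** 2 / pi) / (n ** 2 * r)

def moment_bound(W, pi, r):
    """(1 / r) * max_i (n pi_i)^{-1} * (1 / n) * sum_i ||W_i||^4."""
    n = W.shape[0]
    fourth_moment = np.mean(np.sum(W ** 2, axis=1) ** 2)
    return np.max(1.0 / (n * pi)) * fourth_moment / r

# Hypothetical toy data; none of these values come from the paper.
rng = np.random.default_rng(0)
n, p = 5000, 3
W = rng.normal(size=(n, p))
pi = rng.uniform(0.5, 1.5, size=n)
pi /= pi.sum()  # sampling probabilities must sum to one

for r in (100, 1000, 10000):
    term = leading_term(W, pi, 0, 1, r)
    # r * term does not change with r, i.e. the term is of order 1/r.
    print(r, term, moment_bound(W, pi, r), r * term)
```

For fixed data, `r * term` is the same for every `r`, which is the $O_P(r^{-1})$ rate used in the Chebyshev argument.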

By (S2.3) and Assumption 1, we can obtain $\tilde{\mathcal{H}}_W^{-1}=O_{P|\mathcal{F}_n}(1)$. Therefore, combining (S2.1), (S2.2) and (S2.3), we have

\[
\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}=O_{P|\mathcal{F}_n}(r^{-1/2}).
\]

Then the theorem is proved.

S3 Proof of Theorem 2

Note that

\[
\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})=\frac{1}{r}\sum_{i=1}^{r}\left\{\frac{1}{n\pi_i^{*}}[-\mathbf{W}_i^{*}(y_i^{*}-\mathbf{W}_i^{*T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\}=\frac{1}{r}\sum_{i=1}^{r}\boldsymbol{\xi}_i,
\tag{S3.1}
\]

where the $\boldsymbol{\xi}_i=\frac{1}{n\pi_i^{*}}[-\mathbf{W}_i^{*}(y_i^{*}-\mathbf{W}_i^{*T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}$, $i=1,\ldots,r$, are independent random vectors. Direct calculation gives

\[
\begin{aligned}
E(\boldsymbol{\xi}_i|\mathcal{F}_n)&=\frac{1}{n}\sum_{i=1}^{n}[-\mathbf{W}_i(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}=\mathbf{0},\\
Var(\boldsymbol{\xi}_i|\mathcal{F}_n)&=\frac{1}{n^{2}}\sum_{i=1}^{n}\frac{\mathbf{W}_i\mathbf{W}_i^{T}(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}}{\pi_i}-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})^{\otimes 2}=rV_c.
\end{aligned}
\tag{S3.2}
\]
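The zero conditional mean can be verified numerically on simulated data. This is a minimal sketch under stated assumptions: the closed form $\hat{\boldsymbol{\beta}}=(\mathbf{W}^{T}\mathbf{W}/n-\boldsymbol{\Sigma}_{uu})^{-1}\mathbf{W}^{T}\mathbf{y}/n$ is obtained here by setting the full-data corrected score to zero (it is not stated in this section), and the data-generating values and the function names are hypothetical.

```python
import numpy as np

def corrected_full_estimator(W, y, Sigma_uu):
    """Root of the corrected full-data score: (W'W/n - Sigma_uu) beta = W'y/n."""
    n = W.shape[0]
    return np.linalg.solve(W.T @ W / n - Sigma_uu, W.T @ y / n)

def conditional_mean_xi(W, y, Sigma_uu, beta_hat, pi):
    """E(xi | F_n) = sum_i pi_i * xi(i), where xi(i) is xi when unit i is drawn."""
    n = W.shape[0]
    resid = y - W @ beta_hat
    # Row i holds xi evaluated at data point i with sampling probability pi_i.
    xi = -(W * resid[:, None]) / (n * pi[:, None]) - Sigma_uu @ beta_hat
    return pi @ xi

# Hypothetical toy setup: W = X + U with known measurement error covariance.
rng = np.random.default_rng(0)
n, p = 2000, 3
beta = np.array([1.0, -0.5, 2.0])
Sigma_uu = 0.1 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)
y = X @ beta + rng.normal(size=n)

beta_hat = corrected_full_estimator(W, y, Sigma_uu)
pi = rng.uniform(0.5, 1.5, size=n)
pi /= pi.sum()  # any valid sampling distribution works here
mean_xi = conditional_mean_xi(W, y, Sigma_uu, beta_hat, pi)
print(np.abs(mean_xi).max())  # numerically zero, whatever pi is
```

The result does not depend on the choice of `pi`, because the factor $1/\pi_i$ in the weight cancels the sampling probability inside the expectation.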

According to the $C_r$ inequality, Assumptions 3 and 4, we have

\[
\begin{aligned}
&\sum_{i=1}^{r}E\{\|r^{-1/2}\boldsymbol{\xi}_i\|^{2}I(\|r^{-1/2}\boldsymbol{\xi}_i\|>\varepsilon)|\mathcal{F}_n\}\\
\leq{}&\frac{1}{\varepsilon^{\delta}r^{1+\delta/2}}\sum_{i=1}^{r}E\{\|\boldsymbol{\xi}_i\|^{2+\delta}|\mathcal{F}_n\}\\
={}&\frac{1}{\varepsilon^{\delta}r^{1+\delta/2}}\sum_{i=1}^{r}E\left\{\left\|\frac{1}{n\pi_i^{*}}[-\mathbf{W}_i^{*}(y_i^{*}-\mathbf{W}_i^{*T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\|^{2+\delta}\bigg|\mathcal{F}_n\right\}\\
\leq{}&\frac{2^{1+\delta}}{\varepsilon^{\delta}r^{1+\delta/2}}\sum_{i=1}^{r}\left\{E\left[\left\|\frac{1}{n\pi_i^{*}}\mathbf{W}_i^{*}(y_i^{*}-\mathbf{W}_i^{*T}\hat{\boldsymbol{\beta}})\right\|^{2+\delta}|\mathcal{F}_n\right]+E[\|\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\|^{2+\delta}|\mathcal{F}_n]\right\}\\
={}&\frac{2^{1+\delta}}{\varepsilon^{\delta}r^{\delta/2}}\left\{\sum_{i=1}^{n}\frac{\|\mathbf{W}_i\|^{2+\delta}(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2+\delta}}{n^{2+\delta}\pi_i^{1+\delta}}+\|\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\|^{2+\delta}\right\}\\
\leq{}&\frac{2^{1+\delta}}{\varepsilon^{\delta}r^{\delta/2}}\left\{\max_{i=1,\ldots,n}(n\pi_i)^{-1-\delta}\sum_{i=1}^{n}\frac{(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2+\delta}\|\mathbf{W}_i\|^{2+\delta}}{n}+\|\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\|^{2+\delta}\right\}\\
={}&O_P(r^{-\delta/2}).
\end{aligned}
\]
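For reference, the two inequality steps in this chain rest on standard facts, written out below. The symbols $x$, $a$, $b$ and $q$ are notation introduced only for this note; they are not the paper's.

```latex
% Truncation step: on the event \|x\| > \varepsilon we have (\|x\|/\varepsilon)^{\delta} > 1, hence
\[
\|x\|^{2}\, I(\|x\| > \varepsilon) \;\le\; \varepsilon^{-\delta}\,\|x\|^{2+\delta},
\qquad x = r^{-1/2}\boldsymbol{\xi}_i ,
\]
% and \|r^{-1/2}\boldsymbol{\xi}_i\|^{2+\delta} = r^{-(1+\delta/2)}\|\boldsymbol{\xi}_i\|^{2+\delta},
% which produces the factor 1/(\varepsilon^{\delta} r^{1+\delta/2}).

% C_r inequality with q = 2 + \delta \ge 1:
\[
\|a - b\|^{q} \;\le\; 2^{\,q-1}\left(\|a\|^{q} + \|b\|^{q}\right),
\qquad
a = \frac{1}{n\pi_i^{*}}\left[-\mathbf{W}_i^{*}\,(y_i^{*} - \mathbf{W}_i^{*T}\hat{\boldsymbol{\beta}})\right],
\quad
b = \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}},
\]
% so that 2^{q-1} = 2^{1+\delta}, the constant appearing in the chain.
```

The $r$ identical conditional expectations then sum to $r$ times a single one, which turns $r^{-(1+\delta/2)}$ into $r^{-\delta/2}$.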

In light of the Lindeberg-Feller central limit theorem, it follows that

\[
\left(\sum_{i=1}^{r}Var(\boldsymbol{\xi}_i|\mathcal{F}_n)\right)^{-1/2}\sum_{i=1}^{r}\boldsymbol{\xi}_i=V_c^{-1/2}\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})\xrightarrow{d}N_p(\mathbf{0},I).
\tag{S3.3}
\]

By (S2.3), we can obtain

\[
\tilde{\mathcal{H}}_W^{-1}-\mathcal{H}_W^{-1}=-\mathcal{H}_W^{-1}(\tilde{\mathcal{H}}_W-\mathcal{H}_W)\tilde{\mathcal{H}}_W^{-1}=O_{P|\mathcal{F}_n}(r^{-1/2}).
\tag{S3.4}
\]
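The identity in (S3.4) is the usual difference-of-inverses formula. A short derivation with the order of each factor marked is sketched below; the orders are those established elsewhere in the proofs, and the underbrace annotations are added here for illustration.

```latex
\[
\tilde{\mathcal{H}}_W^{-1} - \mathcal{H}_W^{-1}
  = \mathcal{H}_W^{-1}\left(\mathcal{H}_W - \tilde{\mathcal{H}}_W\right)\tilde{\mathcal{H}}_W^{-1}
  = -\underbrace{\mathcal{H}_W^{-1}}_{O_P(1)}\,
     \underbrace{\left(\tilde{\mathcal{H}}_W - \mathcal{H}_W\right)}_{O_{P|\mathcal{F}_n}(r^{-1/2})}\,
     \underbrace{\tilde{\mathcal{H}}_W^{-1}}_{O_{P|\mathcal{F}_n}(1)} .
\]
% First equality: expand the middle product,
% H^{-1} H \tilde{H}^{-1} - H^{-1} \tilde{H} \tilde{H}^{-1} = \tilde{H}^{-1} - H^{-1}.
```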

By Assumption 1, $\mathcal{H}_W$ converges to a positive definite matrix, so $\mathcal{H}_W^{-1}=O_P(1)$. Together with (S3.2), this gives

\[
V=\mathcal{H}_W^{-1}V_c\mathcal{H}_W^{-1}=\frac{1}{r}\mathcal{H}_W^{-1}(rV_c)\mathcal{H}_W^{-1}=O_P(r^{-1}).
\tag{S3.5}
\]

Therefore, combining (S2.1), (S3.4) and (S3.5), we obtain

\[
\begin{aligned}
V^{-1/2}(\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}})
&=-V^{-1/2}\tilde{\mathcal{H}}_W^{-1}\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})\\
&=-V^{-1/2}\mathcal{H}_W^{-1}\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})-V^{-1/2}(\tilde{\mathcal{H}}_W^{-1}-\mathcal{H}_W^{-1})\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})\\
&=-V^{-1/2}\mathcal{H}_W^{-1}V_c^{1/2}V_c^{-1/2}\dot{\ell}^{*}(\hat{\boldsymbol{\beta}})+O_{P|\mathcal{F}_n}(r^{-1/2}).
\end{aligned}
\]

Note that

\[
V^{-1/2}\mathcal{H}_W^{-1}V_c^{1/2}(V^{-1/2}\mathcal{H}_W^{-1}V_c^{1/2})^{T}=V^{-1/2}\mathcal{H}_W^{-1}V_c^{1/2}V_c^{1/2}\mathcal{H}_W^{-1}V^{-1/2}=I.
\]
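This identity is pure matrix algebra once $V=\mathcal{H}_W^{-1}V_c\mathcal{H}_W^{-1}$ and all matrices are symmetric. A small numerical sanity check is sketched here; the matrices `H` and `Vc` are arbitrary symmetric positive definite stand-ins rather than quantities computed from any data set, and `sym_sqrt` is a helper defined only for this example.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(1)
p = 4
B = rng.normal(size=(p, p))
H = B @ B.T + p * np.eye(p)   # stand-in for H_W (symmetric positive definite)
C = rng.normal(size=(p, p))
Vc = C @ C.T + np.eye(p)      # stand-in for V_c (symmetric positive definite)

H_inv = np.linalg.inv(H)
V = H_inv @ Vc @ H_inv
V = (V + V.T) / 2             # remove floating-point asymmetry before eigh

# A = V^{-1/2} H^{-1} V_c^{1/2}; the claim is A A^T = I.
A = np.linalg.inv(sym_sqrt(V)) @ H_inv @ sym_sqrt(Vc)
print(np.round(A @ A.T, 8))
```

Because `A @ A.T` is the identity, the linear map `A` sends a standard normal vector to a standard normal vector, which is what the Slutsky step needs.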

According to the Slutsky theorem, we have

\[
V^{-1/2}(\tilde{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}})\xrightarrow{d}N_p(\mathbf{0},I),\quad r,n\rightarrow\infty.
\tag{S3.6}
\]

Then the theorem is proved.
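A rough Monte Carlo illustration of this result follows. It is a sketch under several assumptions that are not taken from the paper: uniform sampling probabilities, sampling with replacement, a toy Gaussian design, and closed forms for $\hat{\boldsymbol{\beta}}$ and $\tilde{\boldsymbol{\beta}}$ obtained by setting the corrected full-data score and the weighted subsample score to zero. The function names are hypothetical. The code compares the empirical covariance of $\tilde{\boldsymbol{\beta}}$ across repeated subsamples with $V=\mathcal{H}_W^{-1}V_c\mathcal{H}_W^{-1}$.

```python
import numpy as np

def full_estimator(W, y, Sigma_uu):
    """Corrected full-data estimator: (W'W/n - Sigma_uu)^{-1} W'y/n."""
    n = W.shape[0]
    return np.linalg.solve(W.T @ W / n - Sigma_uu, W.T @ y / n)

def subsample_estimator(W, y, Sigma_uu, pi, r, rng):
    """Root of the inverse-probability-weighted corrected score on r draws."""
    n = W.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=pi)
    wt = 1.0 / (n * r * pi[idx])            # inverse probability weights
    Ws = W[idx] * wt[:, None]
    H_tilde = Ws.T @ W[idx] - Sigma_uu
    return np.linalg.solve(H_tilde, Ws.T @ y[idx])

def asymptotic_V(W, y, Sigma_uu, beta_hat, pi, r):
    """V = H_W^{-1} V_c H_W^{-1}, with r * V_c taken from (S3.2)."""
    n = W.shape[0]
    e = y - W @ beta_hat
    c = Sigma_uu @ beta_hat
    rVc = (W * (e ** 2 / pi)[:, None]).T @ W / n ** 2 - np.outer(c, c)
    H_inv = np.linalg.inv(W.T @ W / n - Sigma_uu)
    return H_inv @ (rVc / r) @ H_inv

# Hypothetical toy setup; none of these values come from the paper.
rng = np.random.default_rng(0)
n, p, r, reps = 5000, 2, 500, 300
beta = np.array([1.0, 1.0])
Sigma_uu = 0.1 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + rng.normal(scale=np.sqrt(0.1), size=(n, p))
y = X @ beta + rng.normal(size=n)
pi = np.full(n, 1.0 / n)                    # uniform subsampling

beta_hat = full_estimator(W, y, Sigma_uu)
V = asymptotic_V(W, y, Sigma_uu, beta_hat, pi, r)
draws = np.array([subsample_estimator(W, y, Sigma_uu, pi, r, rng)
                  for _ in range(reps)])
emp_cov = np.cov(draws, rowvar=False)
print(np.trace(emp_cov) / np.trace(V))      # expected to be of order one
```

With a finite number of replications the trace ratio is only approximately one; the point of the sketch is that the subsample estimates centre on $\hat{\boldsymbol{\beta}}$ with spread of order $r^{-1/2}$.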

S4 Proof of Theorem 3

First of all, we prove (i). Let $V_c^{*}=\frac{1}{rn^{2}}\sum_{i=1}^{n}\frac{(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}\mathbf{W}_i\mathbf{W}_i^{T}}{\pi_i}$. Note that $V_c=V_c^{*}-\frac{1}{r}(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})^{\otimes 2}$, where the second term does not depend on $\pi_i$.
Therefore, to minimize $tr(V)$, we need to minimize $tr(\mathcal{H}_W^{-1}V_c^{*}\mathcal{H}_W^{-1})$. Then, we have

\[
\begin{aligned}
tr(\mathcal{H}_W^{-1}V_c^{*}\mathcal{H}_W^{-1})
&=\frac{1}{rn^{2}}\sum_{i=1}^{n}tr\left[\frac{(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}\mathcal{H}_W^{-1}\mathbf{W}_i\mathbf{W}_i^{T}\mathcal{H}_W^{-1}}{\pi_i}\right]\\
&=\frac{1}{rn^{2}}\sum_{i=1}^{n}\frac{(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}\|\mathcal{H}_W^{-1}\mathbf{W}_i\|^{2}}{\pi_i}\\
&=\frac{1}{rn^{2}}\sum_{i=1}^{n}\pi_i\sum_{i=1}^{n}\frac{(y_i-\mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}\|\mathcal{H}_W^{-1}\mathbf{W}_i\|^{2}}{\pi_i}
\end{aligned}
\]
1rn2[i=1n|yi𝐖iT𝜷^|W1𝐖i]2.absent1𝑟superscript𝑛2superscriptdelimited-[]superscriptsubscript𝑖1𝑛subscript𝑦𝑖superscriptsubscript𝐖𝑖𝑇^𝜷normsuperscriptsubscript𝑊1subscript𝐖𝑖2\displaystyle\geq\frac{1}{rn^{2}}\left[\sum_{i=1}^{n}|y_{i}-\mathbf{W}_{i}^{T}% \hat{\boldsymbol{\beta}}|\cdot\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|\right]^{2}.≥ divide start_ARG 1 end_ARG start_ARG italic_r italic_n start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG [ ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT | italic_y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over^ start_ARG bold_italic_β end_ARG | ⋅ ∥ caligraphic_H start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_W start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∥ ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT .

The third equality uses $\sum_{i=1}^{n}\pi_{i}=1$, and the last inequality follows from the Cauchy-Schwarz inequality. Equality holds if and only if $\pi_{i}\propto|y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}|\cdot\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|$. Part (ii) can be proved in the same way, so we omit the details here.
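The optimality claim can be checked numerically. The following is a minimal sketch, not the authors' implementation: it treats the scores $c_{i}=|y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}|\cdot\|\mathcal{H}_{W}^{-1}\mathbf{W}_{i}\|$ as given positive numbers (the values below are made up for illustration), drops the constant factor $1/(rn^{2})$, and compares $\sum_{i}c_{i}^{2}/\pi_{i}$ under the proportional probabilities with its value under uniform probabilities. The helper names `trace_objective` and `optimal_probabilities` are hypothetical.

```python
import numpy as np


def trace_objective(c, pi):
    """Return sum_i c_i^2 / pi_i, the pi-dependent part of the trace criterion."""
    c = np.asarray(c, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(c ** 2 / pi))


def optimal_probabilities(c):
    """Probabilities proportional to c_i, the equality case of Cauchy-Schwarz."""
    c = np.asarray(c, dtype=float)
    return c / c.sum()


# Hypothetical scores c_i = |residual_i| * ||H_W^{-1} W_i||, chosen for illustration.
c = np.array([0.5, 1.0, 2.0, 4.0])
pi_opt = optimal_probabilities(c)
pi_unif = np.full(c.size, 1.0 / c.size)

lower_bound = c.sum() ** 2                # (sum_i c_i)^2
obj_opt = trace_objective(c, pi_opt)      # attains the lower bound
obj_unif = trace_objective(c, pi_unif)    # larger unless all c_i are equal
```

In this toy case the proportional probabilities attain the Cauchy-Schwarz lower bound $(\sum_{i}c_{i})^{2}$, while uniform sampling gives a strictly larger value.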

S5 Proof of Theorems 4 and 5

Since the proofs of these two theorems are similar to those of Theorems 1 and 2, and Wang, Zhu and Ma (2018) have provided the detailed arguments, we omit the proofs of these two theorems here.

S6 Proof of Theorem 6

First, we consider the case where $m=1$. Note that

\begin{align*}
\check{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}} &= \left(\frac{1}{n}\sum_{i=1}^{n}\psi_{i}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}\right)^{-1}\cdot\left[\frac{1}{n}\sum_{i=1}^{n}\psi_{i}\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})+\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right] \tag{S6.1} \\
&= -(\check{\mathcal{H}}_{W})^{-1}\dot{L}^{*}(\hat{\boldsymbol{\beta}}),
\end{align*}

where $\check{\mathcal{H}}_{W}=\frac{1}{n}\sum_{i=1}^{n}\psi_{i}(\mathbf{W}_{i}\mathbf{W}_{i}^{T})-\boldsymbol{\Sigma}_{uu}$ and $\dot{L}^{*}(\boldsymbol{\beta})=\frac{1}{n}\sum_{i=1}^{n}\psi_{i}[-\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\boldsymbol{\beta})]-\boldsymbol{\Sigma}_{uu}\boldsymbol{\beta}$. Therefore, it is only necessary to demonstrate that

$$\dot{L}^{*}(\hat{\boldsymbol{\beta}})=O_{P|\mathcal{F}_{n}}(r^{-1/2}), \tag{S6.2}$$

and

$$\check{\mathcal{H}}_{W}-\mathcal{H}_{W}=O_{P|\mathcal{F}_{n}}(r^{-1/2}), \tag{S6.3}$$

where $\mathcal{H}_{W}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{W}_{i}\mathbf{W}_{i}^{T}-\boldsymbol{\Sigma}_{uu}$.
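The identity (S6.1) is purely algebraic, so it can be verified on synthetic data. The sketch below is an illustration under stated assumptions rather than the authors' code: the data-generating model, the known error covariance `Sigma_uu`, and the weight law ($\mu_{i}\sim$ Bernoulli$(q)$, $\nu_{i}$ exponential with mean $1/q$) are all assumed, and the function names `corrected_estimator` and `perturbed_score` are hypothetical. It takes $\hat{\boldsymbol{\beta}}$ and $\check{\boldsymbol{\beta}}$ to be the solutions of the unweighted and $\psi$-weighted corrected normal equations, which is the form implied by (S6.1).

```python
import numpy as np


def corrected_estimator(W, y, Sigma_uu, weights=None):
    """Solve (1/n sum_i psi_i W_i W_i^T - Sigma_uu) beta = 1/n sum_i psi_i W_i y_i.

    With weights=None every psi_i equals 1, giving the full-data corrected
    estimator beta_hat; otherwise the result is the perturbed beta_check.
    Returns the estimate together with the (weighted) matrix H.
    """
    n = W.shape[0]
    psi = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    H = (W * psi[:, None]).T @ W / n - Sigma_uu
    g = (W * psi[:, None]).T @ y / n
    return np.linalg.solve(H, g), H


def perturbed_score(W, y, Sigma_uu, psi, beta):
    """L_dot^*(beta) = 1/n sum_i psi_i [-W_i (y_i - W_i^T beta)] - Sigma_uu beta."""
    n = W.shape[0]
    resid = y - W @ beta
    return -(W * (psi * resid)[:, None]).sum(axis=0) / n - Sigma_uu @ beta


rng = np.random.default_rng(0)
n, p, q = 2000, 3, 0.1                      # q plays the role of q_n (assumed value)
Sigma_uu = 0.04 * np.eye(p)                 # assumed known measurement-error covariance
X = rng.normal(size=(n, p))                 # unobserved true covariates
W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

# Assumed weight law: mu_i ~ Bernoulli(q), nu_i ~ Exponential with mean 1/q.
psi = rng.binomial(1, q, size=n) * rng.exponential(scale=1.0 / q, size=n)

beta_hat, H_W = corrected_estimator(W, y, Sigma_uu)
beta_check, H_check = corrected_estimator(W, y, Sigma_uu, weights=psi)

lhs = beta_check - beta_hat
rhs = -np.linalg.solve(H_check, perturbed_score(W, y, Sigma_uu, psi, beta_hat))
full_score = perturbed_score(W, y, Sigma_uu, np.ones(n), beta_hat)
```

Here `lhs` and `rhs` are the two sides of (S6.1), and `full_score` is the unweighted corrected score evaluated at $\hat{\boldsymbol{\beta}}$, which vanishes by the definition of $\hat{\boldsymbol{\beta}}$.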

Note that

\begin{align*}
E(\psi_{i}) &= E(\mu_{i}\nu_{i})=q_{n}\cdot\frac{1}{q_{n}}=1, \\
E(\psi_{i}^{2}) &= (q_{n}(1-q_{n})+q_{n}^{2})\cdot\left(b_{n}^{2}+\frac{1}{q_{n}^{2}}\right)=q_{n}b_{n}^{2}+\frac{1}{q_{n}},
\end{align*}

and

$$Var(\psi_{i})=E(\psi_{i}^{2})-(E(\psi_{i}))^{2}=q_{n}b_{n}^{2}+\frac{1}{q_{n}}-1=\frac{na_{n}}{r},$$

where $a_{n}=b_{n}^{2}q_{n}^{2}-q_{n}+1$. According to Assumption 5, as $q_{n}\rightarrow 0$, $\limsup\limits_{n\rightarrow\infty}a_{n}=\limsup\limits_{n\rightarrow\infty}q_{n}Var(\psi_{i})=\limsup\limits_{n\rightarrow\infty}q_{n}(E(\psi_{i}^{2})-1)<\infty$.
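The moment calculations can be reproduced in exact rational arithmetic. This is a small sketch under assumptions read off from the formulas above rather than stated in this section: $\mu_{i}\sim$ Bernoulli$(q_{n})$ is independent of $\nu_{i}$, $\nu_{i}$ has mean $1/q_{n}$ and variance $b_{n}^{2}$, and $q_{n}=r/n$ (inferred from the final equality $Var(\psi_{i})=na_{n}/r$). The function name `psi_moments` and the numerical values are hypothetical.

```python
from fractions import Fraction


def psi_moments(q, b):
    """Exact mean, second moment and variance of psi = mu * nu.

    Assumes mu ~ Bernoulli(q) independent of nu, with E(nu) = 1/q and
    Var(nu) = b^2, as the moment formulas in the text suggest.
    """
    q, b = Fraction(q), Fraction(b)
    e_mu, e_mu2 = q, q * (1 - q) + q ** 2        # E(mu), E(mu^2) = Var(mu) + E(mu)^2
    e_nu, e_nu2 = 1 / q, b ** 2 + 1 / q ** 2     # E(nu), E(nu^2) = Var(nu) + E(nu)^2
    mean = e_mu * e_nu
    second = e_mu2 * e_nu2
    return mean, second, second - mean ** 2


n, r = 1000, 50
q = Fraction(r, n)          # q_n = r / n (inferred, see lead-in)
b = Fraction(3, 2)          # arbitrary illustrative value of b_n
mean, second, var = psi_moments(q, b)
a_n = b ** 2 * q ** 2 - q + 1
```

With these inputs `var` equals `n * a_n / r` exactly, which is the algebraic step $q_{n}b_{n}^{2}+1/q_{n}-1=a_{n}/q_{n}$ used above.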

To prove (S6.2), we calculate directly to obtain

\begin{align*}
E(\dot{L}^{*}(\hat{\boldsymbol{\beta}})|\mathcal{F}_{n}) &= E\left\{\frac{1}{n}\sum_{i=1}^{n}\psi_{i}[-\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\bigg|\mathcal{F}_{n}\right\} \\
&= \frac{1}{n}\sum_{i=1}^{n}[-\mathbf{W}_{i}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}=\mathbf{0}.
\end{align*}

Denote the $j$-th element of $\dot{L}^{*}(\hat{\boldsymbol{\beta}})$ by $\dot{L}_{j}^{*}(\hat{\boldsymbol{\beta}})=\frac{1}{n}\sum_{i=1}^{n}\psi_{i}[-w_{ij}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}$. By (S2.4) and Assumption 6, we have

\begin{align*}
Var(\dot{L}_{j}^{*}(\hat{\boldsymbol{\beta}})|\mathcal{F}_{n}) &= Var\left\{\frac{1}{n}\sum_{i=1}^{n}\psi_{i}[-w_{ij}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})]-(\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}})_{j}\bigg|\mathcal{F}_{n}\right\} \\
&= \frac{1}{n^{2}}\frac{na_{n}}{r}\sum_{i=1}^{n}w_{ij}^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2} \\
&\leq \frac{a_{n}}{r}\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}}{n} \\
&= \frac{1}{r}O_{P}(1)=O_{P}(r^{-1}).
\end{align*}

From the Chebyshev inequality, for sufficiently large $M$, we have

\begin{align*}
P(\|\dot{L}^{*}(\hat{\boldsymbol{\beta}})\|\geq r^{-1/2}M|\mathcal{F}_{n}) &\leq \frac{rE(\|\dot{L}^{*}(\hat{\boldsymbol{\beta}})\|^{2}|\mathcal{F}_{n})}{M^{2}} \\
&= \frac{r\sum_{j=1}^{p}E\{[\dot{L}_{j}^{*}(\hat{\boldsymbol{\beta}})]^{2}|\mathcal{F}_{n}\}}{M^{2}} \\
&= \frac{O_{P}(1)}{M^{2}}\rightarrow 0,\quad n,r\rightarrow\infty.
\end{align*}

Thus, the equation (S6.2) is proved.
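To see why the Chebyshev bound does not depend on $r$, note that the conditional second moment of the score has a closed form when the $\psi_{i}$ are independent with variance $na_{n}/r$. The sketch below evaluates that closed form on synthetic inputs; it is an illustration only, the residuals are simulated stand-ins for $y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}}$, the value of `a_n` is arbitrary, and the names `score_second_moment` and `chebyshev_bound` are hypothetical.

```python
import numpy as np


def score_second_moment(W, resid, var_psi):
    """E(||L_dot^*(beta_hat)||^2 | F_n) for independent psi_i with variance var_psi.

    Uses Var(L_dot_j^* | F_n) = var_psi / n^2 * sum_i w_ij^2 e_i^2, summed over j,
    together with E(L_dot^* | F_n) = 0.
    """
    n = W.shape[0]
    return var_psi / n ** 2 * float(np.sum((W ** 2).sum(axis=1) * resid ** 2))


def chebyshev_bound(W, resid, a_n, r, M):
    """Bound r * E(||L_dot^*||^2 | F_n) / M^2 on P(||L_dot^*|| >= M / sqrt(r) | F_n)."""
    n = W.shape[0]
    var_psi = n * a_n / r                    # Var(psi_i) = n a_n / r
    return r * score_second_moment(W, resid, var_psi) / M ** 2


rng = np.random.default_rng(1)
W = rng.normal(size=(500, 3))                # synthetic surrogate covariates
resid = rng.normal(size=500)                 # stand-in for y_i - W_i^T beta_hat
a_n = 1.5                                    # illustrative bounded constant

bound_r50 = chebyshev_bound(W, resid, a_n, r=50, M=10.0)
bound_r200 = chebyshev_bound(W, resid, a_n, r=200, M=10.0)
bound_M20 = chebyshev_bound(W, resid, a_n, r=50, M=20.0)
```

The bound equals $a_{n}\cdot\frac{1}{n}\sum_{i}\|\mathbf{W}_{i}\|^{2}(y_{i}-\mathbf{W}_{i}^{T}\hat{\boldsymbol{\beta}})^{2}/M^{2}$, so it is the same for both values of $r$ and shrinks by a factor of four when $M$ doubles.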

To prove (S6.3), we calculate directly to obtain

$$E(\check{\mathcal{H}}_{W}|\mathcal{F}_{n})=\mathcal{H}_{W}.$$

For any element $\check{\mathcal{H}}_{W}^{j_{1}j_{2}}$, $1\leq j_{1},j_{2}\leq p$, of $\check{\mathcal{H}}_{W}$, by Assumptions 2 and 5, we have

\begin{align*}
Var(\check{\mathcal{H}}_{W}^{j_{1}j_{2}}|\mathcal{F}_{n}) &= Var\left[\frac{1}{n}\sum_{i=1}^{n}\psi_{i}(W_{ij_{1}}W_{ij_{2}})-(\boldsymbol{\Sigma}_{uu})_{j_{1}j_{2}}\Big|\mathcal{F}_{n}\right] \\
&= \frac{1}{n^{2}}\frac{na_{n}}{r}\sum_{i=1}^{n}(W_{ij_{1}}W_{ij_{2}})^{2} \\
&\leq \frac{a_{n}}{r}\sum_{i=1}^{n}\frac{\|\mathbf{W}_{i}\|^{4}}{n}=\frac{1}{r}O_{P}(1) \\
&= O_{P}(r^{-1}).
\end{align*}

From the Chebyshev inequality, for sufficiently large $M$, we have

\begin{align*}
P(\|\check{\mathcal{H}}_{W}-\mathcal{H}_{W}\|\geq r^{-1/2}M|\mathcal{F}_{n}) &\leq \frac{rE(\|\check{\mathcal{H}}_{W}-\mathcal{H}_{W}\|^{2}|\mathcal{F}_{n})}{M^{2}} \\
&= \frac{r\sum_{j_{1}=1}^{p}\sum_{j_{2}=1}^{p}E\{(\check{\mathcal{H}}_{W}^{j_{1}j_{2}}-\mathcal{H}_{W}^{j_{1}j_{2}})^{2}|\mathcal{F}_{n}\}}{M^{2}} \\
&= \frac{O_{P}(1)}{M^{2}}\rightarrow 0,\quad n,r\rightarrow\infty.
\end{align*}

Thus, the equation (S6.3) is proved.

By (S6.3) and Assumption 1, we have ˇW1=OP|n(1)superscriptsubscriptˇ𝑊1subscript𝑂conditional𝑃subscript𝑛1\check{\mathcal{H}}_{W}^{-1}=O_{P|\mathcal{F}_{n}}(1)overroman_ˇ start_ARG caligraphic_H end_ARG start_POSTSUBSCRIPT italic_W end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT = italic_O start_POSTSUBSCRIPT italic_P | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( 1 ). Therefore, combining (S6.1), (S6.2) and (S6.3), then

𝜷ˇ𝜷^=OP|n(r1/2).ˇ𝜷^𝜷subscript𝑂conditional𝑃subscript𝑛superscript𝑟12\check{\boldsymbol{\beta}}-\hat{\boldsymbol{\beta}}=O_{P|\mathcal{F}_{n}}(r^{-% 1/2}).overroman_ˇ start_ARG bold_italic_β end_ARG - over^ start_ARG bold_italic_β end_ARG = italic_O start_POSTSUBSCRIPT italic_P | caligraphic_F start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_r start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT ) . (S6.4)

As m>1𝑚1m>1italic_m > 1 , we have 𝜷ˇ(m)=1mk=1m𝜷ˇk.superscriptˇ𝜷𝑚1𝑚superscriptsubscript𝑘1𝑚subscriptˇ𝜷𝑘\check{\boldsymbol{\beta}}^{(m)}=\frac{1}{m}\sum_{k=1}^{m}\check{\boldsymbol{% \beta}}_{k}.overroman_ˇ start_ARG bold_italic_β end_ARG start_POSTSUPERSCRIPT ( italic_m ) end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT overroman_ˇ start_ARG bold_italic_β end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT . Then according to the weak law of large numbers, it follows that

$$
\check{\boldsymbol{\beta}}^{(m)} - \hat{\boldsymbol{\beta}} = \frac{1}{m}\sum_{k=1}^{m}\check{\boldsymbol{\beta}}_k - \hat{\boldsymbol{\beta}} = \frac{1}{m}\sum_{k=1}^{m}(\check{\boldsymbol{\beta}}_k - \hat{\boldsymbol{\beta}}) = O_{P|\mathcal{F}_n}((mr)^{-1/2}).
$$

Then the theorem is proved.
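The rate $(mr)^{-1/2}$ can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a scalar toy model ($p = 1$), and the perturbation weights $\psi_i = (n/r)\cdot\mathrm{Bernoulli}(r/n)\cdot\mathrm{Exp}(1)$ are a hypothetical choice satisfying $E(\psi_i) = 1$. The function names and all parameter values are illustrative. The subsample estimator solves the scalar version of the weighted corrected score, and the code compares the root mean squared distance to $\hat{\beta}$ for $m = 1$ and $m = 10$.

```python
import math
import random


def simulate_data(n, beta, sigma_uu, sigma_eps, rng):
    """Toy scalar model y = beta*x + eps, with observed covariate W = x + u."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w = x + rng.gauss(0.0, math.sqrt(sigma_uu))
        y = beta * x + rng.gauss(0.0, sigma_eps)
        data.append((w, y))
    return data


def full_estimator(data, sigma_uu):
    """Corrected full-data estimator: sum(W*y) / (sum(W^2) - n*sigma_uu)."""
    n = len(data)
    num = sum(w * y for w, y in data)
    den = sum(w * w for w, _ in data) - n * sigma_uu
    return num / den


def perturbed_estimator(data, sigma_uu, r, rng):
    """One subsample estimator with hypothetical weights
    psi_i = (n/r) * Bernoulli(r/n) * Exp(1), so that E(psi_i) = 1."""
    n = len(data)
    num = 0.0
    den = -n * sigma_uu
    for w, y in data:
        if rng.random() < r / n:  # observation i is selected
            psi = (n / r) * rng.expovariate(1.0)
            num += psi * w * y
            den += psi * w * w
    return num / den


def rmse_of_average(data, sigma_uu, r, m, reps, rng):
    """RMSE (around the full-data estimator) of the average of m estimators."""
    beta_hat = full_estimator(data, sigma_uu)
    sq = 0.0
    for _ in range(reps):
        avg = sum(perturbed_estimator(data, sigma_uu, r, rng) for _ in range(m)) / m
        sq += (avg - beta_hat) ** 2
    return math.sqrt(sq / reps)


rng = random.Random(0)
data = simulate_data(n=1000, beta=1.0, sigma_uu=0.25, sigma_eps=0.5, rng=rng)
rmse_single = rmse_of_average(data, 0.25, r=250, m=1, reps=100, rng=rng)
rmse_averaged = rmse_of_average(data, 0.25, r=250, m=10, reps=100, rng=rng)
print(rmse_single, rmse_averaged, rmse_single / rmse_averaged)
```

If the rate holds in this toy setting, the printed ratio should be roughly of the order $\sqrt{10}$, up to Monte Carlo noise and a small ratio-estimator bias that averaging does not remove.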

S7 Proof of Theorem 7

Firstly, we prove the case where $m = 1$. Because

$$
\dot{L}^{*}(\hat{\boldsymbol{\beta}}) = \frac{1}{n}\sum_{i=1}^{n}\left\{\psi_i\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})\right] - \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\} = \frac{1}{\sqrt{r}}\sum_{i=1}^{n}\boldsymbol{\eta}_i, \tag{S7.1}
$$

where $\boldsymbol{\eta}_i = \frac{\sqrt{r}}{n}\left\{\psi_i\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})\right] - \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\}$ is an independent random vector. Note that

$$
\begin{aligned}
E(\boldsymbol{\eta}_i \mid \mathcal{F}_n) &= \frac{\sqrt{r}}{n}\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}}) - \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right], \\
\mathrm{Var}(\boldsymbol{\eta}_i \mid \mathcal{F}_n) &= \frac{r}{n^{2}}\,\frac{n a_n}{r}\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})\right]^{\otimes 2} = \frac{a_n}{n}\mathbf{W}_i\mathbf{W}_i^{T}(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2}.
\end{aligned}
$$

Then by using Assumptions 2 and 5, we obtain

$$
\begin{aligned}
\sum_{i=1}^{n} E(\boldsymbol{\eta}_i \mid \mathcal{F}_n) &= \frac{\sqrt{r}}{n}\sum_{i=1}^{n}\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}}) - \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right] = \mathbf{0}, \\
\sum_{i=1}^{n} \mathrm{Var}(\boldsymbol{\eta}_i \mid \mathcal{F}_n) &= \frac{a_n}{n}\sum_{i=1}^{n}\mathbf{W}_i\mathbf{W}_i^{T}(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2} = a_n\Sigma_c.
\end{aligned}
\tag{S7.2}
$$

According to the $C_r$ inequality, Assumptions 4 and 5, we have

$$
\begin{aligned}
&\sum_{i=1}^{n} E\left\{\|\boldsymbol{\eta}_i\|^{2} I(\|\boldsymbol{\eta}_i\| > \varepsilon) \mid \mathcal{F}_n\right\} \\
&\leq \varepsilon^{-\alpha}\sum_{i=1}^{n} E\left\{\|\boldsymbol{\eta}_i\|^{2+\alpha} \mid \mathcal{F}_n\right\} \\
&= \varepsilon^{-\alpha}\sum_{i=1}^{n} E\left\{\left\|\frac{\sqrt{r}}{n}\left\{\psi_i\left[-\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})\right] - \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\}\right\|^{2+\alpha} \,\bigg|\, \mathcal{F}_n\right\} \\
&= \frac{r^{1+\frac{\alpha}{2}}}{\varepsilon^{\alpha} n^{2+\alpha}}\sum_{i=1}^{n} E\left\{\left\|\psi_i\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}}) + \boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\right\|^{2+\alpha} \mid \mathcal{F}_n\right\} \\
&\leq \frac{r^{1+\frac{\alpha}{2}} 2^{1+\alpha}}{\varepsilon^{\alpha} n^{2+\alpha}}\sum_{i=1}^{n}\left\{E\left[\|\psi_i\mathbf{W}_i(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})\|^{2+\alpha} \mid \mathcal{F}_n\right] + E\left[\|\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\|^{2+\alpha} \mid \mathcal{F}_n\right]\right\} \\
&= \frac{r^{1+\frac{\alpha}{2}} 2^{1+\alpha}}{\varepsilon^{\alpha} n^{1+\alpha}}\left\{E(\psi)^{2+\alpha}\,\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{W}_i\|^{2+\alpha}(y_i - \mathbf{W}_i^{T}\hat{\boldsymbol{\beta}})^{2+\alpha} + \|\boldsymbol{\Sigma}_{uu}\hat{\boldsymbol{\beta}}\|^{2+\alpha}\right\} \\
&= O_P(r^{-\alpha/2}).
\end{aligned}
$$

Therefore, the Lindeberg-Feller condition is satisfied. According to the Lindeberg-Feller central limit theorem, we have

$$
\left(\sum_{i=1}^{n}\mathrm{Var}(\boldsymbol{\eta}_i \mid \mathcal{F}_n)\right)^{-1/2}\sum_{i=1}^{n}\boldsymbol{\eta}_i = \sqrt{\frac{r}{a_n}}\,\Sigma_c^{-1/2}\dot{L}^{*}(\hat{\boldsymbol{\beta}}) \xrightarrow{d} N_p(\mathbf{0}, I). \tag{S7.3}
$$
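As a numerical sanity check of (S7.3), the following sketch simulates the standardized score in a scalar toy model. It assumes $p = 1$ and the hypothetical weights $\psi_i = (n/r)\cdot\mathrm{Bernoulli}(r/n)\cdot\mathrm{Exp}(1)$, for which $\mathrm{Var}(\psi_i) = 2n/r - 1$; the constant $a_n$ is then obtained from $\mathrm{Var}(\psi_i) = n a_n / r$, the relation implied by the variance formula for $\boldsymbol{\eta}_i$. The data-generating values are illustrative, not taken from the paper.

```python
import math
import random

rng = random.Random(1)
n, r, beta, sigma_uu = 500, 100, 1.0, 0.25

# Toy scalar data: y = beta*x + eps, observed covariate W = x + u.
W, Y = [], []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    W.append(x + rng.gauss(0.0, math.sqrt(sigma_uu)))
    Y.append(beta * x + rng.gauss(0.0, 0.5))

# Corrected full-data estimator (scalar case).
beta_hat = sum(w * y for w, y in zip(W, Y)) / (sum(w * w for w in W) - n * sigma_uu)

# g_i = -W_i (y_i - W_i * beta_hat); Sigma_c = (1/n) * sum(g_i^2).
g = [-w * (y - w * beta_hat) for w, y in zip(W, Y)]
sigma_c = sum(gi * gi for gi in g) / n

# Hypothetical weights psi = (n/r) * Bernoulli(r/n) * Exp(1): Var(psi) = 2n/r - 1.
var_psi = 2.0 * n / r - 1.0
a_n = r * var_psi / n  # from Var(psi_i) = n * a_n / r


def standardized_score():
    """One draw of sqrt(r/a_n) * Sigma_c^{-1/2} * L-dot-star(beta_hat)."""
    total = 0.0
    for gi in g:
        psi = (n / r) * rng.expovariate(1.0) if rng.random() < r / n else 0.0
        total += psi * gi - sigma_uu * beta_hat
    score = total / n
    return math.sqrt(r / a_n) * score / math.sqrt(sigma_c)


draws = [standardized_score() for _ in range(2000)]
mean_z = sum(draws) / len(draws)
var_z = sum((z - mean_z) ** 2 for z in draws) / (len(draws) - 1)
print(mean_z, var_z)
```

Under these assumptions the conditional mean of the standardized score is exactly zero and its conditional variance is exactly one, so the printed values should be close to 0 and 1 up to Monte Carlo error.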

By (S6.3), we have

$$
\check{\mathcal{H}}_W^{-1} - \mathcal{H}_W^{-1} = -\mathcal{H}_W^{-1}(\check{\mathcal{H}}_W - \mathcal{H}_W)\check{\mathcal{H}}_W^{-1} = O_{P|\mathcal{F}_n}(r^{-1/2}). \tag{S7.4}
$$

By Assumption 1, $\mathcal{H}_W$ converges to a positive definite matrix, so $\mathcal{H}_W^{-1} = O_P(1)$. Together with (S7.2), we obtain

$$
\Sigma = \mathcal{H}_W^{-1}\Sigma_c\mathcal{H}_W^{-1} = O_P(1). \tag{S7.5}
$$
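The algebraic identity behind (S7.4), $A^{-1} - B^{-1} = -B^{-1}(A - B)A^{-1}$ for invertible $A$ and $B$, can be checked numerically. The sketch below uses two arbitrary $2 \times 2$ matrices as stand-ins for $\check{\mathcal{H}}_W$ and $\mathcal{H}_W$; the numbers and helper names are purely illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def subtract(a, b):
    """Entrywise difference a - b."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


H = [[2.0, 0.5], [0.5, 1.0]]            # stand-in for H_W
H_check = [[2.1, 0.45], [0.45, 0.95]]   # stand-in for the subsample version

# Left side: H_check^{-1} - H^{-1}.
lhs = subtract(inverse(H_check), inverse(H))

# Right side: -H^{-1} (H_check - H) H_check^{-1}.
prod = matmul(matmul(inverse(H), subtract(H_check, H)), inverse(H_check))
rhs = [[-v for v in row] for row in prod]

max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The identity shows why the difference of the inverses inherits the order of $\check{\mathcal{H}}_W - \mathcal{H}_W$ once both inverses are bounded in probability.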

Therefore, combining (S6.1), (S7.4) and (S7.5), we have

$$
\begin{aligned}
\sqrt{\frac{r}{a_n}}\,\Sigma^{-1/2}(\check{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}})
&= -\sqrt{\frac{r}{a_n}}\,\Sigma^{-1/2}\check{\mathcal{H}}_W^{-1}\dot{L}^{*}(\hat{\boldsymbol{\beta}}) \\
&= -\sqrt{\frac{r}{a_n}}\,\Sigma^{-1/2}\mathcal{H}_W^{-1}\dot{L}^{*}(\hat{\boldsymbol{\beta}}) - \sqrt{\frac{r}{a_n}}\,\Sigma^{-1/2}(\check{\mathcal{H}}_W^{-1} - \mathcal{H}_W^{-1})\dot{L}^{*}(\hat{\boldsymbol{\beta}}) \\
&= -\sqrt{\frac{r}{a_n}}\,\Sigma^{-1/2}\mathcal{H}_W^{-1}\Sigma_c^{1/2}\Sigma_c^{-1/2}\dot{L}^{*}(\hat{\boldsymbol{\beta}}) + O_{P|\mathcal{F}_n}(r^{-1/2}).
\end{aligned}
$$

Note that

$$
\Sigma^{-1/2}\mathcal{H}_W^{-1}\Sigma_c^{1/2}\left(\Sigma^{-1/2}\mathcal{H}_W^{-1}\Sigma_c^{1/2}\right)^{T} = \Sigma^{-1/2}\mathcal{H}_W^{-1}\Sigma_c^{1/2}\Sigma_c^{1/2}\mathcal{H}_W^{-1}\Sigma^{-1/2} = I.
$$

Using Slutsky's theorem and (S7.3), we have

$$
\Sigma^{-1/2}\sqrt{r/a_n}\,(\check{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}) \xrightarrow{d} N_p(\mathbf{0}, I), \quad r, n \rightarrow \infty. \tag{S7.6}
$$
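The normalization leading to (S7.6) relies on the matrix $M = \Sigma^{-1/2}\mathcal{H}_W^{-1}\Sigma_c^{1/2}$ satisfying $MM^{T} = I$ when $\Sigma = \mathcal{H}_W^{-1}\Sigma_c\mathcal{H}_W^{-1}$. A minimal numerical sketch follows, assuming arbitrary symmetric positive definite $2 \times 2$ stand-ins for $\mathcal{H}_W$ and $\Sigma_c$. The closed-form square root used here is specific to $2 \times 2$ symmetric positive definite matrices, and the helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def sqrt_spd(a):
    """Principal square root of a 2x2 symmetric positive definite matrix:
    sqrt(A) = (A + s*I) / t with s = sqrt(det A), t = sqrt(trace A + 2s)."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]


H = [[2.0, 0.5], [0.5, 1.0]]         # stand-in for H_W
Sigma_c = [[1.5, 0.3], [0.3, 0.8]]   # stand-in for Sigma_c

H_inv = inverse(H)
Sigma = matmul(matmul(H_inv, Sigma_c), H_inv)  # sandwich form, as in (S7.5)

# M = Sigma^{-1/2} H^{-1} Sigma_c^{1/2}
M = matmul(matmul(inverse(sqrt_spd(Sigma)), H_inv), sqrt_spd(Sigma_c))
MMt = matmul(M, transpose(M))

max_gap = max(abs(MMt[i][j] - (1.0 if i == j else 0.0))
              for i in range(2) for j in range(2))
print(max_gap)
```

Because $MM^{T} = I$, multiplying a vector that is asymptotically $N_p(\mathbf{0}, I)$ by $M$ leaves the limiting covariance equal to the identity.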

When $m > 1$, we have $\check{\boldsymbol{\beta}}^{(m)} = \frac{1}{m}\sum_{k=1}^{m}\check{\boldsymbol{\beta}}_k$. From the central limit theorem, it can be concluded that

$$
\sqrt{\frac{rm}{a_n}}\,(\check{\boldsymbol{\beta}}^{(m)} - \hat{\boldsymbol{\beta}}) = \frac{1}{\sqrt{m}}\sum_{k=1}^{m}\sqrt{\frac{r}{a_n}}\,(\check{\boldsymbol{\beta}}_k - \hat{\boldsymbol{\beta}}) \xrightarrow{d} N_p(\mathbf{0}, \Sigma), \quad r, n \rightarrow \infty.
$$

Then the theorem is proved.