Parametric Modal Regression with Error in Covariates

Qingyang Liu
Department of Statistics
University of South Carolina
Columbia, SC 29201

Xianzheng Huang
Department of Statistics
University of South Carolina
Columbia, SC 29201

(June 29, 2024)

Abstract

An inference procedure is proposed to provide consistent estimators of parameters in a modal regression model with a covariate prone to measurement error. A score-based diagnostic tool exploiting parametric bootstrap is developed to assess adequacy of parametric assumptions imposed on the regression model. The proposed estimation method and diagnostic tool are applied to synthetic data generated from simulation experiments and data from real-world applications to demonstrate their implementation and performance. These empirical examples illustrate the importance of adequately accounting for measurement error in the error-prone covariate when inferring the association between a response and covariates based on a modal regression model that is especially suitable for skewed and heavy-tailed response data.

Keywords Beta distribution $\cdot$ bootstrap $\cdot$ corrected score $\cdot$ $M$ -estimation $\cdot$ model misspecification

1 Introduction

The mean, median, and mode are three widely used measures of central tendency of data. The mode can be a more informative and sensible central tendency measure than the other two for data arising from distributions that are heavy-tailed and skewed. This very virtue of the mode and the ubiquity of heavy-tailed and skewed data in biology, sociology, economics, and many other fields of study have recently revived data scientists’ interest in regression methodology focusing on the conditional mode of a response (Chacón, 2020).

While there exists an extensive literature on regression models that relate the mean or the median of a response variable $Y$ to covariates $\mathbf{X}$, there is much less work on regression models tailored for the conditional mode of $Y$ given $\mathbf{X}$ (Sager and Thisted, 1982; Lee, 1989, 1993). Among the limited existing modal regression methods, the majority are in the semi-/non-parametric framework (Yao and Li, 2013; Chen et al., 2016; Ota et al., 2019; Wang et al., 2019; Kemp et al., 2020; Zhang et al., 2021; Ullah et al., 2022; Xiang and Yao, 2022), which typically suffer from low statistical efficiency when compared with their parametric counterparts. One reality that discourages the use of parametric models for inferring the mode is that very few named distributions that allow asymmetry can be conveniently formulated as distribution families indexed by the mode along with other parameters. Among the few groups of authors who considered parametric modal regression models, Aristodemou (2014, Chapter 3) assumed a gamma distribution for a non-negative response with a covariate-dependent mode; Bourguignon et al. (2020) followed a similar model construction while also allowing a covariate-dependent precision parameter for the gamma distribution. Focusing on bounded response data, Zhou and Huang (2020) proposed two modal regression models, one based on a beta distribution and the other based on a generalized biparabolic distribution for the response given covariates. In all three aforementioned works, frequentist likelihood-based methods are developed to infer model parameters. Most recently, Zhou and Huang (2022) unified the mean regression and modal regression in a Bayesian framework by reparameterizing a four-parameter beta distribution with an unknown support so that the mean or the mode of $Y$ depends on $\mathbf{X}$. Earlier works on Bayesian modal regression, including parametric and nonparametric methods, can also be found in Aristodemou (2014, Chapter 2).

All the above works on modal regression assume that covariates are measured precisely. Data analysts in many disciplines are well aware that, among all variables of interest, some often cannot be measured precisely due to inaccurate measuring devices or human error in data collection. Some variables are in principle inaccessible and only surrogates of them can be measured. For example, one’s long-term blood pressure is an important biomarker associated with one’s heart health, yet it cannot be directly measured. Instead, measurable surrogates of it are blood pressure readings collected during a doctor’s visit, which can be viewed as error-contaminated versions of one’s long-term blood pressure. It has also been well understood that ignoring covariate measurement error in mean regression or quantile regression usually leads to misleading inference results. There exists a large collection of works on mean regression methodology accounting for measurement error (Carroll et al., 2006; Fuller, 2009; Buonaccorsi, 2010; Yi, 2017), and also some works in quantile regression to address this complication (He and Liang, 2000; Wei and Carroll, 2009; Wang et al., 2012). Modal regression methods that address this issue emerged only recently, including those developed by Zhou and Huang (2016), Li and Huang (2019), and Shi et al. (2021), all of which opted for a nonparametric model for the error term in the primary regression model. There is a lack of methodology to account for error-prone covariates in parametric modal regression, and our study presented in this article fills the void.

In preparation for proposing a method to account for measurement error in covariates that is applicable to any parametric modal regression model, we first formulate the measurement error model and discuss complications unique to modal regression models in Section 2. For concreteness, we then focus on the beta modal regression model for a response supported on [0, 1] with an error-prone covariate, and propose consistent estimation methods to infer model parameters that account for measurement error in Section 3. A model diagnostic method is developed in Section 4 to detect model misspecification when adopting the beta modal regression model in a given application. Simulation studies are reported in Section 5 to demonstrate the performance of the estimation and diagnostic methods. We apply the proposed modal regression method accounting for covariate measurement error to data sets arising from two real-life studies in Section 6, where we also discuss revisions of the method to adapt to more general settings. Section 7 gives concluding remarks and future research directions.

2 Data and Model

2.1 Observed data

Suppose that, given $p$ covariates in $\mathbf{X}=(X_{1},\ldots,X_{p})^{\mathrm{\scriptscriptstyle T}}$, $Y$ follows a unimodal distribution specified by the probability density function (pdf), $f_{Y|\mathbf{X}}(y|\mathbf{x})$. Denote by $\theta(\mathbf{x})$ the mode of $Y$ given $\mathbf{X}=\mathbf{x}$. In modal regression without measurement error, one infers $\theta(\mathbf{x})$ based on a random sample of size $n$ from the joint distribution of $(Y,\mathbf{X})$, $\{(Y_{j},\mathbf{X}_{j})\}_{j=1}^{n}$, where $\mathbf{X}_{j}=(X_{1,j},\ldots,X_{p,j})^{\mathrm{\scriptscriptstyle T}}$. Now suppose that a covariate in $\mathbf{X}$, say, $X_{1}$, is prone to measurement error, and a surrogate $W$ is observed instead of $X_{1}$, with $n_{j}$ replicate measures of $X_{1,j}$ in $\widetilde{W}_{j}=\{W_{j,k}\}_{k=1}^{n_{j}}$, for $j=1,\ldots,n$. In this study, we assume that $W_{j,k}$ relates to $X_{1,j}$ via an additive measurement error model,

$$W_{j,k}=X_{1,j}+U_{j,k},\quad\text{for } j=1,\ldots,n \text{ and } k=1,\ldots,n_{j}, \tag{1}$$

where $\{U_{j,k},\,k=1,\ldots,n_{j}\}_{j=1}^{n}$ are independent and identically distributed (i.i.d.) mean-zero measurement errors, which are independent of $\{(Y_{j},\mathbf{X}_{j})\}_{j=1}^{n}$ to guarantee nondifferential measurement error as considered in the classical measurement error models (Carroll et al., 2006, Section 2.5).
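To make the data structure in (1) concrete, the following minimal sketch simulates replicate surrogate measurements and computes the per-subject summaries $\overline{W}_{j}$ and $S_{j}^{2}$. It assumes normal measurement errors, which model (1) itself does not require, and the function names `simulate_replicates` and `summarize_replicates` are hypothetical.

```python
import random
import statistics

def simulate_replicates(x_true, n_reps, sigma_u, rng):
    """Draw W_{j,k} = X_{1,j} + U_{j,k}, assuming U_{j,k} ~ N(0, sigma_u^2)."""
    return [[x + rng.gauss(0.0, sigma_u) for _ in range(n_j)]
            for x, n_j in zip(x_true, n_reps)]

def summarize_replicates(w_tilde):
    """Return the per-subject means W-bar_j and sample variances S_j^2."""
    w_bar = [statistics.fmean(w) for w in w_tilde]
    s2 = [statistics.variance(w) for w in w_tilde]  # needs n_j >= 2
    return w_bar, s2

rng = random.Random(2024)
x_true = [rng.gauss(0.0, 1.0) for _ in range(5)]  # unobserved X_{1,j}
n_reps = [3, 3, 2, 4, 3]                          # n_j replicates per subject
w_tilde = simulate_replicates(x_true, n_reps, sigma_u=0.5, rng=rng)
w_bar, s2 = summarize_replicates(w_tilde)
```

Only `w_tilde` (together with the response and error-free covariates) would be available to the analyst; `x_true` is kept here purely for illustration.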

In a naive univariate modal regression analysis using the surrogate data, one treats $W$ as if it were $X=X_{1}$, and equivalently, views the conditional pdf of $Y$ given $W=w$, $f_{Y|W}(y|w)$, as the same as $f_{Y|X}(y|w)$. As a result, naive modal regression analysis essentially infers the mode of $f_{Y|W}(y|w)$ instead of $\theta(\cdot)$. In the context of univariate mean regression models not limited to linear regression, the attenuation effect of measurement error on covariate effect estimation is often noted in the literature (Carroll et al., 2006; Buonaccorsi, 2010), which causes the estimated covariate effect of a truly influential covariate to be pulled towards zero. Naive modal regression can suffer the same attenuation effect. For instance, if the mean and the mode of $f_{Y|X}(y|x)$ differ by a quantity that does not depend on covariates, such as for a Gumbel distribution that depends on a covariate $X$ only via the mode but not via the scale parameter, then the impact of measurement error on naive inference for the conditional mean mostly carries over to naive inference for $\theta(x)$. In other model settings where the conditional mean and mode of $Y$ differ by a quantity that does depend on the error-prone covariate, the effect of measurement error on naive modal regression demands investigation on a case-by-case basis. Even before conducting such investigation, a more fundamental question needs to be addressed, namely whether or not naive modal regression is meaningful, since unimodality of $f_{Y|X}(y|x)$ does not guarantee unimodality of $f_{Y|W}(y|w)$. Indeed, there is an extra layer of complication in modal regression with an error-prone covariate that does not exist in mean regression since, if the mean of $Y$ given $X$, $\mu(X)$, is well defined, then the mean of $Y$ given $W$ is $E\{\mu(X)|W\}$, which is also well defined in most settings of practical interest. Because of this additional complication, correcting naive inference to account for measurement error in modal regression is more challenging than the counterpart task in mean regression. For example, a strategy that can be easy to implement in mean regression is to correct the bias in a naive estimator of a parameter to produce an improved estimator accounting for measurement error (Carroll et al., 2006, Section 3.4). This idea of de-biasing naive estimation may not be a sensible approach now that the very existence of a naive mode function is in question.

2.2 Regression model

We propose to account for measurement error when inferring parameters in a modal regression model by exploiting the idea of corrected scores. In particular, we focus on modeling a bounded response $Y$, which is commonly encountered in practice, such as test scores, disease prevalence, and the fraction of household income spent on food. Any bounded response with a known support can be scaled to be supported on the unit interval [0, 1]. The beta distribution is a parametric family that encompasses various shapes of distributions supported on [0, 1], and thus serves as a relatively flexible basis for building a regression model for such responses. For a random variable $V$ that follows a beta distribution with shape parameters $\alpha_{1},\alpha_{2}>0$, i.e., $V\sim\mbox{beta}(\alpha_{1},\alpha_{2})$, its density function is

$$f(v;\alpha_{1},\alpha_{2})=\frac{\Gamma(\alpha_{1}+\alpha_{2})}{\Gamma(\alpha_{1})\Gamma(\alpha_{2})}v^{\alpha_{1}-1}(1-v)^{\alpha_{2}-1},\quad\text{for } 0<v<1,$$

where $\Gamma(\cdot)$ is the Gamma function. When $\alpha_{1},\,\alpha_{2}>1$, this distribution has a unique mode given by $\theta=(\alpha_{1}-1)/(\alpha_{1}+\alpha_{2}-2)$. To prepare for modal regression, we reparameterize the beta distribution by setting $\alpha_{1}=1+m\theta$ and $\alpha_{2}=1+m(1-\theta)$, where $m>0$ plays the role of a precision parameter, with a larger value of $m$ leading to a smaller variance of the distribution (Zhou and Huang, 2020). A similar parameterization of the beta distribution was used in Chen (1999) to formulate the beta kernel in kernel density estimators, and also in Bagnato and Punzo (2013) to construct beta mixture distributions. In both earlier works, the beta family is indexed by $\theta$ and a dispersion parameter equal to the reciprocal of $m$. The parameterization of beta distributions used in our study is also in line with the one in Kruschke (2015, see Equation (6.6)), except that a concentration parameter equal to our $m$ plus 2 is used there in place of our precision parameter. Despite these small differences, all aforementioned parameterizations highlight the mode as the location parameter, with the original shape parameters $\alpha_{1}$ and $\alpha_{2}$ specified by the mode and a precision/concentration/dispersion parameter that is of secondary interest in drawing inference. By construction, as long as the mode $\theta\in(0,1)$ exists, which we assume throughout the study, we have $\alpha_{1},\alpha_{2}>1$ following our parameterization.
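The reparameterization can be checked numerically. The short sketch below maps $(\theta,m)$ to $(\alpha_{1},\alpha_{2})$, recovers the mode from the shape parameters, and evaluates the standard beta variance formula for a few values of $m$. The function names are hypothetical and chosen only for this illustration.

```python
def beta_shapes(theta, m):
    """Map (mode theta, precision m) to the shape parameters (alpha1, alpha2)."""
    if not (0.0 < theta < 1.0 and m > 0.0):
        raise ValueError("need 0 < theta < 1 and m > 0")
    return 1.0 + m * theta, 1.0 + m * (1.0 - theta)

def beta_mode(alpha1, alpha2):
    """Mode of beta(alpha1, alpha2), valid when both shapes exceed 1."""
    return (alpha1 - 1.0) / (alpha1 + alpha2 - 2.0)

def beta_variance(alpha1, alpha2):
    """Variance of beta(alpha1, alpha2)."""
    total = alpha1 + alpha2
    return alpha1 * alpha2 / (total ** 2 * (total + 1.0))

# The mode stays at theta = 0.3 while the variance shrinks as m grows.
for m in (2.0, 20.0, 200.0):
    a1, a2 = beta_shapes(0.3, m)
    print(m, beta_mode(a1, a2), beta_variance(a1, a2))
```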

With a beta distribution family indexed by $(\theta,m)$ formulated, a beta modal regression model follows by introducing a covariate-dependent mode of $Y$, $\theta(\mathbf{X})=g(\boldsymbol{\beta}^{\mathrm{\scriptscriptstyle T}}\tilde{\mathbf{X}})$, where $\tilde{\mathbf{X}}=(1,\mathbf{X}^{\mathrm{\scriptscriptstyle T}})^{\mathrm{\scriptscriptstyle T}}$, $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\ldots,\beta_{p})^{\mathrm{\scriptscriptstyle T}}$ with $\beta_{0}$ being the intercept and $\beta_{1},\ldots,\beta_{p}$ representing covariate effects associated with the $p$ covariates in $\mathbf{X}$, and $g(\cdot)$ is a user-specified link function, such as logit, probit, log-log, and complementary log-log. Now a modal regression model for $Y$ is fully specified by the following conditional distribution of $Y$ given $\mathbf{X}$,

$$Y|\mathbf{X}\sim\text{beta}(1+m\theta(\mathbf{X}),\,1+m\{1-\theta(\mathbf{X})\}). \tag{2}$$

Combining (2) with (1) completes the specification of a modal regression model for a response $Y$ supported on [0, 1] and covariates $\mathbf{X}=(X_{1},\ldots,X_{p})^{\mathrm{\scriptscriptstyle T}}$, with $X_{1}$ subject to additive nondifferential measurement error. The focal point of inference lies in parameters involved in the primary regression model in (2), $\boldsymbol{\Omega}=(\boldsymbol{\beta}^{\mathrm{\scriptscriptstyle T}},m)^{\mathrm{\scriptscriptstyle T}}$. Parameters appearing in (1) are of secondary interest but required to specify the measurement error distribution.

3 Parameter estimation

3.1 Maximum likelihood estimation

In the absence of measurement error, one may carry out maximum likelihood estimation of $\Omega$ straightforwardly by solving the normal score equations for $\Omega$ . More specifically, the log-likelihood of error-free data, $\mathcal{D}=\{(Y_{j},\mathbf{X}_{j})\}_{j=1}^{n}$ , is

$$\begin{aligned}
\ell(\boldsymbol{\Omega};\mathcal{D})={}&\sum_{j=1}^{n}\ell(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})\\
={}&n\log\Gamma(2+m)-\sum_{j=1}^{n}\log\left(\Gamma(1+m\theta(\mathbf{X}_{j}))\,\Gamma(1+m\{1-\theta(\mathbf{X}_{j})\})\right)\\
&+m\sum_{j=1}^{n}\left[\theta(\mathbf{X}_{j})\log Y_{j}+\{1-\theta(\mathbf{X}_{j})\}\log(1-Y_{j})\right].
\end{aligned} \tag{3}$$
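As an illustration of (3), the sketch below evaluates the error-free log-likelihood with the standard library only. It assumes that $g$ is the inverse-logit function (`expit`), which is one of several admissible choices, and the function names are hypothetical.

```python
import math

def expit(t):
    """Inverse-logit function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def loglik_point(beta, m, y, x):
    """Summand of (3): log-density of Y_j given X_j under model (2)."""
    eta = beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))
    theta = expit(eta)  # conditional mode theta(X_j), assuming g = expit
    return (math.lgamma(2.0 + m)
            - math.lgamma(1.0 + m * theta)
            - math.lgamma(1.0 + m * (1.0 - theta))
            + m * (theta * math.log(y) + (1.0 - theta) * math.log(1.0 - y)))

def loglik(beta, m, ys, xs):
    """Log-likelihood (3) summed over the sample."""
    return sum(loglik_point(beta, m, y, x) for y, x in zip(ys, xs))
```

With $\boldsymbol{\beta}=\mathbf{0}$ and $m=2$, the model reduces to beta(2, 2), whose density at $y=0.5$ is $1.5$, which gives a quick sanity check of the implementation.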

Differentiating (3) with respect to $\boldsymbol{\Omega}$ leads to the score equations, $\sum_{j=1}^{n}\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})=\mathbf{0}$, where the score vector evaluated at the $j$-th data point, $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$, consists of the following scores, for $j=1,\ldots,n$,

$$\begin{aligned}
\frac{\partial\ell(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})}{\partial\boldsymbol{\beta}}={}&\left\{-m\psi(1+m\theta(\mathbf{X}_{j}))+m\psi(1+m\{1-\theta(\mathbf{X}_{j})\})+m\log\left(\frac{Y_{j}}{1-Y_{j}}\right)\right\}\\
&\times g^{\prime}(\boldsymbol{\beta}^{\mathrm{\scriptscriptstyle T}}\tilde{\mathbf{X}}_{j})\tilde{\mathbf{X}}_{j},
\end{aligned} \tag{4}$$

$$\begin{aligned}
\frac{\partial\ell(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})}{\partial m}={}&\psi(2+m)-\theta(\mathbf{X}_{j})\psi(1+m\theta(\mathbf{X}_{j}))-\{1-\theta(\mathbf{X}_{j})\}\psi(1+m\{1-\theta(\mathbf{X}_{j})\})\\
&+\theta(\mathbf{X}_{j})\log Y_{j}+\{1-\theta(\mathbf{X}_{j})\}\log(1-Y_{j}),
\end{aligned} \tag{5}$$

where $\psi(t)=(d/dt)\log\Gamma(t)$ is the digamma function and $g^{\prime}(t)=(d/dt)g(t)$ .
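The scores (4) and (5) can be verified against numerical derivatives of the summand of (3). The sketch below is one way to do so under the assumption that $g$ is the inverse-logit function, so that $g^{\prime}=\theta(1-\theta)$. The digamma routine is a simple recurrence-plus-asymptotic-series approximation written for this illustration, and all function names are hypothetical.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def digamma(x):
    """Digamma for x > 0: shift x upward, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def loglik_point(beta, m, y, x):
    """Summand of (3), assuming g = expit."""
    x_tilde = [1.0] + list(x)
    theta = expit(sum(b * xk for b, xk in zip(beta, x_tilde)))
    return (math.lgamma(2.0 + m) - math.lgamma(1.0 + m * theta)
            - math.lgamma(1.0 + m * (1.0 - theta))
            + m * (theta * math.log(y) + (1.0 - theta) * math.log(1.0 - y)))

def score_point(beta, m, y, x):
    """Scores (4) and (5) at one data point, returned as [d/d beta..., d/d m]."""
    x_tilde = [1.0] + list(x)
    theta = expit(sum(b * xk for b, xk in zip(beta, x_tilde)))
    psi1 = digamma(1.0 + m * theta)
    psi2 = digamma(1.0 + m * (1.0 - theta))
    common = m * (psi2 - psi1 + math.log(y / (1.0 - y))) * theta * (1.0 - theta)
    d_beta = [common * xk for xk in x_tilde]
    d_m = (digamma(2.0 + m) - theta * psi1 - (1.0 - theta) * psi2
           + theta * math.log(y) + (1.0 - theta) * math.log(1.0 - y))
    return d_beta + [d_m]

def numeric_score(beta, m, y, x, h=1e-6):
    """Central finite differences of loglik_point with respect to (beta, m)."""
    omega = list(beta) + [m]
    grad = []
    for k in range(len(omega)):
        up, down = omega[:], omega[:]
        up[k] += h
        down[k] -= h
        grad.append((loglik_point(up[:-1], up[-1], y, x)
                     - loglik_point(down[:-1], down[-1], y, x)) / (2.0 * h))
    return grad
```

Comparing `score_point` with `numeric_score` at a few parameter values is a cheap safeguard against sign errors before the scores are used in estimating equations.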

3.2 Monte-Carlo corrected scores

In the presence of measurement error, a naive estimator of $\boldsymbol{\Omega}$ solves the naive score equations resulting from replacing $X_{1,j}$ with $\overline{W}_{j}=n_{j}^{-1}\sum_{k=1}^{n_{j}}W_{j,k}$ in (4) and (5), for $j=1,\ldots,n$. As pointed out earlier and also evidenced in the simulation study to be presented later, this naive treatment typically results in misleading inference for $\boldsymbol{\Omega}$. We propose to follow the idea of the corrected score method (Nakamura, 1990) and revise the naive scores to obtain estimating equations that adequately account for measurement error. The thrust of the corrected score method is to use the observed error-prone data, $\mathcal{D}^{*}=\{(Y_{j},\widetilde{W}_{j},\,\mathbf{X}_{-1,j})\}_{j=1}^{n}$ with $\widetilde{W}_{j}=\{W_{j,k}\}_{k=1}^{n_{j}}$ and $\mathbf{X}_{-1,j}=(X_{2,j},\ldots,X_{p,j})^{\mathrm{\scriptscriptstyle T}}$, to construct unbiased estimators of the above normal scores. In this vein of thinking, one treats $\{X_{1,j}\}_{j=1}^{n}$ as unknown parameters instead of realizations of a random variable, and thus one takes on the functional point of view as opposed to the structural viewpoint of measurement error models where a distribution for $X_{1}$ is assumed (Carroll et al., 2006, Section 2.1).

We begin by applying the Monte-Carlo-amenable method proposed by Stefanski et al. (2005), a method originating from the idea described in Stefanski (1989). More specifically, we construct a score, $\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\mathbf{X}_{-1,j})$, that satisfies $E\{\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\mathbf{X}_{-1,j})|Y_{j},\mathbf{X}_{j}\}=\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$, for $j=1,\ldots,n$. This particular method is especially suitable for settings with a univariate error-prone covariate subject to normal measurement error $U$. We will address violation of the normality assumption on $U$ in Section 5, and describe revisions of the method to adapt to settings with multiple error-prone covariates in Section 6. As shown in Stefanski et al. (2005, Theorem 1), the minimum variance unbiased estimator of $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$ is given by

$$\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\mathbf{X}_{-1,j})=E\left\{\left.\boldsymbol{\Psi}_{0}\left(\boldsymbol{\Omega};Y_{j},\overline{W}_{j}+i\sqrt{\frac{(n_{j}-1)S_{j}^{2}}{n_{j}}}\,T,\mathbf{X}_{-1,j}\right)\right|Y_{j},\overline{W}_{j},S_{j}^{2},\mathbf{X}_{-1,j}\right\}, \tag{6}$$

where $i$ is the imaginary unit, $S_{j}^{2}$ is the sample variance of $\widetilde{W}_{j}=\{W_{j,k}\}_{k=1}^{n_{j}}$, and $T=Z_{1}/(\sum_{k=1}^{n_{j}-1}Z^{2}_{k})^{1/2}$ is independent of all observed data, in which $Z_{1},\ldots,Z_{n_{j}-1}$ are independent standard normal random variables. The estimator of $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$ in (6) originates from a jackknife exact-extrapolant estimator constructed for the purpose of estimating a function of the mean of a normal distribution based on a random sample from the distribution. In the context of (6), this random sample is $\widetilde{W}_{j}$ from $N(X_{1,j},\sigma_{u}^{2})$, where $\sigma_{u}^{2}$ is the measurement error variance, i.e., assuming $U\sim N(0,\sigma_{u}^{2})$ in (1), and the function of the normal mean $X_{1,j}$ is $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},X_{1,j},\mathbf{X}_{-1,j})$. The expectation in (6) cannot be derived in closed form. But since the only quantity viewed as random when deriving this conditional expectation is $T$, which is independent of observed data, one can estimate this expectation unbiasedly via an empirical mean based on simulated random samples of $T$. Moreover, as shown in Stefanski et al. (2005), even though (6) is complex-valued by construction, the expectation of its imaginary part is zero as long as $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},X_{1,j},\mathbf{X}_{-1,j})$ is infinitely differentiable with respect to $X_{1,j}$, which is guaranteed in our case by choosing a link function $g(t)$ that is infinitely differentiable. Hence, using the real part of the empirical version of (6) suffices for constructing an unbiased estimator of $\boldsymbol{\Psi}_{0}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$. This leads to the following corrected score based on a simulated random sample of $T$ of size $B$, $\widetilde{T}_{j}=\{T_{j,b}\}_{b=1}^{B}$, for $j=1,\ldots,n$,

$$\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})=\frac{1}{B}\sum_{b=1}^{B}\mbox{Re}\left\{\boldsymbol{\Psi}_{0}\left(\boldsymbol{\Omega};Y_{j},\overline{W}_{j}+i\sqrt{\frac{(n_{j}-1)S_{j}^{2}}{n_{j}}}\,T_{j,b},\mathbf{X}_{-1,j}\right)\right\}, \tag{7}$$

where $\mbox{Re}(t)$ denotes the real part of a complex-valued $t$ .
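The mechanics of the Monte Carlo average in (7) can be illustrated on a scalar toy problem. In the sketch below, a generic function `func` stands in for one component of the score; it is evaluated at the complex argument $\overline{W}_{j}+i\{(n_{j}-1)S_{j}^{2}/n_{j}\}^{1/2}T_{j,b}$ and the real parts are averaged over $B$ draws of $T$. The helper names `draw_t` and `mc_corrected` are hypothetical. For $f(x)=x^{2}$ the average should be close to $\overline{W}_{j}^{2}-S_{j}^{2}/n_{j}$, the familiar unbiased estimator of the squared normal mean.

```python
import math
import random
import statistics

def draw_t(n_j, rng):
    """Draw T = Z_1 / sqrt(Z_1^2 + ... + Z_{n_j - 1}^2) with i.i.d. N(0, 1) Z's."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_j - 1)]
    return z[0] / math.sqrt(sum(v * v for v in z))

def mc_corrected(func, w, B, rng):
    """Average Re{func(W-bar + i * sqrt((n - 1) S^2 / n) * T_b)} over B draws of T."""
    n_j = len(w)
    w_bar = statistics.fmean(w)
    scale = math.sqrt((n_j - 1) * statistics.variance(w) / n_j)
    total = 0.0
    for _ in range(B):
        total += complex(func(complex(w_bar, scale * draw_t(n_j, rng)))).real
    return total / B

rng = random.Random(1)
w = [1.0, 1.4, 0.9]  # replicate measurements of one subject
estimate = mc_corrected(lambda z: z * z, w, 20000, rng)
target = statistics.fmean(w) ** 2 - statistics.variance(w) / len(w)
print(estimate, target)
```

In the regression setting, `func` would be replaced by each component of $\boldsymbol{\Psi}_{0}$ viewed as a function of the error-prone covariate, which requires complex-valued versions of the digamma and link functions.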

One now can solve the following system of $p+2$ equations based on the corrected score in (7),

$$\sum_{j=1}^{n}\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})=\mathbf{0}, \tag{8}$$

for $\boldsymbol{\Omega}$ to obtain a consistent estimator $\hat{\boldsymbol{\Omega}}$, where $\widetilde{T}_{1},\ldots,\widetilde{T}_{n}$ are independent. Solving (8) for $\boldsymbol{\Omega}$ is equivalent to solving an optimization problem, that is,

$$\hat{\boldsymbol{\Omega}}=\underset{\boldsymbol{\Omega}\in\mathbb{R}^{p+1}\times\mathbb{R}^{+}}{\arg\min}\left\{\sum_{j=1}^{n}\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})\right\}^{\mathrm{\scriptscriptstyle T}}\left\{\sum_{j=1}^{n}\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})\right\}. \tag{9}$$

The equivalence between (9) and the solution to (8) is obvious when there exists a unique solution to (8). An added benefit of working with an optimization problem becomes apparent in the presence of model misspecification, which can lead to non-existence of a solution to (8), whereas (9) may still be well defined with meaningful statistical interpretations according to White (1982).

3.3 Monte-Carlo corrected log-likelihood

At this point, estimating $\boldsymbol{\Omega}$ appears to be a straightforward optimization problem. But the numerical procedure to obtain (9) requires evaluating $p+2$ scores at each iteration, which can be cumbersome and very demanding on computer memory and processing time, especially due to the Monte Carlo nature of the score in (7) that involves computing a vector-valued score $B$ times. Viewing the quadratic form in (9) as an objective function that accounts for measurement error, we propose to use a different objective function that also takes measurement error into account and is computationally less cumbersome to optimize. This new objective function is obtained by correcting the naive log-likelihood function $\ell(\boldsymbol{\Omega};Y_{j},\overline{W}_{j},\mathbf{X}_{-1,j})$, that is, the summand of (3) with $X_{1,j}$ evaluated at $\overline{W}_{j}$, for $j=1,\ldots,n$. Similar to the construction of the corrected score in (7) based on the naive score, the new objective function based on the naive log-likelihood evaluated at the $j$-th observed data point is

$$\tilde{\ell}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})=\frac{1}{B}\sum_{b=1}^{B}\mbox{Re}\left\{\ell\left(\boldsymbol{\Omega};Y_{j},\overline{W}_{j}+i\sqrt{\frac{(n_{j}-1)S_{j}^{2}}{n_{j}}}\,T_{j,b},\mathbf{X}_{-1,j}\right)\right\}, \tag{10}$$

which satisfies $E\{\tilde{\ell}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})|Y_{j},\mathbf{X}_{j}\}=\ell(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$, for $j=1,\ldots,n$. We then define an estimator of $\boldsymbol{\Omega}$ as

$$\hat{\boldsymbol{\Omega}}=\underset{\boldsymbol{\Omega}\in\mathbb{R}^{p+1}\times\mathbb{R}^{+}}{\arg\max}\sum_{j=1}^{n}\tilde{\ell}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j}), \tag{11}$$

which only requires repeated evaluation of a scalar function in (10) at each iteration of an optimization algorithm. In simulation studies (not presented in this article) where we estimate $\boldsymbol{\Omega}$ using these two routes of optimization according to (9) and (11), we obtain very similar estimates of $\boldsymbol{\Omega}$, with the former route more computationally demanding than the latter. The numerical similarity of (9) and (11) may be expected given the connection between the naive score and the naive log-likelihood, in addition to the equivalence between the solution to the normal score equation and the maximum likelihood estimator in the absence of measurement error. We refer to the estimator defined in (11) as the Monte Carlo corrected log-likelihood estimator, or MCCL for short.
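A minimal sketch of the objective in (11) is given below, using only the standard library. It assumes the inverse-logit choice of $g$, holds the draws $\widetilde{T}_{j}$ fixed so that the objective is a deterministic function of $\boldsymbol{\Omega}$, and uses the fact that $\mbox{Re}\{\log\Gamma(z)\}=\log|\Gamma(z)|$, which is approximated here by a recurrence followed by a Stirling series. All function names and the data layout are hypothetical, and no safeguards against overflow or poles of the Gamma function are included.

```python
import cmath
import math
import statistics

def log_abs_gamma(z):
    """Re{log Gamma(z)} = log|Gamma(z)| for complex z away from the poles."""
    z = complex(z)
    shift = 0.0
    while z.real < 8.0:
        shift += math.log(abs(z))  # Gamma(z) = Gamma(z + 1) / z
        z += 1.0
    stirling = ((z - 0.5) * cmath.log(z) - z + 0.5 * math.log(2.0 * math.pi)
                + 1.0 / (12.0 * z) - 1.0 / (360.0 * z ** 3))
    return stirling.real - shift

def expit(z):
    """Inverse-logit function extended to complex arguments."""
    return 1.0 / (1.0 + cmath.exp(-z))

def re_loglik(beta, m, y, x1, x_rest):
    """Real part of the summand of (3) with a possibly complex x1."""
    eta = beta[0] + beta[1] * x1 + sum(b * x for b, x in zip(beta[2:], x_rest))
    theta = expit(eta)
    return (math.lgamma(2.0 + m)
            - log_abs_gamma(1.0 + m * theta)
            - log_abs_gamma(1.0 + m * (1.0 - theta))
            + m * (theta.real * math.log(y)
                   + (1.0 - theta.real) * math.log(1.0 - y)))

def mccl_objective(beta, m, data, t_draws):
    """Sum over j of (10); data[j] = (y_j, [W_j1, ...], x_rest_j), t_draws[j] = [T_j1, ...]."""
    total = 0.0
    for (y, w, x_rest), t_j in zip(data, t_draws):
        n_j = len(w)
        w_bar = statistics.fmean(w)
        scale = math.sqrt((n_j - 1) * statistics.variance(w) / n_j)
        total += statistics.fmean(
            [re_loglik(beta, m, y, complex(w_bar, scale * t), x_rest) for t in t_j])
    return total
```

The estimate in (11) would then be obtained by passing `mccl_objective` (negated) to any general-purpose numerical optimizer over $(\boldsymbol{\beta},m)$ with $m>0$; the choice of optimizer is left open here.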

Whether one follows the idea of correcting the naive scores or the route of correcting the naive log-likelihood to account for measurement error, our proposed estimation method falls in the general framework of $M$-estimation (Boos and Stefanski, 2013, Chapter 7). As an $M$-estimator, the MCCL estimator $\hat{\boldsymbol{\Omega}}$ is a consistent estimator of $\boldsymbol{\Omega}$ that is asymptotically normal under regularity conditions stated in, for example, Theorem 7.2 in Boos and Stefanski (2013). Moreover, motivated by its asymptotic variance of the sandwich form (Boos and Stefanski, 2013, Section 7.2.1), the variance of $\hat{\boldsymbol{\Omega}}$ can be estimated by

$$\mathbf{V}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})=\left\{\mathbf{A}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})\right\}^{-1}\mathbf{B}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})\left[\left\{\mathbf{A}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})\right\}^{-1}\right]^{\mathrm{\scriptscriptstyle T}}, \tag{12}$$

where

$$\begin{aligned}
\mathbf{A}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})&=\left.\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\boldsymbol{\Omega}^{\mathrm{\scriptscriptstyle T}}}\boldsymbol{\Psi}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})\right|_{\boldsymbol{\Omega}=\hat{\boldsymbol{\Omega}}},\\
\mathbf{B}(\mathcal{D}^{*};\hat{\boldsymbol{\Omega}})&=\frac{1}{n}\sum_{j=1}^{n}\boldsymbol{\Psi}(\hat{\boldsymbol{\Omega}};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})\left\{\boldsymbol{\Psi}(\hat{\boldsymbol{\Omega}};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})\right\}^{\mathrm{\scriptscriptstyle T}}.
\end{aligned}$$

4 Model diagnostics

Even though we avoid specifying the true covariate distribution by adopting the functional viewpoint of measurement error models, the primary regression model in (2) is fully parametric. This raises the concern of model misspecification and calls for model diagnostic tools. Model diagnostics based on error-prone data are more challenging than in settings without measurement error. In particular, conventional residual-based diagnostic methods that require evaluating an estimated regression function, whether it is the conditional mean $\mu(\mathbf{X})$ in mean regression or the conditional mode $\theta(\mathbf{X})$ in modal regression, are no longer applicable now that a true covariate is unobserved. Another contribution of our study is an effective score-based diagnostic tool that circumvents the obstacle a traditional residual-based diagnostic method faces in the presence of measurement error.

For the beta modal regression model without error in covariates, Zhou and Huang (2020) proposed a score-based test statistic, defined below, for the purpose of model diagnostics,

$$Q(\hat{\boldsymbol{\Omega}}_{0};\mathcal{D})=\frac{n-2}{2(n-1)}\overline{\mathbf{S}}^{\mathrm{\scriptscriptstyle T}}\hat{\boldsymbol{\Sigma}}^{-1}\overline{\mathbf{S}}, \tag{13}$$

where $\hat{\boldsymbol{\Omega}}_{0}$ is the maximum likelihood estimator of $\boldsymbol{\Omega}$, $\overline{\mathbf{S}}=n^{-1}\sum_{j=1}^{n}\mathbf{S}(\hat{\boldsymbol{\Omega}}_{0};Y_{j},\mathbf{X}_{j})$, and $\hat{\boldsymbol{\Sigma}}=\{n(n-1)\}^{-1}\sum_{j=1}^{n}\{\mathbf{S}(\hat{\boldsymbol{\Omega}}_{0};Y_{j},\mathbf{X}_{j})-\overline{\mathbf{S}}\}\{\mathbf{S}(\hat{\boldsymbol{\Omega}}_{0};Y_{j},\mathbf{X}_{j})-\overline{\mathbf{S}}\}^{\mathrm{\scriptscriptstyle T}}$, in which, for $j=1,\ldots,n$,

$$\mathbf{S}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})=\begin{bmatrix}\log Y_{j}-\psi(1+m\theta(\mathbf{X}_{j}))+\psi(2+m)\\[4pt] Y_{j}\log Y_{j}-\dfrac{\{1+m\theta(\mathbf{X}_{j})\}\{\psi(2+m\theta(\mathbf{X}_{j}))-\psi(3+m)\}}{2+m}\end{bmatrix} \tag{14}$$

is the score vector constructed by matching $\log V$ and $V\log V$ with their respective expectations for $V\sim\mbox{beta}(\alpha_{1},\alpha_{2})$, and thus $E\{\mathbf{S}(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})\}=\mathbf{0}$ in the absence of model misspecification. By construction, a larger value of the nonnegative $Q(\hat{\boldsymbol{\Omega}}_{0};\mathcal{D})$ provides stronger evidence indicating model misspecification. A parametric bootstrap procedure is developed in Zhou and Huang (2020) to estimate the null distribution of $Q(\hat{\boldsymbol{\Omega}}_{0};\mathcal{D})$, from which one obtains an estimated $p$-value for the test.
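For the error-free setting, the score in (14) and the quadratic form in (13) can be sketched as follows. The code takes the fitted conditional modes as given, so no link function needs to be specified. The digamma approximation and all function names are assumptions made for this illustration, and the bootstrap step is not shown.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def moment_score(theta, m, y):
    """Score vector S in (14) at one data point with conditional mode theta."""
    s1 = math.log(y) - digamma(1.0 + m * theta) + digamma(2.0 + m)
    s2 = (y * math.log(y)
          - (1.0 + m * theta)
          * (digamma(2.0 + m * theta) - digamma(3.0 + m)) / (2.0 + m))
    return s1, s2

def q_statistic(scores):
    """Quadratic form (13) computed from a list of 2-vectors S_j."""
    n = len(scores)
    mean1 = sum(s[0] for s in scores) / n
    mean2 = sum(s[1] for s in scores) / n
    # Sigma-hat = {n (n - 1)}^{-1} * sum of centered outer products (2 x 2).
    c11 = sum((s[0] - mean1) ** 2 for s in scores) / (n * (n - 1))
    c22 = sum((s[1] - mean2) ** 2 for s in scores) / (n * (n - 1))
    c12 = sum((s[0] - mean1) * (s[1] - mean2) for s in scores) / (n * (n - 1))
    det = c11 * c22 - c12 ** 2
    # S-bar^T Sigma-hat^{-1} S-bar via the closed-form 2 x 2 inverse.
    quad = (c22 * mean1 ** 2 - 2.0 * c12 * mean1 * mean2 + c11 * mean2 ** 2) / det
    return (n - 2) / (2.0 * (n - 1)) * quad
```

A quick check of (14) is to simulate $Y$ from beta$(1+m\theta,\,1+m\{1-\theta\})$ and confirm that both components of `moment_score` average to approximately zero.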

Returning to our beta modal regression model with error-in-covariate, we apply the idea of corrected score here to construct a counterpart of (14) to obtain a score accounting for measurement error whose mean is zero in the absence of model misspecification. This yields the corrected score evaluated at the $j$ -th observed data point for model diagnostics, for $j=1,\ldots,n,$

$$\tilde{\mathbf{S}}(\boldsymbol{\Omega};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})=\frac{1}{B}\sum_{b=1}^{B}\mbox{Re}\left\{\mathbf{S}\left(\boldsymbol{\Omega};Y_{j},\overline{W}_{j}+i\sqrt{\frac{(n_{j}-1)S_{j}^{2}}{n_{j}}}\,T_{j,b},\mathbf{X}_{-1,j}\right)\right\}. \tag{15}$$

The test statistic of the quadratic form denoted by $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$ that is parallel to (13) follows by using the MCCL estimator $\hat{\boldsymbol{\Omega}}$ instead of $\hat{\boldsymbol{\Omega}}_{0}$, replacing $\overline{\mathbf{S}}$ appearing in (13) with $n^{-1}\sum_{j=1}^{n}\tilde{\mathbf{S}}(\hat{\boldsymbol{\Omega}};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})$, and revising $\hat{\boldsymbol{\Sigma}}$ accordingly. But the next hurdle emerges, namely the design of a parametric bootstrap procedure for estimating the null distribution of $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$. Traditional parametric bootstrap in the regression setting, such as the procedure in Zhou and Huang (2020), involves generating response data from the primary regression model, which again requires evaluating an estimated regression function at the true covariates that are partly unobserved in the current context. We overcome this hurdle by “estimating” unobserved true covariate data, as implemented in the method of regression calibration (Carroll et al., 2006, Chapter 4) that takes on the structural viewpoint of measurement error models. Under the classical measurement error model in (1), the best linear predictor of $X_{1,j}$ is $E(X_{1,j}|\overline{W}_{j})=\mu_{1}+\lambda_{j}(\overline{W}_{j}-\mu_{1})$, where $\mu_{1}=E(X_{1})$ and $\lambda_{j}=\sigma^{2}_{1}/\sigma^{2}_{\overline{W}_{j}}$ is the reliability ratio associated with $\overline{W}_{j}$ (Carroll et al., 2006, Section 3.2.1), in which $\sigma^{2}_{1}$ and $\sigma^{2}_{\overline{W}_{j}}=\sigma^{2}_{1}+\sigma^{2}_{u}/n_{j}$ denote the variance of $X_{1}$ and that of $\overline{W}_{j}$, respectively. Replacing each unknown quantity in $E(X_{1,j}|\overline{W}_{j})$ with its method-of-moments estimator yields an “estimator” or prediction of $X_{1,j}$ given by

\hat{X}_{1,j}^{*}=\overline{W}+\hat{\lambda}(\overline{W}_{j}-\overline{W}),\quad\text{for $j=1,\ldots,n$,}

(16)

where $\overline{W}=n^{-1}\sum_{j=1}^{n}\overline{W}_{j}$ and $\hat{\lambda}=\hat{\sigma}^{2}_{1}/\hat{\sigma}^{2}_{W}$, in which $\hat{\sigma}^{2}_{W}$ is the sample variance of $(\overline{W}_{1},\ldots,\overline{W}_{n})$, $\hat{\sigma}^{2}_{1}=(\hat{\sigma}^{2}_{W}-\hat{\sigma}^{2}_{u})_{+}$, and $\hat{\sigma}_{u}^{2}=n^{-1}\sum_{j=1}^{n}S_{j}^{2}/n_{j}$, recalling that, for $j=1,\ldots,n$, $S_{j}^{2}$ is the sample variance of $(W_{j,1},\ldots,W_{j,n_{j}})$ computed earlier to evaluate the corrected score and the corrected log-likelihood. The idea of regression calibration is to regress $Y$ on the estimated covariate $\hat{X}^{*}_{1}$ defined by (16) and $\mathbf{X}_{-1}=(X_{2},\ldots,X_{p})^{\mathrm{\scriptscriptstyle T}}$ instead of regressing on $(W,\mathbf{X}_{-1}^{\mathrm{\scriptscriptstyle T}})^{\mathrm{\scriptscriptstyle T}}$. Even though this idea often yields estimators of parameters in the primary regression model that improve on naive estimators, Buonaccorsi et al. (2018) noted that (16) tends to underestimate the variability of the true covariate and thus can be problematic if used in a bootstrap procedure, as we intend to do. They then proposed to use

\hat{X}_{1,j}=\overline{W}+\hat{\lambda}^{1/2}(\overline{W}_{j}-\overline{W}),\quad\text{for $j=1,\ldots,n$,}

(17)

as estimated covariate data instead so that these estimated covariate values have the mean and variance coinciding with method-of-moments estimates for the mean and variance of $X_{1}$ .
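To make the difference between (16) and (17) concrete, the following is a minimal Python sketch, not the authors' released code, that computes both sets of estimated covariate values from replicate measures using the method-of-moments quantities defined above. The function name `estimate_true_covariate` and the list-of-lists input layout are illustrative assumptions, and each subject is assumed to have at least two replicates so that $S_{j}^{2}$ is defined.

```python
import statistics


def estimate_true_covariate(replicates):
    """Return (x_star, x_hat, lam) following (16) and (17).

    `replicates` holds one inner list (W_{j,1}, ..., W_{j,n_j}) per subject;
    every subject needs n_j >= 2 so that S_j^2 can be computed.
    """
    w_bar_j = [statistics.fmean(w) for w in replicates]
    w_bar = statistics.fmean(w_bar_j)
    # Sample variance of the subject-level means.
    var_w = statistics.variance(w_bar_j)
    # Method-of-moments estimate n^{-1} sum_j S_j^2 / n_j.
    var_u = statistics.fmean([statistics.variance(w) / len(w) for w in replicates])
    # Truncation at zero mirrors the (.)_+ operator.
    var_x = max(var_w - var_u, 0.0)
    lam = var_x / var_w
    x_star = [w_bar + lam * (w - w_bar) for w in w_bar_j]        # (16)
    x_hat = [w_bar + lam ** 0.5 * (w - w_bar) for w in w_bar_j]  # (17)
    return x_star, x_hat, lam
```

By construction, the sample variance of `x_hat` equals $\hat{\lambda}\hat{\sigma}^{2}_{W}=\hat{\sigma}^{2}_{1}$, whereas that of `x_star` equals $\hat{\lambda}^{2}\hat{\sigma}^{2}_{W}$, which is smaller whenever $0<\hat{\lambda}<1$; this is the underestimation of variability that motivates (17).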

With this last hurdle resolved, we are in a position to present the detailed algorithm of the parametric bootstrap for estimating the $p$-value associated with $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$ based on $M$ bootstrap samples.

Step 1: Fit the beta modal regression model with classical measurement error to $\mathcal{D}^{*}$ by applying the MCCL method in Section 3.3. This gives the MCCL estimate $\hat{\boldsymbol{\Omega}}=(\hat{\boldsymbol{\beta}}^{\mathrm{\scriptscriptstyle T}},\hat{m})^{\mathrm{\scriptscriptstyle T}}$.
Step 2: Compute the test statistic $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$.
For $d=1,\ldots,M$, repeat Steps 3–5:
Step 3: For $j=1,\ldots,n$, generate $Y^{(d)}_{j}$ from beta$(1+\hat{m}\hat{\theta}(\hat{X}_{1,j},\mathbf{X}_{-1,j}),\,1+\hat{m}\{1-\hat{\theta}(\hat{X}_{1,j},\mathbf{X}_{-1,j})\})$, and generate $W^{(d)}_{j,k}=\hat{X}_{1,j}+U^{(d)}_{j,k}$, for $k=1,\ldots,n_{j}$, where $\hat{X}_{1,j}$ is given by (17), and $\{U^{(d)}_{j,k}\}_{k=1}^{n_{j}}$ are i.i.d. from $N(0,S_{j}^{2})$. Let $\widetilde{W}_{j}^{(d)}=\{W^{(d)}_{j,k}\}_{k=1}^{n_{j}}$. This yields the $d$-th set of bootstrap data, $\mathcal{D}^{(d)}=\{(Y^{(d)}_{j},\,\widetilde{W}^{(d)}_{j},\mathbf{X}_{-1,j})\}_{j=1}^{n}$.
Step 4: Fit the beta modal regression model with classical measurement error to $\mathcal{D}^{(d)}$, and obtain the MCCL estimate of $\boldsymbol{\Omega}$, denoted by $\hat{\boldsymbol{\Omega}}^{(d)}$.
Step 5: Compute the test statistic, $\tilde{Q}(\hat{\boldsymbol{\Omega}}^{(d)};\mathcal{D}^{(d)})$ .
Step 6: Estimate the $p$ -value by $M^{-1}\sum_{d=1}^{M}I\left\{\tilde{Q}\left(\hat{\boldsymbol{\Omega}}^{(d)};% \mathcal{D}^{(d)}\right)>\tilde{Q}\left(\hat{\boldsymbol{\Omega}};\mathcal{D}^% {*}\right)\right\}$ .
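
The data-generation part of Step 3 and the $p$-value in Step 6 can be sketched in Python as follows. This is a hedged illustration under stated assumptions rather than the authors' implementation: the MCCL fit (Steps 1 and 4) and the test statistic (Steps 2 and 5) are not reproduced, the fitted mode function is passed in as a hypothetical callable `mode_fn`, and the function names are ours.

```python
import math
import random


def generate_bootstrap_sample(x_hat, x_rest, n_reps, s2, mode_fn, m_hat,
                              rng: random.Random):
    """Generate one bootstrap data set D^(d) as in Step 3.

    x_hat[j]  : estimated covariate value from (17)
    x_rest[j] : error-free covariates X_{-1,j}
    n_reps[j] : number of replicates n_j
    s2[j]     : sample variance S_j^2 of the original replicates
    mode_fn   : fitted mode function theta-hat(x1, x_rest) (hypothetical callable)
    """
    sample = []
    for x1, xr, nj, s2j in zip(x_hat, x_rest, n_reps, s2):
        theta = mode_fn(x1, xr)
        # Beta response with mode theta and precision m_hat.
        y = rng.betavariate(1.0 + m_hat * theta, 1.0 + m_hat * (1.0 - theta))
        # Replicate surrogates W = X_hat + U with U ~ N(0, S_j^2).
        w = [x1 + rng.gauss(0.0, math.sqrt(s2j)) for _ in range(nj)]
        sample.append((y, w, xr))
    return sample


def bootstrap_p_value(q_obs, q_boot):
    """Step 6: proportion of bootstrap statistics exceeding the observed one."""
    return sum(q > q_obs for q in q_boot) / len(q_boot)
```

Passing a seeded `random.Random` instance keeps the bootstrap draws reproducible across runs.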

In the absence of covariate measurement error, where $\{X_{1,j},\,j=1,\ldots,n\}$ are observed, the above algorithm (with $\hat{X}_{1,j}$ replaced by $X_{1,j}$ in Step 3) essentially follows the general guidelines of bootstrap hypothesis testing discussed in Hall and Wilson (1991), Davison and Hinkley (1997), and Martin (2007). In particular, our targeted null hypothesis states that the response given true covariates follows a beta modal regression model; Step 1 in our bootstrap algorithm aims to “recover” the model consistent with the null, and the response data obtained in Step 3 are generated from the fitted null model and thus reflect the null. This is precisely the first principle of model-based bootstrap for hypothesis testing: to generate bootstrap data that reflect the null. The unique challenge of bootstrap hypothesis testing in the presence of covariate measurement error is that true covariate values need to be estimated before generating response data. Unlike response data generation, which should reflect the null (which does not specify a distribution for the true covariate data), when “recovering” true covariate values, one aims to recover certain structures of the design matrix in the absence of measurement error. We accomplish this goal by using $\{\hat{X}_{1,j},\,j=1,\ldots,n\}$ in (17), which preserve certain structures of true covariate values in the sense that the first two moments of these estimated covariate values coincide with the method-of-moments estimates for the first two moments of $\{X_{1,j},\,j=1,\ldots,n\}$. The so-constructed estimated true covariate values are also used in Thomas et al. (2011) to recover true covariate data. Even though it is unclear whether there exists a better way to recover error-free covariate data for the purpose of bootstrap hypothesis testing, Buonaccorsi et al. (2016) showed that this approach substantially outperforms two obvious alternative methods: one is to use $\overline{W}_{j}$ to estimate $X_{1,j}$, the other is to use $\hat{X}_{1,j}^{*}$ in (16). In our context, empirical evidence from the simulation study presented in the next section suggests that the proposed bootstrap procedure can estimate the null distribution of $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$ accurately enough to preserve the right size of the test for model misspecification over a wide range of significance levels.

5 Simulation study

We carry out a simulation study to inspect the finite-sample performance of the proposed estimation method and the diagnostic method. The source code to reproduce results in this section is publicly available on the journal’s web page.

5.1 Design of simulation experiments

We generate data from each of the following four data generation processes.

(M1)

Generate response data according to (2), with $m=3$, $\theta(\mathbf{X})=1/\{1+\exp(-\beta_{0}-\beta_{1}X_{1}-\beta_{2}X_{2})\}$, $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\beta_{2})^{\mathrm{\scriptscriptstyle T}}=(0.25,0.25,0.25)^{\mathrm{\scriptscriptstyle T}}$, $X_{2}\sim\text{Bernoulli}(0.5)$, and $X_{1}|X_{2}\sim N(I(X_{2}=1)-I(X_{2}=0),\,1)$, where $I(\cdot)$ is the indicator function. Contaminate data of $X_{1}$ according to (1) to generate $W_{j,k}$, for $j=1,\ldots,n$ and $k=1,2,3$, with $U_{j,k}\sim N(0,\sigma^{2}_{u})$.
(M2)

Same as (M1) except that $m=40$ and $\theta(\mathbf{X})=1/\{1+\exp(-\beta_{0}-\beta_{1}X_{1}-\beta_{2}X_{2}-\beta_{3}X_{1}^{2})\}$, with $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\beta_{2},\beta_{3})^{\mathrm{\scriptscriptstyle T}}=(1,1,1,1)^{\mathrm{\scriptscriptstyle T}}$.
(M3)

Same as (M1) except that $\theta\left(\mathbf{X}\right)=\Phi\left(\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}\right)$ with $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\beta_{2})^{\mathrm{\scriptscriptstyle T}}=(1,1,1)^{\mathrm{\scriptscriptstyle T}}$, where $\Phi(\cdot)$ is the cumulative distribution function of $N(0,1)$.
(M4)

Generate response data $\{Y_{j}\}_{j=1}^{n}$ according to $Y_{j}=(Y_{j}^{*}-Y^{*}_{(1)})/(Y^{*}_{(n)}-Y^{*}_{(1)})$, for $j=1,\ldots,n$, where $Y^{*}_{(1)}$ and $Y^{*}_{(n)}$ are the minimum and maximum order statistics of data $\{Y^{*}_{j}\}_{j=1}^{n}$, respectively, $Y^{*}_{j}\mid\mathbf{X}_{j}\sim\operatorname{Gumbel}(\theta(\mathbf{X}_{j}),\,\gamma^{-1}\{1-2\theta(\mathbf{X}_{j})\}/(2+m))$, in which $\theta(\mathbf{X}_{j})<0.5$ is the mode formulated as that in (M1) with $\boldsymbol{\beta}=(\beta_{0},\beta_{1},\beta_{2})^{\mathrm{\scriptscriptstyle T}}=(1,1,1)^{\mathrm{\scriptscriptstyle T}}$, $\gamma^{-1}\{1-2\theta(\mathbf{X}_{j})\}/(2+m)$ is the scale of the Gumbel distribution, and $\gamma$ stands for the Euler–Mascheroni constant.

Regardless of the data generation process used to generate a particular data set, we always assume a beta modal regression model with $\theta(\mathbf{X})$ specified as that in (M1) when carrying out modal regression analysis of $Y$ on $\mathbf{X}=(X_{1},X_{2})^{\mathrm{\scriptscriptstyle T}}$. By so doing, the design in (M1) allows us to monitor point estimation in the absence of model misspecification, and the latter three designs can be used to study operating characteristics of the proposed model diagnostic method in the presence of different sources of model misspecification. In particular, fitting the assumed model to data generated according to (M2) creates a scenario where one misspecifies the linear predictor in the regression function. When data are generated from (M3), the assumed model has a wrong link function. Finally, fitting the assumed model to data from (M4) gives rise to the most severe model misspecification in the sense that the true distribution of $Y$ given $\mathbf{X}$ is outside of the beta family.
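
For readers who wish to experiment with design (M1), the following Python sketch draws one synthetic data set. It is an illustration only, assuming that model (2) is the beta$(1+m\theta,\,1+m(1-\theta))$ distribution used in Step 3 of the bootstrap algorithm in Section 4; the function name `simulate_m1` and the dictionary output format are our own choices and not part of the published source code.

```python
import math
import random


def simulate_m1(n, sigma2_u, rng: random.Random,
                beta=(0.25, 0.25, 0.25), m=3.0, n_reps=3):
    """Draw n subjects from design (M1); returns a list of dicts."""
    b0, b1, b2 = beta
    data = []
    for _ in range(n):
        x2 = 1 if rng.random() < 0.5 else 0            # X2 ~ Bernoulli(0.5)
        x1 = rng.gauss(1.0 if x2 == 1 else -1.0, 1.0)  # X1 | X2 ~ N(+1 or -1, 1)
        theta = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
        # Beta response with mode theta and precision m (assumed form of (2)).
        y = rng.betavariate(1.0 + m * theta, 1.0 + m * (1.0 - theta))
        # Classical additive error: W_{j,k} = X_{1,j} + U_{j,k}.
        w = [x1 + rng.gauss(0.0, math.sqrt(sigma2_u)) for _ in range(n_reps)]
        data.append({"y": y, "w": w, "x1": x1, "x2": x2})
    return data
```

The true `x1` is returned only so that simulated estimates can be checked against an error-free benchmark; an analysis mimicking practice would use `y`, `w`, and `x2` alone.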

5.2 Performance of point estimation

Besides assessing the quality of the MCCL estimator of $\Omega$ in comparison with the naive maximum likelihood estimator, we aim at addressing the following three issues of point estimation in the simulation study: (i) the impact of having an error-free covariate along with an error-prone covariate on covariate effects estimation; (ii) the quality of the variance estimation based on (12); (iii) the robustness of the MCCL estimator to the normality assumption on $U$. We bring up the third issue because the corrected score method is developed under the assumption of normal measurement error. Due to our focus on covariate effects estimation in the presence of an error-prone covariate in a modal regression model for a bounded response, none of the existing modal regression methods accounting for measurement error referenced in Section 1 serves as a sensible competing method in the current simulation study (e.g., there are no covariate effect parameters $\beta$ in a nonparametric modal regression model).

Based on data generated according to (M1) with $\sigma^{2}_{u}=0.6,1.2$, we obtain the MCCL estimate of $\Omega$ using $B=100,200$ and the naive maximum likelihood estimate that ignores measurement error in $X_{1}$. Table 1 provides the median of MCCL estimates $\hat{\boldsymbol{\Omega}}$ and the median of naive estimates across 1000 Monte Carlo replicates at each of the two sample sizes $n=100,200$. In contrast to the naive estimates, which exhibit bias that does not diminish as the sample size increases, the MCCL estimates are much improved despite the severity of error contamination in $X_{1}$. Moreover, raising $B$ from 100 to 200 provides negligible improvement in the quality of MCCL estimates. We thus set $B=100$ in the remaining empirical study and only show results corresponding to this default choice of $B$ in the sequel. Not surprisingly, the MCCL estimator corrects the bias of the naive estimator at the price of an inflation in variation.

Table 1: Medians of MCCL estimates and medians of naive estimates across 1000 Monte Carlo replicates generated according to (M1). The number in parentheses following each median is the interquartile range of the 1000 realizations of an estimator.

$\sigma^{2}_{u}$	$n$	Method	$\beta_{0}$	$\beta_{1}$	$\beta_{2}$	$\log m$
0.6	100	$\text{MCCL}_{B=100}$	0.23 (0.34)	0.24 (0.22)	0.26 (0.59)	1.18 (0.30)
0.6	100	$\text{MCCL}_{B=200}$	0.23 (0.35)	0.24 (0.22)	0.26 (0.59)	1.18 (0.30)
0.6	100	Naive	0.19 (0.31)	0.20 (0.18)	0.35 (0.55)	1.16 (0.29)
0.6	200	$\text{MCCL}_{B=100}$	0.24 (0.23)	0.25 (0.15)	0.26 (0.40)	1.14 (0.22)
0.6	200	$\text{MCCL}_{B=200}$	0.24 (0.23)	0.25 (0.15)	0.26 (0.40)	1.14 (0.22)
0.6	200	Naive	0.20 (0.22)	0.20 (0.13)	0.34 (0.37)	1.13 (0.21)
1.2	100	$\text{MCCL}_{B=100}$	0.24 (0.34)	0.24 (0.24)	0.27 (0.65)	1.18 (0.30)
1.2	100	$\text{MCCL}_{B=200}$	0.24 (0.35)	0.24 (0.24)	0.26 (0.66)	1.18 (0.30)
1.2	100	Naive	0.17 (0.31)	0.17 (0.17)	0.41 (0.54)	1.16 (0.29)
1.2	200	$\text{MCCL}_{B=100}$	0.25 (0.25)	0.25 (0.18)	0.25 (0.43)	1.14 (0.21)
1.2	200	$\text{MCCL}_{B=200}$	0.25 (0.25)	0.25 (0.18)	0.26 (0.43)	1.14 (0.21)
1.2	200	Naive	0.17 (0.21)	0.17 (0.12)	0.41 (0.36)	1.12 (0.21)

The attenuation effect of measurement error on the naive covariate effect estimation for $X_{1}$ is evident in Table 1. In contrast, the covariate effect of the error-free covariate $X_{2}$ is noticeably overestimated by the naive method. One may wonder if the observed opposite directions in the bias of naive estimation of the two covariate effects persist when the two covariates are independent. This relates to the first issue brought up above. To address this issue, we revise the data generating process in (M1) so that $X_{1}\sim N(0,1)$. Figure 1 includes boxplots of two sets of regression coefficient estimates, the MCCL estimates and the naive estimates, under (M1) where $X_{1}$ and $X_{2}$ are dependent (see the left panel in Figure 1) and under the revised (M1) with $X_{1}$ and $X_{2}$ independent (see the right panel in Figure 1). Here, we set $n=2000$ for each of 1000 Monte Carlo replicates. Interestingly, when $X_{2}$ is independent of the error-prone covariate $X_{1}$, naive estimation for the covariate effect of $X_{2}$ does not appear to be affected by measurement error. Regardless, the attenuation in the estimated covariate effect for $X_{1}$ remains.

Figure 1: Boxplots of regression coefficients estimates under (M1) with $X_{1}$ and $X_{2}$ dependent (left panel) and those under a revised version of (M1) with $X_{1}$ and $X_{2}$ independent (right panel). The two boxes associated with each parameter correspond to two estimators (from left to right): the MCCL estimator (red box) and the naive estimator (cyan box).

Table 2 presents the average of the standard deviation estimates for each parameter in $\Omega$ based on (12) across 1000 Monte Carlo replicates from (M1) with $n=200$. The Monte Carlo standard deviation of each parameter estimate in $\Omega$ is used as a reference/gold standard in this table. The proximity of the standard deviation estimates to the reference shown in the table suggests that the sandwich variance estimator in (12) provides reliable estimation for the variance of the MCCL estimator. This settles the second issue.

Table 2: Averages of standard deviation estimates, $\widehat{\text{s.d.}}$, and empirical standard deviation, s.d., across 1000 Monte Carlo replicates from (M1) with $\sigma^{2}_{u}=1.2$ and $n=200$. Numbers in parentheses are Monte Carlo standard errors associated with the Monte Carlo means.

	$\beta_{0}$		$\beta_{1}$		$\beta_{2}$		$\log m$
	$\widehat{\text{s.d.}}$	s.d.	$\widehat{\text{s.d.}}$	s.d.	$\widehat{\text{s.d.}}$	s.d.	$\widehat{\text{s.d.}}$	s.d.
MCCL	0.19 (0.03)	0.19	0.13 (0.03)	0.13	0.32 (0.06)	0.32	0.15 (0.02)	0.16
Naive	0.16 (0.02)	0.16	0.09 (0.01)	0.09	0.26 (0.03)	0.26	0.15 (0.01)	0.16

The third issue concerns the normality assumption on measurement error in the development of the Monte Carlo corrected score method. To assess the robustness of the MCCL estimator to this normality assumption, we revise (M1) by letting $U_{j,k}\sim\mbox{Laplace}(0,0.5^{1/2})$ instead, for $k=1,2,3$, and set $n=200$. Table 3 provides the same summary statistics of parameter estimates as those shown in Table 1 (with $B=100$) under this revised setting. In addition to estimates parallel to those considered in Table 1, we also include summary statistics for MCCL estimates obtained without using replicate measures of $X_{1}$. That is, we keep $W_{j,1}$ in $\widetilde{W}_{j}=\{W_{j,k}\}_{k=1}^{3}$ as the only available error-contaminated measure of $X_{1,j}$, for $j=1,\ldots,200$, when constructing the corrected log-likelihood function; in Section 6.2, we describe a modified version of the corrected log-likelihood in (10) that does not require replicate measures but depends on the measurement error variance (see (18)). Using a single measure creates a scenario where the violation of the normality assumption associated with the measurement error in $W_{j,1}$ is more severe than when $\overline{W}_{j}=\sum_{k=1}^{3}W_{j,k}/3$ is used as a surrogate of $X_{1,j}$. As one can see from Table 3, despite the (severity of the) violation of the normality assumption on $U$, the MCCL estimates remain close to the truth and substantially outperform the naive estimates. This robustness feature of the Monte Carlo corrected score method is also noted and explained in Novick and Stefanski (2002).

Table 3: Medians of MCCL estimates and medians of naive estimates across 1000 Monte Carlo replicates generated according to (M1) with $U_{j,k}\sim\mbox{Laplace}(0,0.5^{1/2})$ and $n=200$. The number in parentheses following each median is the interquartile range of the 1000 realizations of an estimator. $\text{MCCL}_{1}$ and $\text{MCCL}_{2}$ refer to MCCL estimates when replicate measures are present and absent, respectively.

	$\beta_{0}$	$\beta_{1}$	$\beta_{2}$	$\log m$
$\text{MCCL}_{1}$	0.25 (0.26)	0.25 (0.17)	0.26 (0.41)	1.12 (0.19)
$\text{MCCL}_{2}$	0.25 (0.26)	0.25 (0.17)	0.26 (0.41)	1.12 (0.19)
Naive	0.18 (0.22)	0.19 (0.13)	0.39 (0.36)	1.10 (0.19)

5.3 Performance of the model diagnostic method

Using 5000 Monte Carlo replicates from (M1) with $\sigma_{u}^{2}=1.2$ at each sample size $n=100$, 200, 500, 1000, we implement the bootstrap algorithm described in Section 4 with $M=300$ bootstrap samples to obtain estimated $p$-values associated with the test statistic $\tilde{Q}(\hat{\boldsymbol{\Omega}};\mathcal{D}^{*})$. We then record the proportion of replicates, across 5000 replicates, that lead to rejection of the null hypothesis of no model misspecification at various nominal levels. This rejection rate can be viewed as an empirical size of the test at a pre-specified significance level. Figure 2 depicts this rejection rate versus the significance level, from which one can see that the size of the test is well controlled by the bootstrap procedure over a wide range of nominal levels.
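
The empirical size described above is simply the fraction of Monte Carlo replicates whose estimated $p$-value falls at or below each nominal level. A minimal Python sketch of this bookkeeping is given below; the function name `rejection_rates` is hypothetical, and rejecting when the $p$-value is less than or equal to the level (rather than strictly less) is an assumption on our part.

```python
def rejection_rates(p_values, levels):
    """Proportion of replicates rejecting the null at each nominal level."""
    n = len(p_values)
    # Reject when the estimated p-value does not exceed the level (assumed rule).
    return {alpha: sum(p <= alpha for p in p_values) / n for alpha in levels}
```

Under a well-calibrated test, plotting these proportions against the levels should trace a curve close to the 45-degree line.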

Table 4 presents rejection rates of the model diagnostic method in the presence of different forms of model misspecification that occur when fitting data generated according to (M2)–(M4) while assuming a beta modal regression model specified in (M1). As one can see in Table 4, the proposed score-based test has moderate power to detect a misspecified form of the linear predictor, with the power steadily increasing as $n$ increases, and is especially powerful in detecting violation of the distributional assumption on $Y$ given covariates; but the test is less sensitive to link misspecification. Low power of most goodness-of-fit tests to detect link misspecification has been reported in the context of generalized linear models (e.g., Hosmer et al., 1997). Given these reported findings in the literature, the low power observed under design (M3) may not be surprising, especially given the high similarity between the logit link in the assumed model and the probit link in the true model in (M3).

Table 4: Rejection rates of the score-based diagnostic test resulting from 300 Monte Carlo replicates in the presence of three types of model misspecification in (M2)–(M4)

Model	$n=200$	$n=300$	$n=400$	$n=500$
(M2)	0.283	0.407	0.550	0.580
(M3)	0.053	0.120	0.090	0.113
(M4)	1.000	0.997	0.997	1.000

When the assumed beta modal regression model is rejected by the proposed diagnostic test, one may consider a more flexible unimodal distribution for the response conditioning on true covariates, such as the unimodal distributions formulated in Fernández and Steel (1998), Quintana et al. (2009), Rubio and Steel (2015), and Liu et al. (2022b). A different assumed primary regression model leads to a different log-likelihood function $\ell(\boldsymbol{\Omega};Y_{j},\mathbf{X}_{j})$ in (10), and our proposed strategy of correcting a naive log-likelihood function to account for measurement error remains applicable for any parametric regression model.

6 Real-life data application

In this section, we analyze data arising from two different applications where a covariate of interest cannot be observed directly. Besides dealing with scientific questions in relevant fields, these applications provide opportunities for us to address some practical issues one faces when implementing the proposed estimation method and diagnostic method not discussed in the simulation study.

6.1 Application to dietary data

The Food Frequency Questionnaire (FFQ) is a convenient and inexpensive dietary assessment instrument in epidemiologic studies. To study the association between an individual’s FFQ intake and his/her long-term usual intake as the univariate covariate $X$, we analyze a dietary data set from the Women’s Interview Survey of Health (Carroll et al., 1997). The data set contains 271 females’ FFQ intake records, measured as the percentage calories from fat, and six $24$-hour food recalls, $W_{j,k}$, for $j=1,\ldots,271$ and $k=1,\ldots,6$. Because the $j$-th subject’s long-term usual intake $X_{j}$ cannot be measured directly, a generally accepted practice in epidemiology is to use $\overline{W}_{j}=\sum_{k=1}^{6}W_{j,k}/6$ as a surrogate of $X_{j}$, for $j=1,\ldots,271$. According to the preliminary analysis in the existing literature, the distribution of the FFQ intake appears to be right-skewed and potentially heavy-tailed, which motivates the consideration of a modal regression model in place of a mean regression model. Here, we assume a beta modal regression model given in (2) with $\theta(X)=1/\{1+\exp(-\beta_{0}-\beta_{1}X)\}$ for the response data $\{Y_{j}\}_{j=1}^{271}$, where $Y_{j}$ is the $j$-th subject’s FFQ intake in kilocalories divided by 8000, a biologically plausible upper bound of daily energy intakes for a general population.

We obtain the MCCL estimate of $\boldsymbol{\Omega}=(\beta_{0},\beta_{1},\log m)^{\mathrm{\scriptscriptstyle T}}$ according to (11), and also carry out regression analysis that ignores measurement error to obtain a naive maximum likelihood estimate of $\Omega$. Moreover, we implement the simulation-extrapolation method (SIMEX, Carroll et al., 2006, Chapter 5) applied to the assumed beta modal regression model. In this particular application, SIMEX amounts to repeatedly estimating $\Omega$, without accounting for measurement error, using data $\mathcal{D}_{b}^{*}(\zeta)=\{(Y_{j},W_{j,b}(\zeta))\}_{j=1}^{n}$, for $b=1,\ldots,B$, where $W_{j,b}(\zeta)=\overline{W}_{j}+\sqrt{\zeta}\sigma_{u}Z_{j,b}$, in which $\{Z_{j,b},j=1,\ldots,n\}_{b=1}^{B}$ are independent standard normal errors, $\sigma_{u}$ is the standard deviation of measurement error associated with the surrogate measure $\overline{W}_{j}$, and $\zeta$ is a user-specified positive constant. Denote by $\hat{\boldsymbol{\Omega}}_{b}(\zeta)$ the (naive) estimator of $\Omega$ based on data $\mathcal{D}^{*}_{b}(\zeta)$; then $\hat{\boldsymbol{\Omega}}(\zeta)=\sum_{b=1}^{B}\hat{\boldsymbol{\Omega}}_{b}(\zeta)/B$ is a naive estimator based on data resulting from further contaminating the original error-prone data $\mathcal{D}^{*}=\{(Y_{j},\overline{W}_{j})\}_{j=1}^{n}$, with the amount of additional contamination controlled by $\zeta$. Collecting a sequence of $\hat{\boldsymbol{\Omega}}(\zeta)$ as one varies $\zeta$ realizes the simulation step of SIMEX. In this data application, we set $B=300$ and let $\zeta$ vary from 0.125 to 1 in increments of 0.125. The extrapolation step of SIMEX entails extrapolating the sequence of estimates in $\{\hat{\boldsymbol{\Omega}}(\zeta),\,\zeta=0.125,0.25,\ldots,1\}$ to $\hat{\boldsymbol{\Omega}}(-1)$, leading to the so-called SIMEX estimator. A heuristic motivation for extrapolating towards $\zeta=-1$ can be revealed by noting that $\text{Var}(W_{j,b}(\zeta)|X_{j})=\text{Var}(\overline{W}_{j}|X_{j})+\zeta\sigma_{u}^{2}$, where $\sigma_{u}^{2}=\text{Var}(\overline{W}_{j}|X_{j})$. Setting $\zeta=-1$ in the preceding variance expression gives $\text{Var}(W_{j,b}(-1)|X_{j})=0$, as if $W_{j,b}(-1)$ contained no measurement error, and hence extrapolating $\{\hat{\boldsymbol{\Omega}}(\zeta),\text{ for $\zeta>0$}\}$ to obtain $\hat{\boldsymbol{\Omega}}(-1)$ is an attempt to “recover” an estimator of $\Omega$ had there been no covariate measurement error. Shi et al. (2021) applied SIMEX to a kernel-based modal regression model with error-prone covariates.
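
The extrapolation step can be illustrated with a short Python sketch for a single component of $\hat{\boldsymbol{\Omega}}(\zeta)$. It assumes a quadratic extrapolant fitted by ordinary least squares over the grid of $\zeta$ values and evaluated at $\zeta=-1$; the function name `quadratic_extrapolate` is hypothetical, and the simulation step that produces the input estimates (refitting the naive model on further contaminated data) is not shown.

```python
def quadratic_extrapolate(zetas, estimates, target=-1.0):
    """Least-squares fit of a + b*zeta + c*zeta^2, evaluated at `target`."""
    # Power sums and cross moments for the 3x3 normal equations.
    s = [sum(z ** k for z in zetas) for k in range(5)]
    t = [sum(e * z ** k for z, e in zip(zetas, estimates)) for k in range(3)]
    aug = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    a, b, c = (aug[i][3] / aug[i][i] for i in range(3))
    return a + b * target + c * target ** 2
```

With the grid used here, `zetas = [0.125 * k for k in range(1, 9)]`, the function would be called once per parameter. The result depends on the assumed extrapolant, so a different functional form can lead to a different SIMEX estimate.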

Three estimates of $\Omega$, the MCCL estimate, SIMEX estimate, and naive estimate, are given in Table 5. The covariate effect associated with the long-term intake suggested by the naive estimate is substantially weaker than that indicated by the MCCL estimate and SIMEX estimate, implying potentially significant attenuation of the covariate effect due to measurement error in the former, whereas the latter two correct for this attenuation. Figure 3 depicts the estimated regression functions $\hat{\theta}(x)$ resulting from these three methods, superimposed on the scaled response data versus the surrogate covariate data. This pictorial contrast between the three estimated regression functions shows that the proposed method and SIMEX are able to capture the underlying positive non-linear covariate effect that is partially concealed or weakened by the naive method. Although SIMEX produces inference results similar to those from our method, the simulation step relies on the error variance $\sigma^{2}_{u}$ when generating the $W_{j,b}(\zeta)$’s, which we estimate in this example based on replicate measures; and the extrapolation step depends on the choice of an extrapolant, a choice that usually lacks supporting data evidence in most applications. Here, we use a quadratic extrapolant to obtain the SIMEX estimate. Besides being more computationally burdensome than the MCCL method (due to repeatedly estimating $\Omega$ based on further contaminated data), variance estimation for SIMEX estimators is also less straightforward than that for our estimator (Carroll et al., 1996). We resort to nonparametric bootstrap, with 1000 bootstrap samples, in this example to obtain the estimated standard errors associated with SIMEX estimates shown in Table 5. Finally, applying the proposed diagnostic method to this data set with $M=300$ bootstrap samples yields an estimated $p$-value of 0.097. We thus conclude that there is insufficient data evidence (at significance level 0.05) to indicate that the assumed beta modal regression model is inadequate for this application.

Table 5: Estimates of parameters in the beta modal regression model applied to the dietary data, along with the corresponding estimated standard errors in parentheses

Method	$\beta_{0}$	$\beta_{1}$	$\log m$
MCCL	$-1.578$ (0.033)	0.381 (0.099)	3.015 (0.196)
SIMEX	$-1.580$ (0.034)	0.354 (0.087)	3.008 (0.195)
Naive	$-1.581$ (0.041)	0.270 (0.058)	2.979 (0.094)

6.2 Application to Alzheimer’s disease data

Medical researchers have long recognized that cerebral atrophy is associated with dementia, and extensive research has been conducted to understand the association between volumetric changes of different brain regions and the severity of dementia. Abundant data collected from this line of research are available in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). Zhou and Huang (2020) analyzed a data set relating to 245 individuals diagnosed with mild cognitive impairment from this database. The goal is to study the roles that an individual’s volumetric measure of entorhinal cortex (ERC) and that of hippocampus (HPC) play in predicting one’s risk of developing Alzheimer’s disease. An individual’s test score from the Alzheimer’s disease assessment scale, known as ADAS-11, at month 12 since entering the ADNI cohort is used to assess one’s severity of cognitive impairment. Covariates of interest are the volumetric change in ERC (ERC.change) and that in HPC (HPC.change) at month 12 compared to the baseline measures collected at month 6. Assuming these volumetric measures are observed precisely, Zhou and Huang (2020) fitted the data to the beta modal regression model for the response $Y$ defined as an individual’s ADAS-11 score divided by a perfect score of 70, with the log-log link in the mode function, $\theta(\mathbf{X})=\exp\{-\exp(-\beta_{0}-\beta_{1}\times\text{ERC.change}-\beta_{2}\times\text{HPC.change})\}$, and showed that it provides a better fit for the data compared to the beta mean regression model proposed by Ferrari and Cribari-Neto (2004).

In reality, measuring ERC volume is challenging because of lateral border discrimination from the perirhinal cortex (Price et al., 2010), and the accuracy of HPC measurements is also in question (Maclaren et al., 2014). It is thus more sensible to view the observed volumetric change of ERC or that of HPC as a noisy surrogate of the actual amount of change. Regardless of which covariate is viewed as error-prone, the current data present some challenges due to the lack of replicate measures for an individual’s true covariate value, and thus the estimation methods proposed in Section 3 are not applicable. For example, in (10), the term multiplying the imaginary unit $i$ is equal to zero when the number of replicates is $n_{j}=1$, making the “corrected” log-likelihood the same as the naive log-likelihood. A quick fix to the problem is to invoke a similar strategy of correcting naive scores to account for measurement error as discussed in Novick and Stefanski (2002). Following this strategy, a corrected log-likelihood evaluated at the $j$-th data point to use in place of (10) is

\tilde{\ell}(\mbox{\boldmath$\Omega$};Y_{j},\mathbf{W}_{j},\tilde{\mathbf{Z}}_{j})=\frac{1}{B}\sum_{b=1}^{B}\text{Re}\{\ell(\mbox{\boldmath$\Omega$};Y_{j},\mathbf{W}_{j}+i\mbox{\boldmath$\Sigma$}_{u}^{1/2}\mathbf{Z}_{j,b})\},

(18)

where $\tilde{\mathbf{Z}}_{j}=\{\mathbf{Z}_{j,b}\}_{b=1}^{B}$, for $j=1,\ldots,n$, and $\{\mathbf{Z}_{j,b},b=1,\ldots,B\}_{j=1}^{n}$ are independent $p$-dimensional normal random vectors with mean zero and an identity variance-covariance matrix. This formulation accommodates multiple error-prone covariates in $\mathbf{X}$ by letting $\mathbf{W}_{j}$ be a $p$-dimensional multivariate surrogate of $\mathbf{X}_{j}$, contaminated by a multivariate normal measurement error $\mathbf{U}_{j}$ with variance-covariance matrix $\boldsymbol{\Sigma}_{u}$. Setting all entries in $\boldsymbol{\Sigma}_{u}$ to zero except for the first diagonal entry gives rise to the case considered in the majority of this article, with only $X_{1}$ prone to error. Certainly, not having replicate measures still creates an obstacle to implementing this strategy due to its dependence on $\boldsymbol{\Sigma}_{u}$, which cannot be estimated without replicate measures of a true multivariate covariate value or other external validation data. A well-accepted practice among statisticians in similar situations is to carry out a sensitivity analysis where one analyzes the data under different assumptions for the parameter, such as $\boldsymbol{\Sigma}_{u}$ in our case, that one lacks data information to infer. If one obtains drastically different inference results when assuming different values for $\boldsymbol{\Sigma}_{u}$, including a matrix of zeros corresponding to naive estimation that ignores measurement error, then one should exercise caution when interpreting results from an inference procedure that assumes error-free covariates.
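
To convey why averaging the real part of a function evaluated at a complex-valued argument removes the effect of normal measurement error, the following Python sketch applies the construction in (18) to a scalar covariate and a generic entire function `f`. This is a toy illustration under our own assumptions, not the beta log-likelihood itself: the helper name `mc_corrected` is hypothetical, $\boldsymbol{\Sigma}_{u}^{1/2}$ reduces to a scalar `sigma_u`, and `f` must accept complex inputs.

```python
import random


def mc_corrected(f, w, sigma_u, n_draws, rng: random.Random):
    """Average of Re f(w + i * sigma_u * Z_b) over standard normal draws Z_b."""
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # Perturb the surrogate along the imaginary axis, keep the real part.
        total += f(complex(w, sigma_u * z)).real
    return total / n_draws
```

For example, with $f(x)=x^{2}$ one has $\text{Re}\{(W+i\sigma_{u}Z)^{2}\}=W^{2}-\sigma_{u}^{2}Z^{2}$, whose expectation given $X$ is $X^{2}$, whereas the naive quantity $W^{2}$ has expectation $X^{2}+\sigma_{u}^{2}$.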

For illustration purposes, we assume in the sensitivity analysis four values for $\boldsymbol{\Sigma}_{u}$ listed in Table 6, where inference results for model parameters under each assumed $\boldsymbol{\Sigma}_{u}$ are provided. According to Table 6, all four rounds of regression analyses lead to the conclusion that the volumetric change of ERC is an influential predictor for the severity of cognitive impairment, even though the magnitude of the estimated covariate effect is sensitive to the assumed error variance associated with this covariate. In particular, when assuming imprecise measurements for ERC.change, the revised MCCL method that employs the corrected log-likelihood in (18) with $B=100000$ produces results indicating a much stronger association than the naive analysis. By comparison, the magnitude of the estimate for the HPC.change effect is less sensitive to the assumed $\boldsymbol{\Sigma}_{u}$, but its statistical significance is noticeably affected by it. For example, one would conclude a moderately significant covariate effect of HPC.change based on the naive analysis assuming error-free covariates, but claim a highly significant, or moderately significant, or nonsignificant HPC.change effect depending on which covariate(s) one assumes to be error-prone and the severity of error contamination. This phenomenon is reminiscent of an observation made in Figure 1, and may suggest that ERC.change and HPC.change are correlated. In fact, measurements of ERC and HPC via magnetic resonance imaging are known to be highly correlated with observed clinical alterations in patients suffering mild cognitive impairment or at dementia phases of Alzheimer’s disease (Desikan et al., 2010; Jack et al., 2013; Varon et al., 2014).

Table 6: Sensitivity analysis using the ADNI data for the beta modal regression with the log-log link. Numbers in parentheses are estimated standard errors. Numbers in square brackets are $p$-values associated with covariate effects.

$\mbox{\boldmath$\Sigma$}_{u}$	$\beta_{0}$	$\beta_{1}$ (ERC.change)	$\beta_{2}$ (HPC.change)	$\log m$
$\begin{bmatrix}0&0\\ 0&0\end{bmatrix}$	$-0.69$ (0.03)	$-0.12$ (0.05)	$-0.22$ (0.11)	2.78 (0.15)
		[0.007]	[0.054]
$\begin{bmatrix}0.16&0\\ 0&0\end{bmatrix}$	$-0.88$ (0.03)	$-2.44$ (0.00)	$0.39$ (0.46)	3.45 (0.04)
		[0.000]	[0.386]
$\begin{bmatrix}0&0\\ 0&0.0225\end{bmatrix}$	$-0.71$ (0.03)	$-0.11$ (0.05)	$-0.47$ (0.27)	2.80 (0.16)
		[0.014]	[0.084]
$\begin{bmatrix}0.16&0\\ 0&0.0225\end{bmatrix}$	$-0.81$ (0.03)	$-2.42$ (0.00)	$-0.85$ (0.00)	3.85 (0.02)
		[0.000]	[0.000]
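The workflow behind a sensitivity analysis of this kind can be sketched on synthetic data. The example below is only an illustration under stated assumptions: it swaps in a linear mean regression with the classical moment-corrected slope estimator $(\mathbf{S}_{WW}-\mbox{\boldmath$\Sigma$}_{u})^{-1}\mathbf{S}_{WY}$ in place of the beta modal regression fitted by the revised MCCL method, because that estimator has a closed form. The data, the true coefficients, the correlation between the two covariates, and the function name `corrected_slopes` are all hypothetical. Only the four candidate values of $\mbox{\boldmath$\Sigma$}_{u}$ mirror those in Table 6.

```python
import numpy as np

def corrected_slopes(w, y, sigma_u):
    """Moment-corrected slopes for a linear model with additive covariate error.

    Solves (S_WW - Sigma_u) beta = S_WY, where S_WW and S_WY are sample
    (co)variances; Sigma_u = 0 reproduces the naive least squares slopes.
    """
    wc = w - w.mean(axis=0)
    yc = y - y.mean()
    n = len(y)
    s_ww = wc.T @ wc / (n - 1)
    s_wy = wc.T @ yc / (n - 1)
    return np.linalg.solve(s_ww - np.asarray(sigma_u, dtype=float), s_wy)

# Synthetic data with two correlated covariates, both measured with error.
rng = np.random.default_rng(1)
n = 20000
beta_true = np.array([-1.0, -0.5])
cov_x = np.array([[1.0, 0.6], [0.6, 1.0]])
x = rng.multivariate_normal(np.zeros(2), cov_x, size=n)
sigma_true = np.diag([0.16, 0.0225])
w = x + rng.standard_normal((n, 2)) * np.sqrt(np.diag(sigma_true))
y = 0.3 + x @ beta_true + 0.5 * rng.standard_normal(n)

# Candidate error covariance matrices, refit once per assumption.
candidates = {
    "no error": np.zeros((2, 2)),
    "first only": np.diag([0.16, 0.0]),
    "second only": np.diag([0.0, 0.0225]),
    "both": np.diag([0.16, 0.0225]),
}
results = {label: corrected_slopes(w, y, s) for label, s in candidates.items()}
for label, est in results.items():
    print(f"{label:12s} slopes = {np.round(est, 3)}")
```

Because the two synthetic covariates are correlated, changing the error variance assumed for one covariate shifts the estimated effect of the other as well, which is the kind of interplay a sensitivity analysis is meant to expose.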

In conclusion, results from the sensitivity analysis suggest that volumetric measures of different brain regions are likely to be subject to measurement error, and statistical analyses under the assumption of precisely measured covariates should be interpreted with caution. If replicate data are available for covariates of interest, the MCCL method can provide more reliable inference. Lastly, even though one can mimic (18) to construct a corrected score in place of $\tilde{\mathbf{S}}(\mbox{\boldmath$\Omega$};Y_{j},\widetilde{W}_{j},\widetilde{T}_{j},\mathbf{X}_{-1,j})$ in (15) and then formulate the test statistic $\tilde{Q}(\hat{\mbox{\boldmath$\Omega$}};\mathcal{D}^{*})$ for model diagnostics, the dependence of the revised score on the unknown $\mbox{\boldmath$\Sigma$}_{u}$ remains an obstacle that prevents one from using the bootstrap procedure outlined in Section 4 to assess the statistical significance of the revised test statistic. Alternative diagnostic methods that do not rely on the parametric bootstrap or a corrected score (e.g., Huang et al.,, 2006) can be used to detect inadequate assumptions imposed on the primary regression model.

7 Discussion

We propose an inference procedure based on the idea of corrected scores that falls in the framework of $M$-estimation for modal regression with an error-prone covariate. Even though in this article we focus on the beta modal regression model as the primary regression model, the proposed MCCL method is applicable to other parametric modal regression models, such as the gamma modal regression models for non-negative responses proposed by Aristodemou, (2014) and Bourguignon et al., (2020), and the flexible Gumbel regression model recently proposed by Liu et al., (2022b) for responses ranging over the entire real line. In fact, provided that a parametric modal regression model can provide reliable inference for the global mode in the absence of covariate measurement error (even when $Y$ follows a multimodal distribution given $\mathbf{X}$), such as the flexible unimodal regression models considered in Liu et al., (2022a), the proposed MCCL method applied to error-prone data is expected to improve over the counterpart naive method that ignores measurement error. A Python package for implementing the proposed methods for beta modal regression with errors in covariates is available at https://pypi.org/project/pybetareg/. All computer programs used in this paper are available at https://github.com/rh8liuqy/Modal_regression_with_measurement_error.

To accommodate situations without replicate measures of the true covariate or settings with multiple error-prone covariates, the MCCL method can be easily revised as demonstrated in Section 6.2, although one needs to specify the variance (or the variance-covariance matrix) of the (vector-valued) measurement error if one lacks replicate data or external validation data to estimate it.

Focusing on the current beta modal regression models, some extensions are worthy of further investigation, such as a zero-inflated beta modal regression model for disease prevalence data, which would be especially suitable for rare diseases, and a four-parameter beta modal regression model, as considered in Zhou and Huang, (2020), for a bounded response with unknown support. Another follow-up research direction is variable selection based on a parametric modal regression model with or without measurement error contamination in covariates.

Conflict of Interest

The authors have declared no conflict of interest.

References

Aristodemou, (2014) Aristodemou, K. (2014). New regression methods for measures of central tendency. PhD thesis, Brunel University.
Bagnato and Punzo, (2013) Bagnato, L. and Punzo, A. (2013). Finite mixtures of unimodal beta and gamma densities and the $k$ -bumps algorithm. Computational Statistics, 28(4):1571–1597.
Boos and Stefanski, (2013) Boos, D. D. and Stefanski, L. A. (2013). Essential Statistical Inference: Theory and Methods, volume 591. Springer.
Bourguignon et al., (2020) Bourguignon, M., Leão, J., and Gallardo, D. I. (2020). Parametric modal regression with varying precision. Biometrical Journal, 62(1):202–220.
Buonaccorsi et al., (2016) Buonaccorsi, J., Prochenka, A., Thoresen, M., and Ploski, R. (2016). Correcting for binomial measurement error in predictors in regression with application to analysis of DNA methylation rates by bisulfite sequencing. Statistics in medicine, 35(22):3987–4007.
Buonaccorsi, (2010) Buonaccorsi, J. P. (2010). Measurement Error: Models, Methods, and Applications. Chapman and Hall/CRC.
Buonaccorsi et al., (2018) Buonaccorsi, J. P., Romeo, G., and Thoresen, M. (2018). Model-based bootstrapping when correcting for measurement error with application to logistic regression. Biometrics, 74(1):135–144.
Carroll et al., (1997) Carroll, R. J., Freedman, L., and Pee, D. (1997). Design aspects of calibration studies in nutrition, with analysis of missing data in linear measurement error models. Biometrics, 53(4).
Carroll et al., (1996) Carroll, R. J., Küchenhoff, H., Lombard, F., and Stefanski, L. A. (1996). Asymptotics for the SIMEX estimator in nonlinear measurement error models. Journal of the American Statistical Association, 91(433):242–250.
Carroll et al., (2006) Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models. Chapman and Hall/CRC.
Chacón, (2020) Chacón, J. E. (2020). The modal age of statistics. International Statistical Review, 88(1):122–141.
Chen, (1999) Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis, 31(2):131–145.
Chen et al., (2016) Chen, Y.-C., Genovese, C. R., Tibshirani, R. J., and Wasserman, L. (2016). Nonparametric modal regression. The Annals of Statistics, 44(2):489–514.
Davison and Hinkley, (1997) Davison, A. C. and Hinkley, D. V. (1997). Bootstrap methods and their application. Number 1. Cambridge university press.
Desikan et al., (2010) Desikan, R. S., Cabral, H. J., Settecase, F., Hess, C. P., Dillon, W. P., Glastonbury, C. M., Weiner, M. W., Schmansky, N. J., Salat, D. H., and Fischl, B. (2010). Automated MRI measures predict progression to Alzheimer’s disease. Neurobiology of Aging, 31(8).
Fernández and Steel, (1998) Fernández, C. and Steel, M. F. (1998). On bayesian modeling of fat tails and skewness. Journal of the american statistical association, 93(441):359–371.
Ferrari and Cribari-Neto, (2004) Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7):799–815.
Fuller, (2009) Fuller, W. A. (2009). Measurement Error Models. John Wiley & Sons.
Hall and Wilson, (1991) Hall, P. and Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, pages 757–762.
He and Liang, (2000) He, X. and Liang, H. (2000). Quantile regression estimates for a class of linear and partially linear errors-in-variables models. Statistica Sinica, 10(1):129–140.
Hosmer et al., (1997) Hosmer, D. W., Hosmer, T., Le Cessie, S., and Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in medicine, 16(9):965–980.
Huang et al., (2006) Huang, X., Stefanski, L. A., and Davidian, M. (2006). Latent-model robustness in structural measurement error models. Biometrika, 93(1):53–64.
Jack et al., (2013) Jack, C. R., Knopman, D. S., Jagust, W. J., Petersen, R. C., Weiner, M. W., Aisen, P. S., Shaw, L. M., Vemuri, P., Wiste, H. J., Weigand, S. D., Lesnick, T. G., Pankratz, V. S., Donohue, M. C., and Trojanowski, J. Q. (2013). Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. The Lancet Neurology, 12(2):207–216.
Kemp et al., (2020) Kemp, G. C., Parente, P. M., and Santos Silva, J. (2020). Dynamic vector mode regression. Journal of Business & Economic Statistics, 38(3):647–661.
Kruschke, (2015) Kruschke, J. K. (2015). Doing Bayesian Data Analysis: A tutorial with R, JAGS, and Stan. Academic Press.
Lee, (1989) Lee, M.-J. (1989). Mode regression. Journal of Econometrics, 42(3):337–349.
Lee, (1993) Lee, M.-J. (1993). Quadratic mode regression. Journal of Econometrics, 57(1-3):1–19.
Li and Huang, (2019) Li, X. and Huang, X. (2019). Linear mode regression with covariate measurement error. Canadian Journal of Statistics, 47(2).
Liu et al., (2022a) Liu, Q., Huang, X., and Bai, R. (2022a). Bayesian modal regression based on mixture distributions. arXiv preprint arXiv:2211.10776.
Liu et al., (2022b) Liu, Q., Huang, X., and Zhou, H. (2022b). The flexible Gumbel distribution: A new model for inference about the mode. arXiv preprint arXiv:2212.01832.
Maclaren et al., (2014) Maclaren, J., Han, Z., Vos, S. B., Fischbein, N., and Bammer, R. (2014). Reliability of brain volume measurements: A test-retest dataset. Scientific Data, 1(1).
Martin, (2007) Martin, M. A. (2007). Bootstrap hypothesis testing for some common statistical problems: A critical evaluation of size and power properties. Computational Statistics & Data Analysis, 51(12):6321–6342.
Nakamura, (1990) Nakamura, T. (1990). Corrected score function for errors-in-variables models: Methodology and application to generalized linear models. Biometrika, 77(1).
Novick and Stefanski, (2002) Novick, S. J. and Stefanski, L. A. (2002). Corrected score estimation via complex variable simulation extrapolation. Journal of the American Statistical Association, 97(458):472–481.
Ota et al., (2019) Ota, H., Kato, K., and Hara, S. (2019). Quantile regression approach to conditional mode estimation. Electronic Journal of Statistics, 13(2):3120–3160.
Price et al., (2010) Price, C., Wood, M., Leonard, C., Towler, S., Ward, J., Montijo, H., Kellison, I., Bowers, D., Monk, T., Newcomer, J., et al. (2010). Entorhinal cortex volume in older adults: Reliability and validity considerations for three published measurement protocols. Journal of the International Neuropsychological Society, 16:846–855.
Quintana et al., (2009) Quintana, F. A., Steel, M. F., and Ferreira, J. T. (2009). Flexible univariate continuous distributions. Bayesian Analysis, 4(4):497–522.
Rubio and Steel, (2015) Rubio, F. and Steel, M. (2015). Bayesian modelling of skewness and kurtosis with two-piece scale and shape distributions. Electronic Journal of Statistics, 9:1884–1912.
Sager and Thisted, (1982) Sager, T. W. and Thisted, R. A. (1982). Maximum likelihood estimation of isotonic modal regression. The Annals of Statistics, 10(3):690–707.
Shi et al., (2021) Shi, J., Zhang, Y., Yu, P., and Song, W. (2021). SIMEX estimation in parametric modal regression with measurement error. Computational Statistics & Data Analysis, 157:107158.
Stefanski, (1989) Stefanski, L. A. (1989). Unbiased estimation of a nonlinear function a normal mean with application to measurement error models. Communications in Statistics-Theory and Methods, 18(12):4335–4358.
Stefanski et al., (2005) Stefanski, L. A., Novick, S. J., and Devanarayan, V. (2005). Estimating a nonlinear function of a normal mean. Biometrika, 92(3):732–736.
Thomas et al., (2011) Thomas, L., Stefanski, L., and Davidian, M. (2011). A moment-adjusted imputation method for measurement error models. Biometrics, 67(4):1461–1470.
Ullah et al., (2022) Ullah, A., Wang, T., and Yao, W. (2022). Nonlinear modal regression for dependent data with application for predicting COVID-19. Journal of the Royal Statistical Society: Series A (Statistics in Society), 185(3):1424–1453.
Varon et al., (2014) Varon, D., Barker, W., Loewenstein, D., Greig, M., Bohorquez, A., Santos, I., Shen, Q., Harper, M., Vallejo-Luces, T., and and, R. D. (2014). Visual rating and volumetric measurement of medial temporal atrophy in the Alzheimer’s disease neuroimaging initiative (ADNI) cohort: baseline diagnosis and the prediction of MCI outcome. International Journal of Geriatric Psychiatry, 30(2):192–200.
Wang et al., (2012) Wang, H. J., Stefanski, L. A., and Zhu, Z. (2012). Corrected-loss estimation for quantile regression with covariate measurement errors. Biometrika, 99(2):405.
Wang et al., (2019) Wang, K., Li, S., Sun, X., and Lin, L. (2019). Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection. Computational Statistics & Data Analysis, 133:257–276.
Wei and Carroll, (2009) Wei, Y. and Carroll, R. J. (2009). Quantile regression with measurement error. Journal of the American Statistical Association, 104(487):1129–1143.
White, (1982) White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica: Journal of the econometric society, pages 1–25.
Xiang and Yao, (2022) Xiang, S. and Yao, W. (2022). Nonparametric statistical learning based on modal regression. Journal of Computational and Applied Mathematics, 409:114130.
Yao and Li, (2013) Yao, W. and Li, L. (2013). A new regression model: Modal linear regression. Scandinavian Journal of Statistics, 41(3):656–671.
Yi, (2017) Yi, G. Y. (2017). Statistical Analysis with Measurement Error or Misclassification. Springer New York.
Zhang et al., (2021) Zhang, T., Kato, K., and Ruppert, D. (2021). Bootstrap inference for quantile-based modal regression. Journal of the American Statistical Association, pages 1–13.
Zhou and Huang, (2016) Zhou, H. and Huang, X. (2016). Nonparametric modal regression in the presence of measurement error. Electronic Journal of Statistics, 10(2).
Zhou and Huang, (2020) Zhou, H. and Huang, X. (2020). Parametric mode regression for bounded responses. Biometrical Journal, 62(7):1791–1809.
Zhou and Huang, (2022) Zhou, H. and Huang, X. (2022). Bayesian beta regression for bounded responses with unknown supports. Computational Statistics & Data Analysis, 167:107345.