Search | arXiv e-print repository

A Linear Errors-in-Variables Model with Unknown Heteroscedastic Measurement Errors

Authors: Linh H. Nghiem, Cornelis J. Potgieter

Abstract: In the classic measurement error framework, covariates are contaminated by independent additive noise. This paper considers parameter estimation in such a linear errors-in-variables model where the unknown measurement error distribution is heteroscedastic across observations. We propose a new generalized method of moments (GMM) estimator that combines a moment correction approach and a phase function-based approach. The former requires distributions to have four finite moments, while the latter relies on covariates having asymmetric distributions. The new estimator is shown to be consistent and asymptotically normal under appropriate regularity conditions. The asymptotic covariance of the estimator is derived, and the estimated standard error is computed using a fast bootstrap procedure. The GMM estimator is demonstrated to have strong finite sample performance in numerical studies, especially when the measurement errors follow non-Gaussian distributions.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2109.14010

Penalized Likelihood Methods for Modeling Count Data

Authors: Minh Thu Bui, Cornelis J. Potgieter, Akihito Kamata

Abstract: The paper considers parameter estimation in count data models using penalized likelihood methods. The motivating data consist of multiple independent count variables with a moderate sample size per variable. The data were collected during the assessment of oral reading fluency (ORF) in school-aged children. Each student in a sample of fourth-graders was given one of ten available passages to read, with the passages differing in length and difficulty. The observed number of words read incorrectly (WRI) is used to measure ORF. Three models are considered for WRI scores, namely the binomial, the zero-inflated binomial, and the beta-binomial. We aim to efficiently estimate passage difficulty, a quantity expressed as a function of the underlying model parameters. Two types of penalty functions are considered for penalized likelihood, with the respective goals of shrinking parameter estimates closer to zero or closer to one another. A simulation study evaluates the efficacy of the shrinkage estimates using mean squared error (MSE) as the metric. Large reductions in MSE relative to unpenalized maximum likelihood are observed. The paper concludes with an analysis of the motivating ORF data.

Submitted 12 May, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

arXiv:1812.00492

Estimation in linear errors-in-variables models with unknown error distribution

Authors: Linh Nghiem, Michael Byrd, Cornelis Potgieter

Abstract: Parameter estimation in linear errors-in-variables models typically requires that the measurement error distribution be known (or estimable from replicate data). A generalized method of moments approach can be used to estimate model parameters in the absence of knowledge of the error distributions, but requires the existence of a large number of model moments. In this paper, parameter estimation based on the phase function, a normalized version of the characteristic function, is considered. This approach requires the model covariates to have asymmetric distributions, while the error distributions are symmetric. Parameter estimation is then based on minimizing a distance function between the empirical phase functions of the noisy covariates and the outcome variable. No knowledge of the measurement error distribution is required to calculate this estimator. Both the asymptotic and finite sample properties of the estimator are considered. The connection between the phase function approach and method of moments is also discussed. The estimation of standard errors is also considered and a modified bootstrap algorithm is proposed for fast computation. The newly proposed estimator is competitive when compared to generalized method of moments, even while making fewer model assumptions on the measurement error. Finally, the proposed method is applied to a real dataset concerning the measurement of air pollution.

Submitted 2 December, 2018; originally announced December 2018.

arXiv:1808.10477

Simulation-Selection-Extrapolation: Estimation in High-Dimensional Errors-in-Variables Models

Authors: Linh Nghiem, Cornelis Potgieter

Abstract: This paper considers errors-in-variables models in a high-dimensional setting where the number of covariates can be much larger than the sample size, and there are only a small number of non-zero covariates. The presence of measurement error in the covariates can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMSELEX (SIMulation-SELection-EXtrapolation) is proposed. This procedure augments the traditional SIMEX approach with a variable selection step based on the group lasso. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown how SIMSELEX can be applied to spline-based regression models. SIMSELEX estimators are compared to the corrected lasso and the conic programming estimator for a linear model, and to the conditional scores lasso for a logistic regression model. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

Submitted 30 August, 2018; originally announced August 2018.

arXiv:1706.01507

Density Deconvolution for Generalized Skew-Symmetric Distributions

Authors: Cornelis J. Potgieter

Abstract: This paper develops a density deconvolution estimator that assumes the density of interest is a member of the generalized skew-symmetric (GSS) family of distributions. Estimation occurs in two parts: a skewing function, as well as location and scale parameters, must be estimated. A kernel method is proposed for estimating the skewing function. The mean integrated square error (MISE) of the resulting GSS deconvolution estimator is derived. Based on derivation of the MISE, two bandwidth estimation methods for estimating the skewing function are also proposed. A generalized method of moments (GMM) approach is developed for estimation of the location and scale parameters. The question of multiple solutions in applying the GMM is also considered, and two solution selection criteria are proposed. The GSS deconvolution estimator is further investigated in simulation studies and is compared to the nonparametric deconvolution estimator. For most simulation settings considered, the GSS estimator has performance superior to the nonparametric estimator.

Submitted 5 June, 2017; originally announced June 2017.

arXiv:1706.00062

A Latent Trait Model for Multivariate Longitudinal Data With Two Sources of Measurement Error

Authors: Amy E. Nussbaum, Cornelis J. Potgieter, Michael Chmielewski

Abstract: Personality traits are latent variables, and as such, are impossible to measure without the use of an assessment. Responses on the assessments can be influenced by both transient (state-related) error and measurement error, obscuring the true trait levels. Typically, these assessments utilize Likert scales, which yield only discrete data. The loss of information due to the discrete nature of the data represents an additional challenge in assessing the ability of these instruments to measure the latent trait of interest. This paper is concerned with parameter estimation in a model relating a latent variable, as well as transient error and measurement error components, when data are longitudinal and measured using a Likert scale. Two methods for parameter estimation are detailed: correlation reconstruction, a method that uses polychoric correlations, and maximum likelihood implemented using a Stochastic EM algorithm. These methods are applied to a motivating dataset of 440 college students taking the Big Five inventory twice in a two-month period.

Submitted 31 May, 2017; originally announced June 2017.

arXiv:1705.10446

An EM Algorithm for Estimating an Oral Reading Speed and Accuracy Model

Authors: Cornelis J. Potgieter, Akihito Kamata, Yusuf Kara

Abstract: This study proposes a two-part model that includes components for reading accuracy and reading speed. The speed component is a log-normal factor model, for which speed data are measured by reading time for each sentence being assessed. The accuracy component is a binomial-count factor model, where the accuracy data are measured by the number of correctly read words in each sentence. Both underlying latent components are assumed to be Gaussian in nature. In this paper, the theoretical properties of the proposed model are developed and a Monte Carlo EM algorithm for model fitting is outlined. The predictive power of the model is illustrated in a real data application.

Submitted 29 May, 2017; originally announced May 2017.

arXiv:1705.09846

Phase Function Density Deconvolution with Heteroscedastic Measurement Error of Unknown Type

Authors: Linh Nghiem, Cornelis J. Potgieter

Abstract: It is important to properly correct for measurement error when estimating density functions associated with biomedical variables. Estimators that adjust for measurement error are broadly referred to as density deconvolution estimators. While most methods in the literature assume the distribution of the measurement error to be fully known, a recently proposed method based on the empirical phase function (EPF) can deal with the situation when the measurement error distribution is unknown. The EPF density estimator has only been considered in the context of additive and homoscedastic measurement error; however, the measurement error of many biomedical variables is heteroscedastic in nature. In this paper, we develop a phase function approach for density deconvolution when the measurement error has unknown distribution and is heteroscedastic. A weighted empirical phase function (WEPF) is proposed where the weights are used to adjust for heteroscedasticity of measurement error. The asymptotic properties of the WEPF estimator are evaluated. Simulation results show that the weighting can result in large decreases in mean integrated squared error (MISE) when estimating the phase function. The estimation of the weights from replicate observations is also discussed. Finally, the construction of a deconvolution density estimator using the WEPF is compared to an existing deconvolution estimator that adjusts for heteroscedasticity, but assumes the measurement error distribution to be fully known. The WEPF estimator proves to be competitive, especially considering that it relies on only minimal assumptions about the distribution of the measurement error.

Submitted 4 June, 2018; v1 submitted 27 May, 2017; originally announced May 2017.

arXiv:1705.09840

A Split-Sample Approach for Estimating the Stability Index of a Stable Distribution

Authors: Sudharshan Samaratunga, Cornelis J. Potgieter

Abstract: The class of stable distributions is used in practice to model data that exhibit heavy tails and/or skewness. The stability index $α$ of a stable distribution is a measure of tail heaviness and is often of primary interest. Existing methods for estimating the index parameter include maximum likelihood and methods based on the sample quantiles. In this paper, a new approach for estimating the index parameter of a stable distribution is proposed. This new approach relies on the location-scale family representation of the class of stable distributions and involves repeatedly partitioning the single observed sample into two independent samples. An asymptotic likelihood method based on sample order statistics, previously used for estimating location and scale parameters in two independent samples, is adapted for estimating the stability index. The properties of the proposed method of estimation are explored and the resulting estimators are evaluated using a simulation study.

Submitted 27 May, 2017; originally announced May 2017.

arXiv:1705.00351

Nonparametric Cusum Charts for Angular Data with Applications in Health Science and Astrophysics

Authors: F. Lombard, Douglas M. Hawkins, Cornelis Potgieter

Abstract: This paper develops non-parametric rotation invariant CUSUMs suited to the detection of changes in the mean direction as well as changes in the concentration parameter of angular data. The properties of the CUSUMs are illustrated by theoretical calculations, Monte Carlo simulation and application to sequentially observed angular data from health science and astrophysics.

Submitted 7 June, 2018; v1 submitted 30 April, 2017; originally announced May 2017.

Showing 1–10 of 10 results for author: Potgieter, C