\authorsnames[1,2,3]{Yang Liu, Jolynn Pek, Alberto Maydeu-Olivares}
\authorsaffiliations{{Department of Human Development and Quantitative Methodology, University of Maryland, College Park}, {Department of Psychology, The Ohio State University}, {Department of Psychology, University of South Carolina, and Faculty of Psychology, University of Barcelona}}
\authornote{Correspondence should be made to Yang Liu at 3304R Benjamin Bldg, 3942 Campus Dr, University of Maryland, College Park, MD 20742. Email: [email protected].}

On a General Theoretical Framework of Reliability

Abstract

Reliability is an essential measure of how closely observed scores represent latent scores (reflecting constructs), assuming some latent variable measurement model. We present a general theoretical framework of reliability, placing emphasis on measuring association between latent and observed scores. This framework was inspired by McDonald’s (2011) regression framework, which highlighted the coefficient of determination as a measure of reliability. We extend McDonald’s (2011) framework beyond coefficients of determination and introduce four desiderata for reliability measures (estimability, normalization, symmetry, and invariance). We also present theoretical examples to illustrate distinct measures of reliability and report on a numerical study that demonstrates the behavior of different reliability measures. We conclude with a discussion of the use of reliability coefficients and outline future avenues of research.

keywords:
reliability, latent variable modeling, classical test theory, prediction, measure of association

Psychological theories are often developed and assessed using the notion of constructs. Constructs (e.g., attitudes, personality, psychopathy) cannot be directly observed and are often defined operationally as latent variables (LVs; \citeNP{hoyle.borsboom&tay.2024}; see also \citeNP{deboeck.et.al.2023a} for a recent discussion on the notion of constructs). We use the term latent scores to refer to LVs and transformations of LVs. LVs are typically assumed to be reflected by scores on manifest variables (MVs; e.g., item responses), and observed scores that are functions of MVs are computed to serve as proxies of latent scores for making inferences about constructs. In the developments to follow, we assume that constructs are validly operationalized in the population by an LV measurement model (e.g., an item response theory [IRT] model; \citeNP{thissen&steinberg.2009}), which formally expresses the link between MVs and LVs.

Observed scores (e.g., estimated factor scores and summed scores) are often employed for scoring, classification, and examining relations among constructs (e.g., see \citeNP{liu&pek.inpress}). When employing observed scores in research, it is pertinent to consider the extent to which observed scores map well onto the latent scores that operationalize psychological constructs. An imperfect mapping manifests as measurement error and might result in misleading inferences (\citeNP[Chapter 5]{bollen.1989}; \citeNP{cole&preacher.2014}). Thus, it is important to assess how well observed scores align with latent scores, an alignment that reliability coefficients quantify.

Many popular reliability coefficients can be interpreted as coefficients of determination based on regression models (\citeNP{mcdonald.2011}; see \citeNP{liu.pek&maydeu-olivares.2024} for a review). For example, classical test theory (CTT) reliability is the coefficient of determination associated with regressing an observed score onto all LVs (in the measurement model), which is referred to as a measurement decomposition of the observed score. CTT reliability quantifies how well these LVs account for variance of the observed score (e.g., \citeNP{anastasi&urbina.1997}; \citeNP{devellis&thorpe.2021}; \citeNP{raykov&marcoulides.2011}). Conversely, proportional reduction in mean square error (PRMSE; \citeNP{haberman&sinharay.2010}) is the coefficient of determination associated with regressing a latent score onto all MVs (in the measurement model), which is referred to as a prediction decomposition of the latent score. PRMSE is a popular measure of reliability in the IRT literature and quantifies the proportion of latent score variance accounted for by MVs.

The purpose of this paper is to extend the existing regression framework of reliability (McDonald, 2011), from which we derive novel reliability coefficients that also quantify the alignment between latent and observed scores. We frame reliability coefficients within the broader context of association measures, which include as a special case the coefficient of determination from the univariate regression framework (McDonald, 2011). To organize new reliability coefficients under the extended framework, we introduce four desiderata, discuss several example reliability coefficients, and illustrate their behavior with a numerical study.

The paper is organized as follows. We begin by introducing notation and preliminary concepts. Next, we briefly review the regression framework of reliability (McDonald, 2011), focusing on the measurement and prediction decompositions that result in CTT reliability and PRMSE, respectively. We then consider reliability coefficients as measures of association between latent and observed scores, expanding the regression framework. To organize reliability coefficients under this generalized framework, we introduce four desiderata. The first two (estimability and normalization) are necessary, whereas the last two (symmetry and invariance) are desirable only in certain contexts. We then present five theoretical examples to illustrate the generality of the proposed framework: (a) the squared Pearson correlation (Kim, 2012), (b) coefficient sigma (Schweizer \& Wolff, 1981), (c) mutual information (Joe, 1989; Markon, 2023), (d) coefficient $T$ (Azadkia \& Chatterjee, 2021), and (e) generalized coefficients of determination for multivariate regression (e.g., \citeNP{pillai.1955}; \citeNP{wilks.1932}). Next, we report on a numerical study investigating the performance of these reliability coefficients under a two-dimensional independent-cluster IRT model. Finally, we end with a discussion on limitations and future avenues of research.

Reliability from a Regression Framework

Notation and Assumptions

Let $\mathbf{y}_i$ be an $m\times 1$ vector of MVs for person $i$, in which $i = 1, \dots, n$. The MVs are assumed to reflect LVs for person $i$ as represented by the $d\times 1$ vector $\boldsymbol{\eta}_i$. We also assume a correctly specified measurement model that formally links the MVs to the LVs, resulting in a joint probability density function (pdf) of $\underline{\mathbf{y}}_i$ and $\underline{\boldsymbol{\eta}}_i$, denoted by $f(\underline{\mathbf{y}}_i, \underline{\boldsymbol{\eta}}_i)$.\footnote{In the most general scenario, $\underline{\mathbf{y}}_i$ and $\underline{\boldsymbol{\eta}}_i$ may combine continuous and discrete random variables. Therefore, the pdf should be understood as the Radon-Nikodym derivative with respect to a product measure that is composed of Lebesgue measures for continuous variates and counting measures for discrete variates. We underline an object (variable or vector) to indicate that it is random.}
Furthermore, let $\mathbf{s}(\mathbf{y}_i)$ denote an $m^*\times 1$ vector of observed scores (e.g., estimated factor scores and summed scores) and $\boldsymbol{\xi}(\boldsymbol{\eta}_i)$ denote a $d^*\times 1$ vector of latent scores (e.g., LVs and CTT true scores). Here, $m^*\leq m$ and $d^*\leq d$. Although the parameters of a measurement model are estimated from data in practice, we limit our discussion to reliability measures in the population.

Reliability Coefficients Based on Regressions

Inspired by \citeA{mcdonald.2011}, \citeA{liu.pek&maydeu-olivares.2024} interpreted reliability coefficients as coefficients of determination based on univariate regressions. The measurement decomposition of reliability regresses a univariate observed score $s(\underline{\mathbf{y}}_i)$ (i.e., $m^* = 1$) onto all the LVs in $\underline{\boldsymbol{\eta}}_i$. Conversely, the prediction decomposition of reliability regresses a univariate latent score $\xi(\underline{\boldsymbol{\eta}}_i)$ (i.e., $d^* = 1$) onto all the MVs in $\underline{\mathbf{y}}_i$. As described below, the measurement decomposition yields CTT reliability, and the prediction decomposition yields PRMSE.

The measurement decomposition, defined for a scalar-valued observed score $s(\mathbf{y}_i)$ and also known as the true score formula \cite<e.g.,>[Section 5.2]{raykov&marcoulides.2011}, can be expressed as

\begin{equation}
s(\mathbf{y}_i) = \mathbb{E}\big[s(\underline{\mathbf{y}}_i) \,\big|\, \boldsymbol{\eta}_i\big] + \varepsilon_i. \tag{1}
\end{equation}

Because a regression traces the conditional expectation of an outcome variable given explanatory variables (e.g., \citeNP[p.~15]{fox.2015}), Equation 1 can be considered a (potentially nonlinear) regression of the observed score $s(\underline{\mathbf{y}}_i)$ onto the LVs in $\underline{\boldsymbol{\eta}}_i$. The conditional expectation of $s(\underline{\mathbf{y}}_i)$ given $\boldsymbol{\eta}_i$ is popularly referred to as the true score underlying $s(\underline{\mathbf{y}}_i)$, and the error term $\underline{\varepsilon}_i$ has mean 0 and is uncorrelated with the true score (Lord \& Novick, 1968, Theorem 2.7.1). Alternatively, Equation 1 can be viewed as a unit-weight linear regression (i.e., with intercept 0 and slope 1) of the observed score $s(\underline{\mathbf{y}}_i)$ onto its true score $\mathbb{E}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]$.
The coefficient of determination corresponding to Equation 1 quantifies the proportion of observed score variance that is accounted for by the latent (true) score; this coefficient of determination is identical to CTT reliability:

\begin{equation}
\begin{split}
\varrho^2\big(s(\underline{\mathbf{y}}_i), \underline{\boldsymbol{\eta}}_i\big)
&= \varrho^2\big(s(\underline{\mathbf{y}}_i), \mathbb{E}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\big)\\
&= \frac{\mathrm{Var}\big(\mathbb{E}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\big)}{\mathrm{Var}[s(\underline{\mathbf{y}}_i)]}
= 1 - \frac{\mathbb{E}\big(\mathrm{Var}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\big)}{\mathrm{Var}[s(\underline{\mathbf{y}}_i)]}.
\end{split}
\tag{2}
\end{equation}

In Equation 2, $\varrho^2(\underline{u}, \underline{\mathbf{v}})$ refers to the population coefficient of determination when regressing a scalar outcome variable $\underline{u}$ onto (possibly multiple) explanatory variables $\underline{\mathbf{v}}$. The first equality holds because the (potentially nonlinear) regression of $s(\underline{\mathbf{y}}_i)$ onto $\underline{\boldsymbol{\eta}}_i$ is the true score itself. The last equality is due to the well-known law of total variance: $\mathrm{Var}[s(\underline{\mathbf{y}}_i)] = \mathrm{Var}\big(\mathbb{E}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\big) + \mathbb{E}\big(\mathrm{Var}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\big)$.
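As a numerical check on Equation 2 and the law of total variance, the Python sketch below computes the CTT reliability of a summed score by exact enumeration. It rests on illustrative assumptions that are not part of the framework: a hypothetical unidimensional two-parameter logistic (2PL) model for four binary items with made-up slopes and intercepts, and a grid of normalized density weights standing in for $\underline{\eta}_i \sim N(0, 1)$. Only the Python standard library is used.

```python
import math

# Hypothetical 2PL item parameters (slopes a_j, intercepts c_j); illustrative only.
SLOPES = [1.0, 1.5, 0.8, 2.0]
INTERCEPTS = [0.0, -0.5, 0.5, 1.0]


def latent_grid(n_points=81, lo=-6.0, hi=6.0):
    """Approximate eta ~ N(0, 1) by normalized density weights on an equally spaced grid."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def item_probs(eta):
    """P(y_j = 1 | eta) for each item under the 2PL model."""
    return [1.0 / (1.0 + math.exp(-(a * eta + c))) for a, c in zip(SLOPES, INTERCEPTS)]


def sum_score_pmf(probs):
    """Conditional pmf of the summed score given eta (independent Bernoulli items)."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1.0 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf


def variance_components():
    """Return Var(s), Var(E[s | eta]), and E(Var[s | eta]) for the summed score s."""
    nodes, weights = latent_grid()
    marginal = [0.0] * (len(SLOPES) + 1)
    true_scores, cond_vars = [], []
    for eta, w in zip(nodes, weights):
        probs = item_probs(eta)
        true_scores.append(sum(probs))                       # E[s | eta]
        cond_vars.append(sum(p * (1.0 - p) for p in probs))  # Var[s | eta]
        for k, q in enumerate(sum_score_pmf(probs)):
            marginal[k] += w * q                             # marginal pmf of s
    mean_s = sum(k * q for k, q in enumerate(marginal))
    var_s = sum((k - mean_s) ** 2 * q for k, q in enumerate(marginal))
    var_true = sum(w * (t - mean_s) ** 2 for w, t in zip(weights, true_scores))
    mean_cond_var = sum(w * v for w, v in zip(weights, cond_vars))
    return var_s, var_true, mean_cond_var


var_s, var_true, mean_cond_var = variance_components()
print(var_s - (var_true + mean_cond_var))             # ~0: law of total variance
print(var_true / var_s, 1.0 - mean_cond_var / var_s)  # the two equal forms in Equation 2
```

Because $\mathrm{Var}[s(\underline{\mathbf{y}}_i)]$ is computed from the marginal distribution of the summed score rather than as the sum of the two components, the first printed value checks the law of total variance instead of assuming it.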

The prediction decomposition is defined for a scalar-valued latent score $\xi(\underline{\boldsymbol{\eta}}_i)$ and is expressed as

\begin{equation}
\xi(\boldsymbol{\eta}_i) = \mathbb{E}\big[\xi(\underline{\boldsymbol{\eta}}_i) \,\big|\, \mathbf{y}_i\big] + \delta_i. \tag{3}
\end{equation}

Equation 3 can also be interpreted in terms of two regressions. It is a (potentially nonlinear) regression of the latent score $\xi(\underline{\boldsymbol{\eta}}_i)$ onto all the MVs in $\underline{\mathbf{y}}_i$, or a unit-weight linear regression of $\xi(\underline{\boldsymbol{\eta}}_i)$ onto $\mathbb{E}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]$, which is the expected a posteriori (EAP) score of $\xi(\underline{\boldsymbol{\eta}}_i)$.
Note that the EAP score minimizes the mean squared error (MSE) among all predictors of $\xi(\underline{\boldsymbol{\eta}}_i)$, and the minimized MSE is given by $\mathbb{E}\big(\mathrm{Var}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]\big)$.\footnote{We denote a predictor of $\xi(\underline{\boldsymbol{\eta}}_i)$ by $g(\underline{\mathbf{y}}_i)$, which is a function of $\underline{\mathbf{y}}_i$. The MSE in prediction is given by $\mathbb{E}\big[g(\underline{\mathbf{y}}_i) - \xi(\underline{\boldsymbol{\eta}}_i)\big]^2$, which is minimized by the EAP score (Casella \& Berger, 2002, Exercise 4.13).} Thus, the coefficient of determination resulting from Equation 3 is

\begin{equation}
\begin{split}
\varrho^2\big(\xi(\underline{\boldsymbol{\eta}}_i), \underline{\mathbf{y}}_i\big)
&= \varrho^2\big(\xi(\underline{\boldsymbol{\eta}}_i), \mathbb{E}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]\big)\\
&= \frac{\mathrm{Var}\big(\mathbb{E}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]\big)}{\mathrm{Var}[\xi(\underline{\boldsymbol{\eta}}_i)]}
= 1 - \frac{\mathbb{E}\big(\mathrm{Var}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]\big)}{\mathrm{Var}[\xi(\underline{\boldsymbol{\eta}}_i)]},
\end{split}
\tag{4}
\end{equation}

which quantifies the proportional reduction in MSE when predicting $\xi(\underline{\boldsymbol{\eta}}_i)$ from $\mathbf{y}_i$ rather than by its unconditional mean (whose MSE is $\mathrm{Var}[\xi(\underline{\boldsymbol{\eta}}_i)]$). Equation 4, henceforth termed PRMSE, is a popular measure of reliability in the IRT literature.
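The prediction decomposition can be sketched under the same kind of hypothetical setup (a four-item 2PL model with made-up parameters and a grid approximation to $\underline{\eta}_i \sim N(0,1)$), taking $\xi(\underline{\boldsymbol{\eta}}_i) = \underline{\eta}_i$. The code enumerates all $2^4$ response patterns, computes the EAP score and posterior variance for each, and evaluates both forms of Equation 4. To illustrate that the EAP score minimizes the MSE, it also computes the MSE of one competing predictor, the best linear prediction of $\underline{\eta}_i$ from the summed score.

```python
import math
from itertools import product

# Hypothetical 2PL item parameters (slopes a_j, intercepts c_j); illustrative only.
SLOPES = [1.0, 1.5, 0.8, 2.0]
INTERCEPTS = [0.0, -0.5, 0.5, 1.0]


def latent_grid(n_points=81, lo=-6.0, hi=6.0):
    """Approximate eta ~ N(0, 1) by normalized density weights on an equally spaced grid."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def likelihood(pattern, eta):
    """P(y = pattern | eta) under local independence."""
    out = 1.0
    for y, a, c in zip(pattern, SLOPES, INTERCEPTS):
        p = 1.0 / (1.0 + math.exp(-(a * eta + c)))
        out *= p if y == 1 else 1.0 - p
    return out


nodes, weights = latent_grid()
mean_eta = sum(w * e for w, e in zip(weights, nodes))
var_eta = sum(w * (e - mean_eta) ** 2 for w, e in zip(weights, nodes))

# Per pattern: summed score, joint weights f(y, eta_g), P(y), EAP score, posterior variance.
table = []
for pattern in product((0, 1), repeat=len(SLOPES)):
    joint = [w * likelihood(pattern, e) for w, e in zip(weights, nodes)]
    p_y = sum(joint)
    eap = sum(j * e for j, e in zip(joint, nodes)) / p_y
    post_var = sum(j * (e - eap) ** 2 for j, e in zip(joint, nodes)) / p_y
    table.append((sum(pattern), joint, p_y, eap, post_var))

var_eap = sum(p_y * (eap - mean_eta) ** 2 for _, _, p_y, eap, _ in table)
mse_eap = sum(p_y * pv for _, _, p_y, _, pv in table)  # E(Var[eta | y]) = MSE of EAP
prmse_a = var_eap / var_eta
prmse_b = 1.0 - mse_eap / var_eta

# Competing predictor: best linear prediction of eta from the summed score alone.
mean_s = sum(p_y * s for s, _, p_y, _, _ in table)
var_s = sum(p_y * (s - mean_s) ** 2 for s, _, p_y, _, _ in table)
cov_eta_s = sum(p_y * (s - mean_s) * (eap - mean_eta) for s, _, p_y, eap, _ in table)
slope = cov_eta_s / var_s
mse_linear = sum(
    j * (mean_eta + slope * (s - mean_s) - e) ** 2
    for s, joint, _, _, _ in table
    for j, e in zip(joint, nodes)
)

print(prmse_a, prmse_b)       # equal up to rounding
print(mse_eap <= mse_linear)  # True: the EAP score has the smaller MSE
```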

In sum, reliability coefficients are strictly defined as coefficients of determination within the regression framework (Liu et al., 2024; McDonald, 2011). In the measurement decomposition of an observed score, the explanatory variables must be all the LVs in the measurement model (or, equivalently, the true score underlying the observed score). Alternatively, in the prediction decomposition of a latent score, the explanatory variables must be all the MVs in $\mathbf{y}_i$ (or, equivalently, the EAP predictor of the latent score). The coefficient of determination quantifies the magnitude of association between outcome and explanatory variables. Next, we extend the definition of reliability to more general measures of association between selected observed and latent scores.

Reliability as a Measure of Association

Given $m^*$-dimensional observed scores $\mathbf{s}(\underline{\mathbf{y}}_i)$ and $d^*$-dimensional latent scores $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)$, we define reliability as an association measure:

\begin{equation}
A: \mathbb{R}^{m^*} \times \mathbb{R}^{d^*} \to [0, 1], \quad
\big[\mathbf{s}(\underline{\mathbf{y}}_i)', \boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)'\big]' \mapsto A\big(\mathbf{s}(\underline{\mathbf{y}}_i), \boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big). \tag{5}
\end{equation}

In words, the association measure $A$ maps an observed score vector $\mathbf{s}(\underline{\mathbf{y}}_i)$ of size $m^*\times 1$ and a latent score vector $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)$ of size $d^*\times 1$ to a value on the unit interval $[0,1]$. The larger the value of the association measure, the more closely aligned the observed scores $\mathbf{s}(\underline{\mathbf{y}}_i)$ are with the latent scores $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)$. Because CTT reliability and PRMSE are coefficients of determination, they are special cases of the general definition in Equation 5.
For CTT reliability, the observed score in Equation 5 is $s(\underline{\mathbf{y}}_i)\in\mathbb{R}$, and the latent score(s) are either the LVs in $\underline{\boldsymbol{\eta}}_i\in\mathbb{R}^d$ or the true score $\mathbb{E}[s(\underline{\mathbf{y}}_i) \,|\, \underline{\boldsymbol{\eta}}_i]\in\mathbb{R}$. For PRMSE, the observed score(s) in Equation 5 are either the MVs in $\underline{\mathbf{y}}_i\in\mathbb{R}^m$ or the EAP score $\mathbb{E}[\xi(\underline{\boldsymbol{\eta}}_i) \,|\, \underline{\mathbf{y}}_i]\in\mathbb{R}$, and the latent score is $\xi(\underline{\boldsymbol{\eta}}_i)\in\mathbb{R}$.
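To see that CTT reliability and PRMSE are two instances of one association measure evaluated on different pairs of scores, the sketch below takes $A$ to be the squared Pearson correlation, computed over the full discrete joint distribution of response patterns and latent grid nodes. The model (a hypothetical four-item 2PL with made-up parameters and a grid approximation to a standard normal LV) and all function names are illustrative assumptions. Applied to the summed score and its true score, $A$ reproduces the variance ratio in Equation 2; applied to the EAP score and $\underline{\eta}_i$, it reproduces the variance ratio in Equation 4 with $\xi(\underline{\boldsymbol{\eta}}_i) = \underline{\eta}_i$.

```python
import math
from itertools import product

# Hypothetical 2PL item parameters (slopes a_j, intercepts c_j); illustrative only.
SLOPES = [1.0, 1.5, 0.8, 2.0]
INTERCEPTS = [0.0, -0.5, 0.5, 1.0]


def item_probs(eta):
    """P(y_j = 1 | eta) for each item under the 2PL model."""
    return [1.0 / (1.0 + math.exp(-(a * eta + c))) for a, c in zip(SLOPES, INTERCEPTS)]


def weighted_var(xs, ws):
    """Variance of xs under the discrete distribution with probabilities ws."""
    m = sum(w * x for w, x in zip(ws, xs))
    return sum(w * (x - m) ** 2 for w, x in zip(ws, xs))


def squared_corr(xs, zs, ws):
    """A(x, z): squared Pearson correlation under the discrete joint distribution ws."""
    mx = sum(w * x for w, x in zip(ws, xs))
    mz = sum(w * z for w, z in zip(ws, zs))
    cov = sum(w * (x - mx) * (z - mz) for w, x, z in zip(ws, xs, zs))
    return cov * cov / (weighted_var(xs, ws) * weighted_var(zs, ws))


# Grid approximation to eta ~ N(0, 1): 81 equally spaced nodes on [-6, 6].
nodes = [-6.0 + 0.15 * k for k in range(81)]
dens = [math.exp(-0.5 * e * e) for e in nodes]
total = sum(dens)
weights = [d / total for d in dens]

# One cell per (pattern, node): summed score, true score, EAP score, eta, probability.
cells = []
for pattern in product((0, 1), repeat=len(SLOPES)):
    joint = []
    for eta, w in zip(nodes, weights):
        lik = 1.0
        for y, p in zip(pattern, item_probs(eta)):
            lik *= p if y == 1 else 1.0 - p
        joint.append(w * lik)
    eap = sum(j * eta for j, eta in zip(joint, nodes)) / sum(joint)
    for eta, j in zip(nodes, joint):
        cells.append((sum(pattern), sum(item_probs(eta)), eap, eta, j))

s_vals, true_vals, eap_vals, eta_vals, cell_probs = zip(*cells)

ctt = squared_corr(s_vals, true_vals, cell_probs)  # A(s(y), E[s(y) | eta])
prmse = squared_corr(eap_vals, eta_vals, cell_probs)  # A(E[eta | y], eta)
print(ctt, weighted_var(true_vals, cell_probs) / weighted_var(s_vals, cell_probs))
print(prmse, weighted_var(eap_vals, cell_probs) / weighted_var(eta_vals, cell_probs))
```

The agreement follows because, in each decomposition, the error is uncorrelated with the regression function, so the covariance between the paired scores equals the variance of the true score (or of the EAP score).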

Because Equation 5 represents a broad class of association measures in which regression-based coefficients of determination are a special case, we first describe desirable statistical properties of $A$. These desiderata facilitate defining, estimating, interpreting, and organizing various reliability coefficients that are based on association measures. The four desiderata are estimability, normalization, symmetry, and invariance. In general, estimability and normalization are unequivocally desirable properties, whereas symmetry and invariance might be desirable in certain contexts. Below, we implicitly assume that the association measure $A$ is defined and computable for (almost) all values in $\mathbb{R}^{m^*}\times\mathbb{R}^{d^*}$.

Estimability

Recall that the joint distribution of $\underline{\mathbf{y}}_i$ and $\underline{\boldsymbol{\eta}}_i$ is determined by the specified LV measurement model. Then, a reliability coefficient $A\big(\mathbf{s}(\underline{\mathbf{y}}_i), \boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ is a function of parameters in the measurement model and is thus also a population parameter. Estimability means that $A\big(\mathbf{s}(\underline{\mathbf{y}}_i), \boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ can be consistently estimated from an independent and identically distributed sample $\mathbf{y}_1, \dots, \mathbf{y}_n$ such that large-sample confidence intervals (CIs) for $A\big(\mathbf{s}(\underline{\mathbf{y}}_i), \boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ can be constructed.
Estimability ensures consistent point estimates of reliability coefficients and large-sample CIs under two conditions. First, the measurement model should be (locally) identified so that model parameters can be consistently estimated \cite<see, e.g.,>[Chapter 2]{bekker.merckens&wansbeek.2014}. Second, $A\big(\mathbf{s}(\underline{\mathbf{y}}_i),\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ should be an almost surely continuous function of the model parameters, so that consistent estimates of reliability coefficients can be obtained by the continuous mapping theorem \cite<e.g.,>[Theorem 2.3]{vandervaart.1998}. Furthermore, under complex nonlinear measurement models in which reliability computations become intractable, reliability coefficients can be approximated using a large Monte Carlo (MC) sample of observed and latent scores generated from the fitted measurement model; see \citeA{liu.pek&maydeu-olivares.2024} and the “Numerical Study” section for details.
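To make the MC approximation concrete, the following Python snippet is a minimal sketch under stated assumptions, not the procedure used in the paper. It assumes a hypothetical fitted linear one-factor model with made-up loadings `lam` and unique variances `psi`, treats the unweighted sum score as the observed score, and approximates the squared correlation between the sum score and the factor from a large simulated sample. Because this linear case has a closed form (coefficient omega), the simulated value can be checked against it; for nonlinear models only the simulation step would carry over. The function names are illustrative.

```python
import numpy as np

def mc_squared_correlation(lam, psi, n_mc=500_000, seed=0):
    """Approximate Corr^2(sum score, factor) from a large Monte Carlo sample.

    Assumes a linear one-factor model y = lam * eta + e with eta ~ N(0, 1)
    and independent e_j ~ N(0, psi_j); lam and psi play the role of fitted values.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n_mc)                          # latent scores
    e = rng.standard_normal((n_mc, lam.size)) * np.sqrt(psi)  # unique parts
    y = eta[:, None] * lam + e                               # simulated item scores
    s = y.sum(axis=1)                                        # observed sum scores
    return np.corrcoef(s, eta)[0, 1] ** 2

def omega(lam, psi):
    """Closed-form squared correlation for the same model (coefficient omega)."""
    return lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())

lam = np.array([0.7, 0.6, 0.8, 0.5])   # hypothetical fitted loadings
psi = 1.0 - lam ** 2                   # hypothetical unique variances
print(omega(lam, psi), mc_squared_correlation(lam, psi))
```

The MC route only needs the ability to simulate from the fitted model, which is why it remains available when closed forms do not.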

Large-sample CIs for reliability coefficients can be obtained by analytical methods \cite<e.g., the delta method;>[Section 5.3]{bickel&doksom.2015}, which might require additional assumptions and are useful when model-implied quantities can be evaluated efficiently. Alternatively, resampling methods \cite<e.g., bootstrapping;>{efron&tibshirani.1993} are often more convenient to implement because of their plug-and-play nature. In sum, estimability ensures consistent estimation of reliability coefficients, accompanied by large-sample CIs.
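As an illustration of the resampling route, here is a minimal Python sketch of a percentile bootstrap CI. To keep it self-contained, it assumes a just-identified one-factor model with three hypothetical items, so that loadings have a closed-form method-of-moments estimate ($\lambda_1^2=\sigma_{12}\sigma_{13}/\sigma_{23}$, and so on); the data, the helper names `omega_three_indicators` and `percentile_bootstrap_ci`, and the choice of the percentile interval (one of several bootstrap CI constructions) are all assumptions for illustration.

```python
import numpy as np

def omega_three_indicators(y):
    """Plug-in omega for a just-identified one-factor model with three items.

    Uses lambda_j^2 = s_jk * s_jl / s_kl (valid when all covariances are positive)
    and psi_j = s_jj - lambda_j^2; the sum score is the observed score.
    """
    S = np.cov(y, rowvar=False)
    lam = np.sqrt(np.array([
        S[0, 1] * S[0, 2] / S[1, 2],
        S[0, 1] * S[1, 2] / S[0, 2],
        S[0, 2] * S[1, 2] / S[0, 1],
    ]))
    psi = np.diag(S) - lam ** 2
    return lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())

def percentile_bootstrap_ci(y, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample rows (respondents) with replacement."""
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    reps = np.array([stat(y[rng.integers(0, n, n)]) for _ in range(n_boot)])
    alpha = 1.0 - level
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

# Hypothetical data from a one-factor model (population omega = 4.41 / 5.92, about 0.745)
rng = np.random.default_rng(1)
lam_true = np.array([0.8, 0.7, 0.6])
eta = rng.standard_normal(500)
y = eta[:, None] * lam_true + rng.standard_normal((500, 3)) * np.sqrt(1 - lam_true ** 2)
est = omega_three_indicators(y)
lo, hi = percentile_bootstrap_ci(y, omega_three_indicators)
print(est, lo, hi)
```

The bootstrap only requires re-running the estimator, which is the plug-and-play convenience mentioned above; the analytical alternative would require the derivative of omega with respect to the model parameters.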

Normalization
A normalized measure of association $A\big(\mathbf{s}(\underline{\mathbf{y}}_i),\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ takes values in the unit interval $[0,1]$. Normalization aids interpretation because a value of zero indicates absence of association and a value of one indicates perfect association. In this vein, zero reliability implies that the observed scores contain only measurement error and are not relevant to the latent scores. Conversely, a reliability of one implies that the observed scores are essentially equivalent to the latent scores; that is, the observed scores are free of measurement error and are perfect proxies of the latent scores.

The absence of association (zero reliability) has at least two interpretations. First, in the regression framework, a zero coefficient of determination implies that the conditional expectation of the outcome variable given the predictor variables has no variability. Stated differently, the conditional and unconditional expectations of the outcome are equal, a condition sometimes referred to as linear independence \cite<e.g.,>[Definition 2.11.1]{lord&novick.1968}. Second, a zero coefficient can imply statistical independence; that is, the joint pdf of observed scores $\mathbf{s}(\underline{\mathbf{y}}_i)$ and latent scores $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)$ factorizes into the product of their marginal densities. Statistical independence implies linear independence but not vice versa.
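The gap between the two notions can be seen in a small simulation. The Python sketch below assumes a hypothetical pair of scores with $\underline{u}=\underline{v}\,\underline{z}$, where $\underline{v}$ and $\underline{z}$ are independent standard normals, so that $\mathbb{E}[\underline{u}\,|\,\underline{v}]=0$ (linear independence) while $\mathrm{Var}(\underline{u}\,|\,\underline{v})=\underline{v}^2$ (statistical dependence). The coefficient of determination is estimated by a simple quantile-binning estimator of $\mathrm{Var}(\mathbb{E}[\underline{u}\,|\,\underline{v}])/\mathrm{Var}(\underline{u})$, which is an illustrative choice rather than a recommended estimator.

```python
import numpy as np

def binned_r2(outcome, predictor, n_bins=100):
    """Estimate Var(E[outcome | predictor]) / Var(outcome) by quantile binning."""
    edges = np.quantile(predictor, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, predictor, side="right") - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.bincount(idx, weights=outcome, minlength=n_bins) / counts
    fitted = means[idx]                      # estimated conditional means
    return fitted.var() / outcome.var()

rng = np.random.default_rng(0)
v = rng.standard_normal(200_000)
u = v * rng.standard_normal(200_000)         # E[u | v] = 0, yet Var(u | v) = v^2
print(binned_r2(u, v))                       # close to 0: linear independence
print(np.corrcoef(u ** 2, v ** 2)[0, 1])     # clearly positive: u and v are dependent
```

In the population, $\mathrm{Corr}(\underline{u}^2,\underline{v}^2)=0.5$ for this construction, so a zero coefficient of determination here does not mean the scores are unrelated.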

A perfect association implies a deterministic relationship between latent and observed scores. Measures of association differ in (a) whether the deterministic relationship must hold in one direction or in both directions, and (b) which family of deterministic functions is involved. As for (a), the regression framework is asymmetric: a perfect regression association only requires the outcome to be a deterministic function of the explanatory variables. In contrast, a perfect symmetric association (see the “Symmetry” section below) implies that both sets of scores can be interchangeably represented as deterministic functions of each other. In terms of (b), families of deterministic functions include linear functions with nonzero slopes (e.g., the squared or absolute Pearson correlation), strictly monotone functions \cite<e.g.,>{nelsen.2006,schweizer&wolff.1981}, and implicitly defined functions \cite<e.g.,>{geenens&ldm.2022}.
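Point (a) can be checked exactly on a toy example. The Python sketch below assumes hypothetical discrete scores with $\underline{v}$ uniform on $\{-1,0,1\}$ and $\underline{u}=\underline{v}^2$, and computes both regression-based coefficients of determination (correlation ratios) directly from the joint pmf. Regressing $\underline{u}$ on $\underline{v}$ gives a perfect association, whereas regressing $\underline{v}$ on $\underline{u}$ gives zero, because $\underline{v}$ is not a function of $\underline{u}$.

```python
import numpy as np

def correlation_ratio(p, out_vals):
    """Var(E[out | pred]) / Var(out) for a joint pmf p[out, pred] (a regression R^2)."""
    p_pred, p_out = p.sum(axis=0), p.sum(axis=1)
    mean_out = out_vals @ p_out
    cond_mean = (out_vals @ p) / p_pred              # E[out | pred] for each pred value
    var_fitted = p_pred @ (cond_mean - mean_out) ** 2
    var_out = p_out @ (out_vals - mean_out) ** 2
    return var_fitted / var_out

# Hypothetical scores: v uniform on {-1, 0, 1} and u = v^2 in {0, 1}
v_vals = np.array([-1.0, 0.0, 1.0])
u_vals = np.array([0.0, 1.0])
p = np.zeros((2, 3))                                 # rows: u, columns: v
for j, v_val in enumerate(v_vals):
    p[int(v_val ** 2), j] = 1.0 / 3.0                # all mass on u = v^2

r2_u_on_v = correlation_ratio(p, u_vals)             # equals 1 up to rounding
r2_v_on_u = correlation_ratio(p.T, v_vals)           # equals 0: E[v | u] is constant
print(r2_u_on_v, r2_v_on_u)
```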

While it is desirable to use scores with reliability close to one, it is challenging to suggest a universal cutoff for acceptable reliability, for two reasons. First, different association measures are often not directly comparable because their values of 0 and 1 might map onto different concepts (i.e., the measures have different conceptual definitions of zero and perfect association). Second, the same amount of measurement error may have different downstream effects depending on how the observed scores are used (e.g., to recover latent scores, to classify individuals, or as proxies of latent scores in an explanatory model; see \citeNP{liu&pek.inpress}). There is no substitute for studying the consequences of measurement error case by case. We revisit this point with a concrete example in the “Numerical Study” section.

Symmetry

The association measure $A$ (Equation 5) is symmetric if and only if $A\big(\mathbf{s}(\underline{\mathbf{y}}_i),\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)=A\big(\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i),\mathbf{s}(\underline{\mathbf{y}}_i)\big)$. When $m^{*}=d^{*}=1$, coefficients of determination based on nonlinear regressions (e.g., CTT reliability and PRMSE) are usually asymmetric. Symmetry is an optional desideratum that might be desirable in specific contexts. First, symmetry is helpful when it is difficult to unequivocally designate either the observed or the latent scores as the regression outcome (e.g., when choosing between measurement and prediction decompositions). Second, symmetry avoids potential confusion arising from two asymmetric measures of association that take different values for the same observed and latent scores, which occurs with nonlinear measurement models (e.g., IRT; see \citeNP{liu.pek&maydeu-olivares.2024}).
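A small exact computation shows how the two regression directions can disagree while a symmetric measure does not. The Python sketch below uses a made-up joint pmf of two discrete scores (rows index $\underline{u}\in\{0,1\}$, columns index $\underline{v}\in\{0,1,2\}$) and computes the correlation ratio in each direction together with the squared Pearson correlation; the table and helper names are hypothetical.

```python
import numpy as np

# Hypothetical joint pmf: rows index u in {0, 1}, columns index v in {0, 1, 2}
u_vals = np.array([0.0, 1.0])
v_vals = np.array([0.0, 1.0, 2.0])
p = np.array([[0.3, 0.1, 0.1],
              [0.0, 0.2, 0.3]])

def correlation_ratio(p, out_vals):
    """Var(E[out | pred]) / Var(out) for a joint pmf p[out, pred]."""
    p_pred, p_out = p.sum(axis=0), p.sum(axis=1)
    mean_out = out_vals @ p_out
    cond_mean = (out_vals @ p) / p_pred
    return (p_pred @ (cond_mean - mean_out) ** 2) / (p_out @ (out_vals - mean_out) ** 2)

def squared_correlation(p, a_vals, b_vals):
    """Squared Pearson correlation for a joint pmf p[a, b]."""
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    ma, mb = a_vals @ pa, b_vals @ pb
    cov = (a_vals - ma) @ p @ (b_vals - mb)
    va = pa @ (a_vals - ma) ** 2
    vb = pb @ (b_vals - mb) ** 2
    return cov ** 2 / (va * vb)

r2_u_given_v = correlation_ratio(p, u_vals)      # regress u on v
r2_v_given_u = correlation_ratio(p.T, v_vals)    # regress v on u
corr2_uv = squared_correlation(p, u_vals, v_vals)
corr2_vu = squared_correlation(p.T, v_vals, u_vals)
print(r2_u_given_v, r2_v_given_u, corr2_uv, corr2_vu)
```

For this table the two correlation ratios are $0.108\overline{3}/0.25\approx 0.433$ and $0.25/0.69\approx 0.362$, whereas the squared correlation is $0.0625/0.1725\approx 0.362$ in either order. It coincides with the regression of $\underline{v}$ on $\underline{u}$ only because $\underline{u}$ is binary, which makes $\mathbb{E}[\underline{v}\,|\,\underline{u}]$ linear in $\underline{u}$.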
Symmetric measures of association can be formulated using cross-product moments (e.g., the squared or absolute Pearson correlation, or the maximal correlation; \citeNP{gebelein.1941}), joint cumulative distribution functions \cite<cdfs; e.g.,>{blum&kieferrosenblatt.1961,hoeffding.1948}, ranks \cite<e.g.,>{kruskal.1958}, copulas \cite<e.g.,>{schweizer&wolff.1981}, mutual information and entropy \cite<e.g.,>{joe.1989}, divergences between pdfs \cite<e.g.,>{ali&silvey.1965}, and distance covariance \cite<e.g.,>{szekely&rizzobakirov.2007}. Readers are referred to \citeA{tjostheim&otneimstove.2022} for a comprehensive review.
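As one concrete instance of the measures listed above, the Python sketch below computes the sample (V-statistic) distance correlation from double-centered distance matrices. It is a plain NumPy sketch for illustration, not a reference implementation, and the simulated scores are hypothetical. Swapping the arguments leaves the value unchanged, an affine relation gives a value of one, and a purely nonlinear relation such as $\underline{x}$ versus $\underline{x}^2$ is detected even though its Pearson correlation is zero in the population.

```python
import numpy as np

def _double_centered_distances(x):
    """Pairwise Euclidean distances, double-centered (rows of x are observations)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version); symmetric in x and y."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                       # squared distance covariance
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
z = rng.standard_normal(500)                     # independent of x
print(distance_correlation(x, x ** 2), distance_correlation(x ** 2, x))
print(distance_correlation(x, z))                # small, but biased slightly above 0
```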

Invariance

Invariance is related to transformations applied to observed and latent scores. Let ${\cal F}$ and ${\cal H}$ be two suitable families of transformations supported on $\mathbb{R}^{m^{*}}$ and $\mathbb{R}^{d^{*}}$, respectively. The association measure $A$ is invariant with respect to the pair of transformation families $({\cal F},{\cal H})$ if $A\big(f(\mathbf{s}(\underline{\mathbf{y}}_i)),h(\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i))\big)=A\big(\mathbf{s}(\underline{\mathbf{y}}_i),\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_i)\big)$ for all $f\in{\cal F}$ and $h\in{\cal H}$. In words, the association measure remains unchanged under certain transformations of observed and latent scores. The expression above can accommodate potentially different families of transformations (i.e., ${\cal F}$ and ${\cal H}$) for the two sets of scores. Observe that regression-based coefficients of determination satisfy a form of invariance.
Consider regressing $s(\underline{\mathbf{y}}_i)\in\mathbb{R}$ onto $\underline{\boldsymbol{\eta}}_i\in\mathbb{R}^{d}$ (i.e., a measurement decomposition). If we set ${\cal F}=\{\text{all invertible linear transformations on }\mathbb{R}\}$ and ${\cal H}=\{\text{all invertible transformations on }\mathbb{R}^{d}\}$, then the corresponding coefficient of determination is invariant with respect to $({\cal F},{\cal H})$.\footnote{To see why, let the original regression be expressed by $s(\mathbf{y}_i)=\omega(\boldsymbol{\eta}_i)+\varepsilon_i$ with fitted value $\omega(\boldsymbol{\eta}_i)=\mathbb{E}[s(\underline{\mathbf{y}}_i)\,|\,\boldsymbol{\eta}_i]$ and error term $\varepsilon_i$. Take any $f\in{\cal F}$ and $h\in{\cal H}$ such that $f(x)=a+bx$ with $b\neq 0$ and $h$ has a well-defined inverse $h^{-1}$. Then $f(s(\mathbf{y}_i))=a+bs(\mathbf{y}_i)=[a+b(\omega\circ h^{-1})(h(\boldsymbol{\eta}_i))]+b\varepsilon_i$, which can be viewed as a regression onto $h(\boldsymbol{\eta}_i)$ with predicted value $a+b(\omega\circ h^{-1})(h(\boldsymbol{\eta}_i))$ and error term $b\varepsilon_i$. Because the predicted value and the error are both scaled by the same factor $b$, the ratio of explained to total variance, and hence the coefficient of determination, remains intact.} Similarly, when $m^{*}=d^{*}=1$, the squared and absolute Pearson correlations are invariant to invertible linear transformations.
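The invariance argument can be checked numerically on a toy example. The Python sketch below assumes a hypothetical joint pmf of a three-valued observed score $s\in\{0,1,2\}$ and a three-valued latent score $\eta\in\{-1,0,1\}$. It computes the coefficient of determination after an affine $f$, after an invertible but non-monotone relabeling $h$ of $\eta$ (which only permutes the columns of the pmf), and after the nonlinear $f(x)=x^2$. Only the last one changes the value; note that the predictor's values enter solely through the partition they induce, which is why any invertible $h$ leaves the coefficient intact.

```python
import numpy as np

# Hypothetical joint pmf p[s, eta]: s in {0, 1, 2}, eta in {-1, 0, 1}
s_vals = np.array([0.0, 1.0, 2.0])
p = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.20, 0.05],
              [0.05, 0.05, 0.30]])

def coef_determination(p, out_vals):
    """Var(E[out | pred]) / Var(out) for a joint pmf p[out, pred].

    Predictor values never enter: only the partition they induce matters.
    """
    p_pred, p_out = p.sum(axis=0), p.sum(axis=1)
    mean_out = out_vals @ p_out
    cond_mean = (out_vals @ p) / p_pred
    return (p_pred @ (cond_mean - mean_out) ** 2) / (p_out @ (out_vals - mean_out) ** 2)

r2 = coef_determination(p, s_vals)
r2_affine = coef_determination(p, 3.0 - 2.0 * s_vals)   # f(x) = a + b x with b = -2

# h relabels eta = (-1, 0, 1) as (5, -2, 10): invertible but not monotone.
# Re-sorting columns by the new labels only permutes the columns of p.
order = np.argsort(np.array([5.0, -2.0, 10.0]))
r2_h = coef_determination(p[:, order], s_vals)

r2_square = coef_determination(p, s_vals ** 2)          # nonlinear f(x) = x^2
print(r2, r2_affine, r2_h, r2_square)
```

For this table the unchanged value is $0.22125/0.69\approx 0.321$, while squaring the observed score gives about $0.336$.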

Coefficients of determination and Pearson correlations, however, are not invariant with respect to nonlinear transformations. For instance, let $\xi(\underline{\boldsymbol{\eta}}_i)\in\mathbb{R}$ follow a standard normal distribution and let $\Phi$ denote its cdf. Then the percentile rank $\tilde{\xi}(\boldsymbol{\eta}_i)=100\Phi(\xi(\boldsymbol{\eta}_i))$ is a strictly monotone transformation of the original latent score $\xi(\boldsymbol{\eta}_i)$. Because the cdf is nonlinear, the PRMSEs for predicting $\xi(\boldsymbol{\eta}_i)$ and $\tilde{\xi}(\boldsymbol{\eta}_i)$ by their respective EAP estimates generally differ. In contrast, a measure of association that is invariant with respect to strictly monotone transformations would yield identical reliability coefficients in both scenarios. Invariance has intuitive appeal: observed data should arguably carry the same information for predicting latent quantities that are in one-to-one correspondence.
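To quantify the percentile-rank example, the Python sketch below assumes a hypothetical normal-normal model (not one used in the paper): $\xi\sim N(0,1)$ and a single observed score $y=\xi+e$ with $e\sim N(0,\sigma^2)$, so that $\mathrm{PRMSE}(\xi)=r=1/(1+\sigma^2)$. Under this assumption the posterior is $\xi\,|\,y\sim N(ry,1-r)$, the EAP of $\Phi(\xi)$ is $\Phi(ry/\sqrt{2-r})$, and a bivariate-normal orthant-probability argument gives $\mathrm{PRMSE}(\tilde{\xi})=(6/\pi)\arcsin(r/2)$. This closed form is our own derivation for the toy model, so the sketch also checks it by simulation.

```python
import numpy as np
from scipy.stats import norm

def prmse_closed_form(r):
    """PRMSEs of the EAPs of xi and of 100 * Phi(xi) in the toy normal-normal model."""
    return r, (6.0 / np.pi) * np.arcsin(r / 2.0)

def prmse_monte_carlo(r, n=1_000_000, seed=0):
    """Simulation check: PRMSE = 1 - MSE(EAP) / Var(target)."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(1.0 / r - 1.0)
    xi = rng.standard_normal(n)
    y = xi + sigma * rng.standard_normal(n)
    eap_xi = r * y                                     # posterior mean of xi
    # Posterior xi | y ~ N(r y, 1 - r), so E[Phi(xi) | y] = Phi(r y / sqrt(2 - r))
    eap_rank = 100.0 * norm.cdf(r * y / np.sqrt(2.0 - r))
    rank = 100.0 * norm.cdf(xi)                        # percentile rank of xi
    prmse_xi = 1.0 - np.mean((xi - eap_xi) ** 2) / xi.var()
    prmse_rank = 1.0 - np.mean((rank - eap_rank) ** 2) / rank.var()
    return prmse_xi, prmse_rank

print(prmse_closed_form(0.5))    # (0.5, about 0.4826)
print(prmse_monte_carlo(0.5))
```

In this toy model the two PRMSEs differ only modestly (0.5 versus about 0.483 at $r=0.5$), but they are not equal, which is the point of the example.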

Invariance is one of the well-known “Rényi’s Axioms” \cite<e.g.,>{geenens&ldm.2022,nelsen.2006,renyi.1959,schweizerwolff.1981}, which collect advisable statistical principles for defining measures of association. Several of the symmetric association measures cited in the “Symmetry” section satisfy invariance beyond linear transformations. Among asymmetric measures of association, the coefficient considered by \citeA{azadkia&chatterjee.2021}, which generalizes \citeA{chatterjee.2021} and \citeA{dette&siburgstoimenov.2013}, is invariant to strictly monotone transformations of the outcome and might be used as an alternative to coefficients of determination in generalized measurement and prediction decompositions. We present examples of such association measures in the next section.

Examples

Absolute and Squared Pearson Correlation

We first consider the simplest case, in which observed and latent scores are unidimensional (i.e., $m^{*}=d^{*}=1$). \citeA{weiss.1982} computed the (Pearson) correlation, termed a “fidelity correlation,” between true and estimated latent ability scores to evaluate different adaptive testing strategies. Because estimated ability scores usually correlate positively with true ability scores under a unidimensional IRT model, the fidelity correlation can be conceived as a reliability coefficient that uses the absolute correlation as the association measure. Similarly, \citeA{kim.2012} referred to the squared correlation between true and estimated latent ability scores as a squared-correlation reliability. For simplicity, we only consider the squared correlation below. Let $\underline{u},\underline{v}\in\mathbb{R}$ be two random scalars. The squared correlation between $\underline{u}$ and $\underline{v}$ can be expressed as

\begin{equation}
\hbox{Corr}^{2}\big(\underline{u},\underline{v}\big)=\frac{\hbox{Cov}(\underline{u},\underline{v})^{2}}{\hbox{Var}(\underline{u})\hbox{Var}(\underline{v})}. \tag{6}
\end{equation}

Note that $\hbox{Corr}^{2}(\underline{u},\underline{v})$ is distinct from the coefficient of determination $\varrho^{2}(\underline{u},\underline{v})$; the two quantities coincide only when the regression of $\underline{u}$ on $\underline{v}$ is linear. Equation 6 satisfies the estimability, normalization, and symmetry desiderata, but it is invariant only to linear transformations of $\underline{u}$ and $\underline{v}$ with nonzero slopes.
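A quick numerical check of the last sentence, as a sketch with simulated bivariate normal scores (an assumption made only for illustration): the sample version of Equation 6 is symmetric and unchanged by linear transformations with nonzero slopes, but it changes under $\exp$, which is strictly monotone yet nonlinear.

```python
import numpy as np

def corr2(a, b):
    """Sample squared Pearson correlation (Equation 6 with sample moments)."""
    return np.corrcoef(a, b)[0, 1] ** 2

rng = np.random.default_rng(0)
u = rng.standard_normal(10_000)
v = 0.8 * u + 0.6 * rng.standard_normal(10_000)   # population Corr^2 = 0.64

base = corr2(u, v)
swapped = corr2(v, u)                             # symmetry
linear = corr2(2.0 - 3.0 * u, 0.5 * v + 4.0)      # nonzero slopes: unchanged
monotone = corr2(np.exp(u), v)                    # strictly monotone, nonlinear: changes
print(base, swapped, linear, monotone)
```

For this bivariate normal setup the population value after the $\exp$ transform is about $0.37$, well below $0.64$.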

Coefficient Sigma

We continue to assume that the observed and latent scores are unidimensional. To allow for nonlinear associations while achieving invariance to nonlinear transformations, symmetric measures of association based on Rényi’s Axioms can be substituted for the absolute or squared correlation. Let

\begin{equation}
\tilde{\varsigma}\big(\underline{u},\underline{v}\big)=4\sin^{2}\left(\frac{\pi}{6}\varsigma(\underline{u},\underline{v})\right) \tag{7}
\end{equation}

be the rescaled coefficient sigma,\footnote{In \citeA{schweizer&wolff.1981}, coefficient sigma $\varsigma$ was defined only for continuous random variables. Here, we extend its use to possibly discrete scores. For example, the observed scores are discrete when the MVs are discrete, and some measurement models (e.g., latent class models) incorporate discrete LVs, which in turn result in discrete latent scores. Note that a coefficient sigma computed for discrete scores no longer exactly satisfies Rényi’s Axioms.} with

\begin{equation}
\varsigma\big(\underline{u},\underline{v}\big)=12\iint_{\mathbb{R}^{2}}\big|F_{u,v}(s,t)-F_{u}(s)F_{v}(t)\big|\,F_{u}(ds)\,F_{v}(dt) \tag{8}
\end{equation}

as the original coefficient sigma \cite{schweizer&wolff.1981}. In Equation 8, $F_{u,v}$ denotes the joint cdf of $\underline{u}$ and $\underline{v}$, and $F_{u}$ and $F_{v}$ are the marginal cdfs of $\underline{u}$ and $\underline{v}$, respectively. The original coefficient sigma $\varsigma$ (Equation 8) thus measures the average absolute deviation (scaled by 12) between the actual joint distribution of the two scores, $F_{u,v}(s,t)$, and the simpler joint distribution in which $\underline{u}$ is independent of $\underline{v}$, $F_{u}(s)F_{v}(t)$. Because Equation 8 integrates with respect to the marginal distributions of $\underline{u}$ and $\underline{v}$ and its integrand depends only on the cdfs, $\varsigma(\underline{u},\underline{v})$ is invariant to strictly monotone transformations of $\underline{u}$ and $\underline{v}$.
The transformation in Equation 7 is strictly increasing and maps $[0,1]$ onto $[0,1]$; it is chosen so that $\tilde{\varsigma}(\underline{u},\underline{v})$ coincides with the squared Pearson correlation when $\underline{u}$ and $\underline{v}$ follow a bivariate normal distribution \cite{schweizer&wolff.1981}. The original coefficient sigma $\varsigma$ is also closely related to Spearman’s correlation, which is obtained by replacing the absolute difference in Equation 8 with the signed difference. If the two scores are positively quadrant dependent,\footnote{$\underline{u}$ and $\underline{v}$ are positively quadrant dependent if $F_{u,v}(s,t)\geq F_{u}(s)F_{v}(t)$ for all $s,t\in\mathbb{R}$ \cite[Definition 5.2.1]{nelsen.2006}.} then the original coefficient sigma $\varsigma$ and Spearman’s correlation are identical \cite[p.~209]{nelsen.2006}. The rescaled coefficient sigma (Equation 7) satisfies estimability, normalization, and symmetry; it is also invariant to strictly monotone transformations of both $\underline{u}$ and $\underline{v}$.
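A simple plug-in estimate of Equations 7 and 8 replaces the cdfs by their empirical counterparts. The Python sketch below assumes continuous scores without ties, evaluates the empirical copula on the grid of rank pairs, and averages the absolute deviation from independence; it is an illustrative estimator, not necessarily the one used in the paper. With simulated bivariate normal scores ($\rho=0.7$, a hypothetical choice), the rescaled value should be near $\rho^2=0.49$, and because only ranks enter, strictly monotone transformations of either score leave the estimate unchanged.

```python
import numpy as np

def coefficient_sigma(u, v):
    """Empirical Schweizer-Wolff sigma (Equation 8) from ranks, assuming no ties.

    Evaluates the empirical copula C_n on the n x n grid of rank pairs and
    averages |C_n(i/n, j/n) - (i/n)(j/n)|, then multiplies by 12.
    """
    n = len(u)
    ru = np.argsort(np.argsort(u))                 # 0-based ranks of u
    rv = np.argsort(np.argsort(v))                 # 0-based ranks of v
    counts = np.zeros((n, n))
    counts[ru, rv] = 1.0                           # one point per observation
    copula = counts.cumsum(axis=0).cumsum(axis=1) / n
    grid = np.arange(1, n + 1) / n
    return 12.0 * np.mean(np.abs(copula - np.outer(grid, grid)))

def rescaled_sigma(u, v):
    """Equation 7: 4 sin^2(pi * sigma / 6)."""
    return 4.0 * np.sin(np.pi * coefficient_sigma(u, v) / 6.0) ** 2

rng = np.random.default_rng(0)
rho = 0.7
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1200)
u, v = z[:, 0], z[:, 1]
est = rescaled_sigma(u, v)
est_transformed = rescaled_sigma(np.exp(u), v ** 3)   # strictly increasing maps
print(est, est_transformed, rho ** 2)
```

The $n\times n$ grid keeps the code short but costs $O(n^2)$ memory, so a sorted-sweep implementation would be preferable for large samples.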

Mutual Information
This example illustrates a symmetric association measure when both the observed and latent scores are potentially multidimensional. The mutual information between random vectors $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$ of any dimension can be expressed as

\begin{equation}
M\big(\underline{\mathbf{u}},\underline{\mathbf{v}}\big)=\iint\log\left[\frac{f_{\mathbf{u},\mathbf{v}}(\mathbf{s},\mathbf{t})}{f_{\mathbf{u}}(\mathbf{s})f_{\mathbf{v}}(\mathbf{t})}\right]F_{\mathbf{u},\mathbf{v}}(d\mathbf{s},d\mathbf{t}), \tag{9}
\end{equation}

in which $f_{\mathbf{u},\mathbf{v}}$ denotes the joint pdf of $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$, and $f_{\mathbf{u}}$ and $f_{\mathbf{v}}$ denote the marginal pdfs of $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$, respectively. Mutual information (Equation 9) is the Kullback-Leibler divergence of the true joint pdf of $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$, $f_{\mathbf{u},\mathbf{v}}(\mathbf{s},\mathbf{t})$, from the simpler pdf in which the two random vectors are independent, $f_{\mathbf{u}}(\mathbf{s})f_{\mathbf{v}}(\mathbf{t})$. Thus, mutual information is non-negative and equals zero if and only if $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$ are independent. From Equation 9, mutual information is also symmetric and invariant to invertible transformations of $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$. However, mutual information is not bounded from above. To normalize mutual information to the unit interval, Joe (\citeyearNP{joe.1989}; see also \citeNP{linfoot.1957}) proposed rescaling $M$ by

\begin{equation}
\tilde{M}(\underline{\mathbf{u}},\underline{\mathbf{v}})=1-\exp\left[-2M(\underline{\mathbf{u}},\underline{\mathbf{v}})\right]. \tag{10}
\end{equation}

When $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$ jointly follow a multivariate normal distribution, \citeA{joe.1989} showed that $\tilde{M}$ reduces to the squared Pearson correlation when both random vectors reduce to random scalars (i.e., $\underline{\mathbf{u}}=\underline{u}$ and $\underline{\mathbf{v}}=\underline{v}$); $\tilde{M}$ also reduces to the coefficient of determination when one of the two random quantities is unidimensional and used as the regression outcome. These special cases justify normalizing mutual information by the mapping $x\mapsto 1-\exp(-2x)$. Mutual information has been applied to quantify measurement precision in measurement models with both discrete and continuous LVs \cite<e.g.,>{chen&liuxu.2018,johnsonsinharay.2020,markon.2013,markon.2023,sinharayjohnson.2019}. The rescaled mutual information (Equation 10) satisfies the estimability, normalization, and symmetry desiderata, and is invariant to invertible transformations of $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$.
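The properties above can be verified in two simple settings. The Python sketch below assumes (i) a made-up discrete joint pmf, where an invertible transformation amounts to relabeling categories, so symmetry and invariance reduce to transposing or permuting the table, and (ii) multivariate normal scores with hypothetical covariance matrices, where $M=\tfrac{1}{2}[\log\det\Sigma_{uu}+\log\det\Sigma_{vv}-\log\det\Sigma]$ and Equation 10 should return the squared correlation (two scalars) or the population $R^2$ (scalar outcome).

```python
import numpy as np

def discrete_mi(p):
    """Mutual information (nats) of a discrete joint pmf p[u, v]; 0 log 0 is taken as 0."""
    p = np.asarray(p, dtype=float)
    indep = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / indep[mask])))

def gaussian_mi(cov, idx_u, idx_v):
    """Mutual information (nats) between blocks u and v of a multivariate normal."""
    def logdet(ix):
        return np.linalg.slogdet(cov[np.ix_(ix, ix)])[1]
    idx_u, idx_v = list(idx_u), list(idx_v)
    return 0.5 * (logdet(idx_u) + logdet(idx_v) - logdet(idx_u + idx_v))

def rescale(m):
    """Equation 10: map M in [0, inf) to [0, 1)."""
    return 1.0 - np.exp(-2.0 * m)

# Discrete case: symmetry and invariance to relabeling categories
p = np.array([[0.3, 0.1, 0.1],
              [0.0, 0.2, 0.3]])                   # hypothetical joint pmf
m_disc = discrete_mi(p)
m_swap = discrete_mi(p.T)                         # swap the roles of u and v
m_relabel = discrete_mi(p[::-1, [2, 0, 1]])       # bijective relabeling of both scores

# Gaussian case: rescaled MI equals Corr^2 (scalars) or R^2 (scalar outcome)
cov2 = np.array([[1.0, 0.6], [0.6, 1.0]])
cov3 = np.array([[1.0, 0.5, 0.4],
                 [0.5, 1.0, 0.3],
                 [0.4, 0.3, 1.0]])
r2 = cov3[0, 1:] @ np.linalg.solve(cov3[1:, 1:], cov3[0, 1:]) / cov3[0, 0]
print(m_disc, m_swap, m_relabel)
print(rescale(gaussian_mi(cov2, [0], [1])), rescale(gaussian_mi(cov3, [0], [1, 2])), r2)
```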

Coefficient $T$
This example features an asymmetric measure: an alternative to the coefficient of determination that is invariant to strictly monotone transformations of the outcome variable. Let $\underline{u}\in\mathbb{R}$ be a scalar outcome variable and $\underline{\mathbf{v}}$ be a set of explanatory variables. Define the Azadkia-Chatterjee coefficient $T$ as

\begin{equation}
T(\underline{u},\underline{\mathbf{v}})=\frac{\int_{\mathbb{R}}\hbox{Var}\big(\mathbb{P}\{\underline{u}>s\,|\,\underline{\mathbf{v}}\}\big)F_{u}(ds)}{\int_{\mathbb{R}}\hbox{Var}\big(\mathbb{I}\{\underline{u}>s\}\big)F_{u}(ds)}, \tag{11}
\end{equation}

in which $\mathbb{P}\{\underline{u}>s\,|\,\underline{\mathbf{v}}\}$ denotes the conditional probability of $\underline{u}>s$ given $\underline{\mathbf{v}}$, and $\mathbb{I}\{\underline{u}>s\}$ is the indicator of the event $\underline{u}>s$. Like the coefficient of determination, Equation 11 is a signal-to-total ratio \cite<STR; cf.>{cronbach&gleser.1964}. Recall that a coefficient of determination quantifies the proportion of variance in the outcome (i.e., the total) that is accounted for by the predictor variables (i.e., the signal). In a similar vein, the coefficient $T$ partitions the total variability of the threshold-passing indicator $\mathbb{I}\{\underline{u}>s\}$ and retains the portion of systematic variation ascribed to the predictor variables $\underline{\mathbf{v}}$, omitting the leftover variance unassociated with $\underline{\mathbf{v}}$. Because the threshold $s$ is arbitrary, the systematic and total variability are then each integrated over all possible values of $s$ with respect to the marginal distribution of $\underline{u}$. Invariance to strictly monotone transformations of the outcome variable follows from the use of the indicator function together with integration with respect to the outcome distribution.
Like coefficients of determination, coefficient $T$ is asymmetric (it is defined only for the regression of an outcome on explanatory variables), and it is estimable, normalized, and invariant to invertible transformations of the explanatory variables. In addition, coefficient $T$ is invariant to strictly monotone transformations of the outcome, whereas a coefficient of determination is invariant only to non-vanishing linear transformations.
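Because coefficient $T$ is estimable, it can be approximated from a sample of $(u, \mathbf{v})$ pairs. The Python sketch below follows our reading of the rank-and-nearest-neighbour estimator underlying the codec function in the FOCI package (Azadkia et al., 2021); the function name `codec_t`, the brute-force neighbour search, and the random tie-breaking are our own assumptions and not the package's code. Only the ranks of the outcome enter the estimate, which mirrors the invariance of $T$ to strictly increasing transformations of the outcome.

```python
import numpy as np


def codec_t(u, v, seed=None):
    """Nearest-neighbour estimate of coefficient T(u, v) in Equation 11.

    Assumed (unconditional) form, sketched rather than copied from FOCI:
    T_n = sum_i (n * min(R_i, R_M(i)) - L_i**2) / sum_i L_i * (n - L_i).
    u : (n,) outcome values; v : (n,) or (n, p) explanatory values.
    """
    u = np.asarray(u, dtype=float)
    n = u.shape[0]
    v = np.asarray(v, dtype=float).reshape(n, -1)
    rng = np.random.default_rng(seed)
    u_sorted = np.sort(u)
    # R_i = #{j: u_j <= u_i} and L_i = #{j: u_j >= u_i}; only ranks of u enter,
    # so the estimate is unchanged by strictly increasing transformations of u.
    R = np.searchsorted(u_sorted, u, side="right")
    L = n - np.searchsorted(u_sorted, u, side="left")
    # M(i): index of the nearest neighbour of v_i (Euclidean), excluding i itself;
    # exact ties are broken at random.
    d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    M = np.array([rng.choice(np.flatnonzero(row == row.min())) for row in d2])
    num = np.sum(n * np.minimum(R, R[M]) - L ** 2)
    den = np.sum(L * (n - L))  # zero only if u is constant
    return num / den
```

With a fixed `seed`, `codec_t(np.exp(u), v, seed=0)` returns exactly the same value as `codec_t(u, v, seed=0)`, because exponentiation preserves the ranks of `u`. Rescaling `v` by a constant also leaves the nearest-neighbour structure unchanged, whereas general invertible transformations of `v` preserve $T$ only in the population, not necessarily in this finite-sample estimate.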

Wilks’ Lambda
This example illustrates how the measurement and prediction decompositions can be generalized to allow for multiple outcomes and a free choice of explanatory variables. Consider observed scores $\mathbf{s}(\underline{\mathbf{y}}_{i})\in\mathbb{R}^{m^{*}}$ and latent scores $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_{i})\in\mathbb{R}^{d^{*}}$. Let a generalized measurement decomposition be defined by

$$\mathbf{s}(\mathbf{y}_{i})=\mathbb{E}\big[\mathbf{s}(\underline{\mathbf{y}}_{i})\mid\boldsymbol{\xi}(\boldsymbol{\eta}_{i})\big]+\boldsymbol{\varepsilon}_{i}^{*}, \tag{12}$$

and a generalized prediction decomposition be defined by

$$\boldsymbol{\xi}(\boldsymbol{\eta}_{i})=\mathbb{E}\big[\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_{i})\mid\mathbf{s}(\mathbf{y}_{i})\big]+\boldsymbol{\delta}_{i}^{*}. \tag{13}$$

In Equations 12 and 13, the outcome variables (i.e., $\mathbf{s}(\mathbf{y}_{i})$ and $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$) and the corresponding error terms (i.e., $\boldsymbol{\varepsilon}_{i}^{*}$ and $\boldsymbol{\delta}_{i}^{*}$) can be multidimensional (cf. Equations 1 and 3 for the measurement and prediction decompositions, respectively). Moreover, the explanatory variables conditioned on in Equations 12 and 13 can be any latent scores (cf. only LVs or true scores in Equation 1) and any observed scores (cf. only MVs or EAP scores in Equation 3), respectively. Various STR coefficients can be computed for multivariate regression models, generalizing the coefficient of determination.

Let $\underline{\mathbf{u}}$ and $\underline{\mathbf{v}}$ be vectors of outcome and explanatory variables, respectively. Then the multivariate regression of $\underline{\mathbf{u}}$ on $\underline{\mathbf{v}}$ is

$$\mathbf{u}=\mathbb{E}\big(\underline{\mathbf{u}}\mid\mathbf{v}\big)+\mathbf{e}, \tag{14}$$

which subsumes Equations 12 and 13 as special cases. The error vector $\mathbf{e}$ in Equation 14 satisfies $\mathrm{Cov}(\underline{\mathbf{e}})=\mathrm{Cov}(\underline{\mathbf{u}})-\mathrm{Cov}\big[\mathbb{E}(\underline{\mathbf{u}}\mid\underline{\mathbf{v}})\big]$, which is the multivariate analog of the law of total variance. A generalization of the coefficient of determination to multivariate regression (Equation 14) is one minus Wilks' lambda (Wilks, 1932):

$$W(\underline{\mathbf{u}},\underline{\mathbf{v}})=1-\frac{\det\big(\mathrm{Cov}(\underline{\mathbf{e}})\big)}{\det\big(\mathrm{Cov}(\underline{\mathbf{u}})\big)}=\frac{\det\big(\mathrm{Cov}(\underline{\mathbf{u}})\big)-\det\big(\mathrm{Cov}(\underline{\mathbf{e}})\big)}{\det\big(\mathrm{Cov}(\underline{\mathbf{u}})\big)}. \tag{15}$$

In Equation 15, noise is quantified by the error covariance matrix $\mathrm{Cov}(\underline{\mathbf{e}})$, and signal is quantified by the total covariance matrix $\mathrm{Cov}(\underline{\mathbf{u}})$ minus the error covariance matrix. The matrix determinant, $\det(\cdot)$, provides a single-number summary of a covariance matrix, which Wilks (1932) referred to as the generalized variance. It can be verified that Equation 15 reduces to the coefficient of determination $\varrho^{2}(\underline{u},\underline{\mathbf{v}})$ when the outcome $\underline{u}$ is unidimensional. Other STR measures include Pillai's trace and Roy's largest root (e.g., Mardia, Kent, & Bibby, 1979). One minus Wilks' lambda and other STR measures based on multivariate regressions are estimable and normalized but not symmetric; they are invariant to invertible transformations of the explanatory variables and to non-vanishing linear transformations of the outcome variables.
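To make Equation 15 concrete at the sample level, the sketch below computes one minus Wilks' lambda under the simplifying assumption that $\mathbb{E}(\underline{\mathbf{u}}\mid\underline{\mathbf{v}})$ is linear in $\underline{\mathbf{v}}$, so that ordinary least squares stands in for the nonparametric smoother used in the numerical study. The function name `one_minus_wilks` is hypothetical. The sketch also lets one check two properties stated above: with a single outcome the value equals the coefficient of determination of the fitted regression, and a nonsingular linear transformation of the outcomes leaves it unchanged.

```python
import numpy as np


def one_minus_wilks(u, v):
    """Sample version of W(u, v) in Equation 15, assuming E(u | v) is linear in v.

    u : (n,) or (n, p) outcomes; v : (n,) or (n, q) explanatory variables.
    """
    u = np.asarray(u, dtype=float)
    n = u.shape[0]
    u = u.reshape(n, -1)
    v = np.asarray(v, dtype=float).reshape(n, -1)
    X = np.column_stack([np.ones(n), v])            # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)    # OLS fit, one column per outcome
    e = u - X @ beta                                # estimated error vectors
    cov_e = np.atleast_2d(np.cov(e, rowvar=False))  # Cov(e): noise
    cov_u = np.atleast_2d(np.cov(u, rowvar=False))  # Cov(u): total
    # Ratio of generalized variances (determinants), as in Equation 15.
    return 1.0 - np.linalg.det(cov_e) / np.linalg.det(cov_u)
```

For a single outcome regressed on a single predictor, the result coincides with the squared Pearson correlation; for outcomes `U` and an invertible matrix `A`, `one_minus_wilks(U @ A, v)` matches `one_minus_wilks(U, v)` up to rounding, because the residuals transform as `e @ A` and the factor $\det(A)^2$ cancels in the ratio.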

Numerical Study

We conducted a numerical study of nine association measures of reliability to illustrate their behavior at the population level. We examined (a) how the values of these reliability measures change as a function of test length under a two-dimensional simple-structure IRT model, and (b) how they relate to other benchmarks of measurement error (e.g., estimation error of latent scores and of the inter-LV correlation).

Data Generation

[Path diagram: $\eta_{i1}$ is indicated by $y_{i1},\dots,y_{i,m/2}$, and $\eta_{i2}$ is indicated by $y_{i,m/2+1},\dots,y_{im}$.]
Figure 1: Path diagram for the two-dimensional measurement model. $\eta_{i1}$ and $\eta_{i2}$ are latent variables, and $y_{i1},\dots,y_{im}$ are manifest variables.

Figure 1 presents the data-generating model, in which the total number of MVs $m$ (i.e., test length) is even, so that each LV is indicated by $m/2$ MVs. The two LVs follow a bivariate normal distribution:

$$\underline{\boldsymbol{\eta}}_{i}=(\underline{\eta}_{i1},\underline{\eta}_{i2})'\sim\mathcal{N}\left(\begin{bmatrix}0\\ 0\end{bmatrix},\begin{bmatrix}1&0.5\\ 0.5&1\end{bmatrix}\right). \tag{16}$$

Conditional on $\boldsymbol{\eta}_{i}$, the MVs are mutually independent (i.e., local independence). Each MV is binary, $y_{ij}\in\{0,1\}$, and the conditional probability of $\underline{y}_{ij}=1$ given $\boldsymbol{\eta}_{i}$ follows a three-parameter logistic model (Birnbaum, 1968):

$$\mathbb{P}\{\underline{y}_{ij}=1\mid\boldsymbol{\eta}_{i}\}=c_{j}+\frac{1-c_{j}}{1+\exp\left[-a_{j}(\eta_{i,k(j)}-b_{j})\right]}, \tag{17}$$

in which $a_{j}$, $b_{j}$, and $c_{j}$ are the discrimination, difficulty, and pseudo-guessing parameters, respectively. Furthermore, $j=1,\dots,m$ indexes the MVs, and $k(j)=1$ if $j\leq m/2$ and $k(j)=2$ otherwise. We varied the test length from $m=6$ to $m=120$ in increments of 6. For each level of $m$, item parameters were independently drawn from the following distributions: $a_{j}\sim\mathrm{Uniform}(0.5,2)$, $b_{j}\sim\mathrm{Uniform}(-2,2)$, and $c_{j}\sim\mathrm{Uniform}(0,0.2)$, $j=1,\dots,m$. For each set of item parameters, we generated 1000 Monte Carlo (MC) samples of LV and MV vectors, from which we estimated the reliability coefficients and benchmark measures. Reported reliability and benchmark estimates were averaged across 1000 sets of item parameters (i.e., replications); these averages approximate the corresponding population coefficients.
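The data-generating model in Equations 16 and 17 can be sketched in Python as follows. This is not the authors' R code; the function name `simulate_2d_3pl`, the use of NumPy's random generator, and the zero-based coding of $k(j)$ are our assumptions. Each call corresponds to one replication: one draw of item parameters and $n$ MC samples of LV and MV vectors.

```python
import numpy as np


def simulate_2d_3pl(m, n=1000, rho=0.5, seed=None):
    """One replication from the two-dimensional simple-structure 3PL model."""
    if m % 2:
        raise ValueError("test length m must be even")
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eta = rng.multivariate_normal(np.zeros(2), cov, size=n)   # Equation 16
    a = rng.uniform(0.5, 2.0, size=m)                         # discrimination
    b = rng.uniform(-2.0, 2.0, size=m)                        # difficulty
    c = rng.uniform(0.0, 0.2, size=m)                         # pseudo-guessing
    # k(j) in zero-based form: items 1..m/2 load on LV 0, the rest on LV 1.
    k = np.where(np.arange(1, m + 1) <= m // 2, 0, 1)
    p = c + (1 - c) / (1 + np.exp(-a * (eta[:, k] - b)))      # Equation 17, (n, m)
    # Independent Bernoulli draws given eta (local independence).
    y = (rng.uniform(size=(n, m)) < p).astype(int)
    return eta, y, {"a": a, "b": b, "c": c}
```

Looping this function over `m` in `range(6, 121, 6)` with 1000 seeds per level would reproduce the design described above, up to differences in random-number streams.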

Scores, Reliability Measures, and Benchmarks

Two pairs of observed and latent scores were considered in the simulation. First, the LV vector $\boldsymbol{\eta}_{i}=(\eta_{i1},\eta_{i2})'$ is estimated by the corresponding EAP score $\mathbb{E}(\underline{\boldsymbol{\eta}}_{i}\mid\mathbf{y}_{i})$. For reliability measures that can handle multivariate scores, we let $\mathbf{s}(\mathbf{y}_{i})=\mathbb{E}(\underline{\boldsymbol{\eta}}_{i}\mid\mathbf{y}_{i})$ and $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})=\boldsymbol{\eta}_{i}$.
If a reliability measure applies only to unidimensional scores, only the first element of each two-dimensional score vector is considered; i.e., $s_{1}(\mathbf{y}_{i})=\mathbb{E}(\underline{\eta}_{i1}\mid\mathbf{y}_{i})$ and $\xi_{1}(\boldsymbol{\eta}_{i})=\eta_{i1}$. Second, to illustrate the impact of monotone transformations, we used the same observed scores but transformed the latent scores into their percentile ranks, resulting in $\mathbf{s}(\mathbf{y}_{i})=\mathbb{E}(\underline{\boldsymbol{\eta}}_{i}\mid\mathbf{y}_{i})$ and $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})=\left(100\Phi(\eta_{i1}),100\Phi(\eta_{i2})\right)'$, in which $\Phi$ is the standard normal cumulative distribution function.
Whenever a unidimensional score is required, we specify $s_{1}(\mathbf{y}_{i})=\mathbb{E}(\underline{\eta}_{i1}\mid\mathbf{y}_{i})$ and $\xi_{1}(\boldsymbol{\eta}_{i})=100\Phi(\eta_{i1})$.

Table 1: Summary of the reliability coefficients: the observed and latent scores involved in each coefficient, symmetry in the two scores, and invariance under the percentile-rank transformation of latent scores. measure = observed scores as outcome; predict = latent scores as outcome; $\varrho^{2}$ = coefficient of determination; $\mathrm{Corr}^{2}$ = squared Pearson correlation; Sigma = rescaled coefficient sigma (Equation 7); $T$ = coefficient $T$ (Equation 11); MI = rescaled mutual information; Wilks = one minus Wilks' lambda; $s_{1}(\mathbf{y}_{i})$ = unidimensional observed score; $\xi_{1}(\boldsymbol{\eta}_{i})$ = unidimensional latent score; $\mathbf{s}(\mathbf{y}_{i})$ = two-dimensional observed scores; $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ = two-dimensional latent scores.
\begin{tabular}{lllll}
\hline
Coefficient & Observed & Latent & Symmetry & Invariance \\
\hline
$\varrho^{2}$ (measure) & $s_{1}(\mathbf{y}_{i})$ & $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ & no & yes \\
$\varrho^{2}$ (predict) & $\mathbf{s}(\mathbf{y}_{i})$ & $\xi_{1}(\boldsymbol{\eta}_{i})$ & no & no \\
$\mathrm{Corr}^{2}$ & $s_{1}(\mathbf{y}_{i})$ & $\xi_{1}(\boldsymbol{\eta}_{i})$ & yes & no \\
Sigma & $s_{1}(\mathbf{y}_{i})$ & $\xi_{1}(\boldsymbol{\eta}_{i})$ & yes & yes \\
$T$ (measure) & $s_{1}(\mathbf{y}_{i})$ & $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ & no & yes \\
$T$ (predict) & $\mathbf{s}(\mathbf{y}_{i})$ & $\xi_{1}(\boldsymbol{\eta}_{i})$ & no & yes \\
MI & $\mathbf{s}(\mathbf{y}_{i})$ & $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ & yes & yes \\
Wilks (measure) & $\mathbf{s}(\mathbf{y}_{i})$ & $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ & no & yes \\
Wilks (predict) & $\mathbf{s}(\mathbf{y}_{i})$ & $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})$ & no & no \\
\hline
\end{tabular}

We investigated nine association measures of reliability. Table 1 summarizes, for each coefficient, the observed and latent scores involved and whether the coefficient is symmetric and invariant to the percentile-rank transformation of latent scores. When the latent scores are the original LVs (i.e., $\boldsymbol{\xi}(\boldsymbol{\eta}_{i})=\boldsymbol{\eta}_{i}$), observe that (a) the coefficient of determination for the regression of $s_{1}(\underline{\mathbf{y}}_{i})$ on $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}}_{i})$ coincides with the CTT reliability of $s_{1}(\underline{\mathbf{y}}_{i})$, and (b) the coefficient of determination for the regression of $\xi_{1}(\underline{\boldsymbol{\eta}}_{i})$ on $\mathbf{s}(\underline{\mathbf{y}}_{i})$ is identical to the squared correlation between $s_{1}(\underline{\mathbf{y}}_{i})$ and $\xi_{1}(\underline{\boldsymbol{\eta}}_{i})$, which in turn equals the PRMSE of $\xi_{1}(\underline{\boldsymbol{\eta}}_{i})$.
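Claim (b) can be verified in a few lines. The derivation below is a sketch that assumes only that $s_1(\underline{\mathbf{y}}_i)$ is the first element of the EAP vector $\mathbf{s}(\underline{\mathbf{y}}_i)=\mathbb{E}(\underline{\boldsymbol{\eta}}_i\mid\underline{\mathbf{y}}_i)$ and that PRMSE is one minus the mean squared error of $s_1$ relative to $\mathrm{Var}(\underline{\eta}_{i1})$. Abbreviating $s_1=s_1(\underline{\mathbf{y}}_i)$ and $\eta_1=\underline{\eta}_{i1}$:

$$\begin{aligned}
\mathbb{E}\big[\eta_1\mid\mathbf{s}(\underline{\mathbf{y}}_i)\big] &= \mathbb{E}\big[\mathbb{E}(\eta_1\mid\underline{\mathbf{y}}_i)\mid\mathbf{s}(\underline{\mathbf{y}}_i)\big] = s_1,\\
\varrho^{2}\big(\eta_1,\mathbf{s}(\underline{\mathbf{y}}_i)\big) &= \frac{\mathrm{Var}(s_1)}{\mathrm{Var}(\eta_1)},\\
\mathrm{Cov}(s_1,\eta_1) &= \mathbb{E}\big[s_1\,\mathbb{E}(\eta_1\mid\underline{\mathbf{y}}_i)\big]-\mathbb{E}(s_1)\,\mathbb{E}(\eta_1) = \mathrm{Var}(s_1),\\
\mathrm{Corr}^{2}(s_1,\eta_1) &= \frac{\mathrm{Var}(s_1)^{2}}{\mathrm{Var}(s_1)\,\mathrm{Var}(\eta_1)} = \frac{\mathrm{Var}(s_1)}{\mathrm{Var}(\eta_1)},\\
\mathrm{PRMSE} &= 1-\frac{\mathbb{E}\big[(\eta_1-s_1)^{2}\big]}{\mathrm{Var}(\eta_1)} = \frac{\mathrm{Var}(s_1)}{\mathrm{Var}(\eta_1)}.
\end{aligned}$$

The first line uses the tower property (because $\mathbf{s}$ is a function of $\underline{\mathbf{y}}_i$), the third uses $\mathbb{E}(s_1)=\mathbb{E}(\eta_1)$, and the last uses $\mathbb{E}[(\eta_1-s_1)^{2}]=\mathrm{Var}(\eta_1)-\mathrm{Var}(s_1)$ from the law of total variance. The argument no longer applies once $\eta_1$ is replaced by $100\Phi(\eta_1)$, because $s_1$ is then not the conditional mean of the latent score.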

For each set of item parameters, we estimated all reliability coefficients empirically from the 1000 MC samples using the procedure introduced in Liu et al. (2024). EAP scores were obtained from the R package mirt (Chalmers, 2012). For association measures that require fitting nonparametric regression models to the simulated data (i.e., $\varrho^{2}$, $T$, and Wilks' lambda), we applied the default thin-plate spline smoother from the mgcv package (Wood, 2003). Values of coefficient sigma were computed using the wolfCOP function in the copBasic package (Asquith, 2023), mutual information measures were estimated using the knn_mi function in the rmi package (Michaud, 2018), and values of coefficient $T$ were obtained using the codec function in the FOCI package (Azadkia et al., 2021). R code for this numerical study will be provided as Supplemental Material when the paper is accepted for publication.
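The study obtained EAP scores from mirt. To show what such a score involves, the following Python sketch computes $\mathbb{E}(\underline{\boldsymbol{\eta}}_{i}\mid\mathbf{y}_{i})$ for the model in Equations 16 and 17 by brute-force quadrature on an equally spaced two-dimensional grid. The grid (41 points per dimension on $[-4,4]$) and the function name `eap_2d` are our assumptions; mirt's own quadrature settings may differ, so results need not match it exactly.

```python
import numpy as np


def eap_2d(y, a, b, c, rho=0.5, n_nodes=41, limit=4.0):
    """EAP scores E(eta | y) under the two-dimensional 3PL model (Eqs. 16-17),
    using an assumed rectangular quadrature grid (not mirt's rule)."""
    y = np.asarray(y)
    m = y.shape[1]
    grid = np.linspace(-limit, limit, n_nodes)
    g1, g2 = np.meshgrid(grid, grid, indexing="ij")
    nodes = np.column_stack([g1.ravel(), g2.ravel()])            # (Q, 2)
    # Log bivariate normal prior at each node, up to a constant that cancels.
    prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
    log_prior = -0.5 * np.einsum("qi,ij,qj->q", nodes, prec, nodes)
    k = np.where(np.arange(1, m + 1) <= m // 2, 0, 1)            # item-to-LV map
    p = c + (1 - c) / (1 + np.exp(-a * (nodes[:, k] - b)))       # (Q, m)
    # Log-likelihood of every response pattern at every node (local independence).
    loglik = y @ np.log(p).T + (1 - y) @ np.log1p(-p).T          # (n, Q)
    log_post = loglik + log_prior
    log_post -= log_post.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(log_post)
    w /= w.sum(axis=1, keepdims=True)                            # posterior weights
    return w @ nodes                                             # (n, 2) posterior means
```

The grid grows as `n_nodes ** 2`, which is manageable for two dimensions but would not scale to many LVs; adaptive or sparse quadrature would then be preferable.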

Within each replication, two additional benchmark values were computed to reflect the recovery of the LV scores $\boldsymbol{\eta}_{i}$ and of the inter-LV correlation, relative to the magnitude of the true values. The root relative mean squared error (RRMSE) is defined as

$$\mathrm{RRMSE}=\sqrt{\frac{\sum_{i=1}^{1000}\sum_{k=1}^{2}\left(\mathbb{E}(\underline{\eta}_{ik}\mid\mathbf{y}_{i})-\eta_{ik}\right)^{2}}{\sum_{i=1}^{1000}\sum_{k=1}^{2}\eta_{ik}^{2}}}, \tag{18}$$

in which $i$ indexes MC draws and $k=1,2$ indexes the LV dimensions. RRMSE measures the overall estimation error of the LVs by their EAP scores. The relative absolute error (RAE) reflects how well the correlation between EAP scores approximates the true inter-LV correlation (0.5; see Equation 16):

$$\mathrm{RAE}=\frac{\left|\widehat{\mathrm{Corr}}\big(\mathbb{E}(\underline{\eta}_{i1}\mid\mathbf{y}_{i}),\mathbb{E}(\underline{\eta}_{i2}\mid\mathbf{y}_{i})\big)-0.5\right|}{0.5}, \tag{19}$$

in which $\widehat{\mathrm{Corr}}$ denotes the empirical Pearson correlation computed from the 1000 MC samples. Values of Equations 18 and 19 are expected to decrease as the test length $m$ grows, because longer tests yield more precise EAP estimates of the LVs.
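Both benchmarks are simple functions of the EAP scores and the true LVs within one replication. The Python sketch below implements Equations 18 and 19 directly; the function names are hypothetical, and the arrays are assumed to hold one row per MC draw and one column per LV dimension.

```python
import numpy as np


def rrmse(eap, eta):
    """Equation 18: root relative mean squared error of EAP scores for the LVs."""
    eap = np.asarray(eap, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Sum over MC draws i and LV dimensions k, in numerator and denominator alike.
    return np.sqrt(((eap - eta) ** 2).sum() / (eta ** 2).sum())


def rae(eap, true_corr=0.5):
    """Equation 19: relative absolute error of the EAP-based inter-LV correlation."""
    eap = np.asarray(eap, dtype=float)
    r = np.corrcoef(eap[:, 0], eap[:, 1])[0, 1]   # empirical Pearson correlation
    return abs(r - true_corr) / true_corr
```

Averaging these two values over the 1000 replications at each test length gives the kind of summary plotted as benchmarks against $m$.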

Results

Figure 2: Two benchmark measures (panel A) and reliability measures (panels B and C) as functions of test length. Latent scores are the original LVs in panel B and percentile ranks of the LVs in panel C. RRMSE = root relative mean squared error in latent variable scores; RAE = relative absolute error in the inter-latent-variable correlation; measure = observed score as outcome; predict = latent score as outcome; $\varrho^{2}$ = coefficient of determination ($\varrho^{2}$ (measure) = CTT reliability and $\varrho^{2}$ (predict) = PRMSE); $\mathrm{Corr}^{2}$ = squared Pearson correlation; sigma = rescaled coefficient sigma (Equation 7); $T$ = coefficient $T$ (Equation 11); MI = rescaled mutual information (Equation 10); Wilks = one minus Wilks' lambda.

Figure 2 presents a graphical summary of the numerical results. As test length $m$ increases, the two benchmark measures of estimation error (RRMSE and RAE) decrease monotonically (see Figure 2A), indicating better recovery of LV scores and of the inter-LV correlation. Although RRMSE and RAE are displayed in the same plot and both fall within the unit interval, their values are not directly comparable because they quantify different aspects of estimation. RAE is a scalar-valued measure concerning the inter-LV correlation (ranging from .08 to .47), whereas RRMSE summarizes error across multiple random quantities (i.e., two-dimensional LVs; ranging from .29 to .79).

In Figure 2B, all reliability coefficients increase as test length $m$ increases, indicating that EAP scores become better proxies of LVs. Even though they are normalized, different association measures are not always comparable, because different reliability coefficients may be defined for different pairs of observed and latent scores and quantify distinct forms of association (see Table 1). Figure 2B suggests that the nine reliability coefficients cluster into three groups (shown in different colors). The coefficients of determination (corresponding to CTT reliability and PRMSE), together with the squared correlation and the rescaled coefficient sigma, are very similar in value across all levels of $m$ (approximately from 0.4 to 0.9). The squared correlation coincides with PRMSE in the population; hence, the estimated squared correlation and PRMSE exhibit almost identical values in the simulation. CTT reliability is observed to be at least as large as PRMSE, which is a known result (Kim, 2012, Equation 31). Rescaled sigma lies between CTT reliability and PRMSE when $m$ is small and becomes the largest of the three reliability indices when $m$ is large. The measurement and prediction versions of coefficient $T$ take smaller values than the coefficients of determination and rescaled sigma. Coefficient $T$ for the measurement decomposition (ranging from .26 to .73) is slightly larger than that for the prediction decomposition (ranging from .23 to .73), especially at smaller $m$. Finally, the three association measures between the two-dimensional LVs and the two-dimensional EAP scores are the largest in magnitude at all levels of $m$ (ranging approximately from .55 to .99). One minus Wilks' lambda for the generalized measurement decomposition is uniformly larger than that for the generalized prediction decomposition, which is in turn uniformly larger than rescaled mutual information.

Transforming LVs to their percentile ranks leaves most of the coefficients under investigation unchanged. However, the transformation changes the squared correlation, the coefficient of determination based on the prediction decomposition of $\xi_{1}(\underline{\boldsymbol{\eta}})$, and one minus Wilks' lambda based on the generalized prediction decomposition of $\boldsymbol{\xi}(\underline{\boldsymbol{\eta}})$. In Figure 2C, the squared correlation and the prediction $\varrho^{2}$ are lower than their counterparts in Figure 2B; moreover, the transformation breaks the equivalence between these two coefficients. One minus Wilks' lambda based on the generalized prediction decomposition decreases slightly under the LV transformation (see Figure 2B versus 2C).

Summary and Discussion

Reliability is a measure of how closely observed and latent scores align with each other. Building on the regression framework of reliability (Liu et al., 2024; McDonald, 2011), which assumes an LV measurement model, we have shown that reliability can be broadly defined as a measure of association between observed and latent scores (Equation 5). This broad definition subsumes popular reliability indices, including coefficients of determination such as CTT reliability (Lord & Novick, 1968) and PRMSE (Haberman & Sinharay, 2010). Because this definition admits a large number of reliability indices, we identified and described four desiderata (estimability, normalization, symmetry, and invariance) that may help analysts select the most suitable reliability coefficient(s) for their research. We consider estimability and normalization essential for interpretation, whereas symmetry and invariance are optional, depending on the research context.

Our numerical example shows that different reliability coefficients can be computed from a single measurement model. In general, these reliability coefficients increase with test length. Furthermore, measures of association between multiple outcome and multiple explanatory variables (e.g., rescaled mutual information and one minus Wilks' lambda) tend to take larger values than measures based on univariate regression (e.g., CTT reliability and PRMSE). Importantly, although all of these coefficients are normalized onto $[0,1]$, their values cannot be directly compared with one another, because they measure qualitatively distinct associations between latent and observed scores.
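To make the univariate/multivariate contrast concrete, the sketch below computes one minus Wilks' lambda for a population multivariate *linear* regression of $Y$ on $X$. The quantity is $1 - \det(\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})/\det(\Sigma_{yy})$ (Wilks, 1932). This linear version is an illustrative assumption. It is not the paper's generalized measurement or prediction decomposition, which is built on conditional means that need not be linear. With a single outcome, the quantity reduces to the ordinary coefficient of determination. The two-dimensional example uses hypothetical covariances: latent scores with unit variances and correlation .5, and observed scores equal to the latent scores plus independent errors with variance .3.

```python
import numpy as np


def one_minus_wilks_lambda(cov_yy, cov_yx, cov_xx):
    """1 - Wilks' lambda for the population *linear* regression of Y on X:
    1 - det(residual covariance of Y) / det(total covariance of Y)."""
    cov_yy, cov_yx, cov_xx = (np.atleast_2d(np.asarray(m, dtype=float))
                              for m in (cov_yy, cov_yx, cov_xx))
    resid = cov_yy - cov_yx @ np.linalg.solve(cov_xx, cov_yx.T)
    return 1.0 - np.linalg.det(resid) / np.linalg.det(cov_yy)


# Univariate check: one outcome, one predictor, correlation .8, so R^2 = .64.
r2 = one_minus_wilks_lambda(1.0, 0.8, 1.0)

# Hypothetical two-dimensional example: latent scores xi with correlation .5;
# observed scores = xi + independent errors with variance .3 each.
cov_xi = np.array([[1.0, 0.5], [0.5, 1.0]])
cov_obs = cov_xi + 0.3 * np.eye(2)
cov_obs_xi = cov_xi  # Cov(observed, xi) = Cov(xi, xi) under this assumption

measurement = one_minus_wilks_lambda(cov_obs, cov_obs_xi, cov_xi)   # observed on latent
prediction = one_minus_wilks_lambda(cov_xi, cov_obs_xi.T, cov_obs)  # latent on observed
print(round(r2, 4), round(measurement, 4), round(prediction, 4))
```

In this linear sketch the two directions give the same value. For a linear regression, $1 - \Lambda$ equals one minus the product of $(1 - \text{squared canonical correlations})$, which is symmetric in $Y$ and $X$. That symmetry need not carry over to the generalized decompositions discussed in the text.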

Our general framework expands the notion of reliability in the context of an LV measurement model in several ways. First, the analyst is free to choose which observed score and which latent score enter the regression. Second, the analyst can choose association measures other than the coefficient of determination. Third, the analyst can move from a univariate regression model (e.g., CTT reliability and PRMSE) to a multivariate regression model (e.g., one minus Wilks' lambda). Fourth, reliability coefficients can further be chosen based on symmetry and transformation invariance. Because some of the reliability coefficients we have described are relatively unfamiliar, future research should study their performance in real-data and simulation settings (e.g., under different LV measurement models). Furthermore, to encourage substantive researchers to apply these novel reliability coefficients, methodologists will need to develop benchmarks or recommendations for qualitatively interpreting these distinct measures of reliability. We hope that this general framework will motivate the development of further reliability coefficients, not yet covered in the current work, that are useful to substantive researchers.

References

  • Ali, S., & Silvey, S. (1965). Association between random variables and the dispersion of a Radon-Nikodym derivative. Journal of the Royal Statistical Society, Series B, 27(1), 100–107. https://doi.org/10.1111/j.2517-6161.1965.tb00613.x
  • Anastasi, A., & Urbina, S. (1997). Psychological testing. Prentice Hall.
  • Asquith, W. H. (2023). copBasic—General bivariate copula theory and many utility functions [Computer software manual]. R package version 2.2.2.
  • Azadkia, M., & Chatterjee, S. (2021). A simple measure of conditional dependence. The Annals of Statistics, 49(6), 3070–3102. https://doi.org/10.1214/21-aos2073
  • Azadkia, M., Chatterjee, S., & Matloff, N. (2021). FOCI: Feature ordering by conditional independence [Computer software manual]. R package version 0.1.3. https://CRAN.R-project.org/package=FOCI
  • Bekker, P. A., Merckens, A., & Wansbeek, T. J. (2014). Identification, equivalent models, and computer algebra: Statistical modeling and decision science. Academic Press.
  • Bickel, P. J., & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics. CRC Press.
  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.
  • Blum, J. R., Kiefer, J., & Rosenblatt, M. (1961). Distribution free tests of independence based on the sample distribution function. Sandia Corporation.
  • Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.
  • Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
  • Chatterjee, S. (2021). A new coefficient of correlation. Journal of the American Statistical Association, 116(536), 2009–2022. https://doi.org/10.1080/01621459.2020.1758115
  • Chen, Y., Liu, Y., & Xu, S. (2018). Mutual information reliability for latent class analysis. Applied Psychological Measurement, 42(6), 460–477. https://doi.org/10.1177/0146621617748324
  • Cole, D. A., & Preacher, K. J. (2014). Manifest variable path analysis: Potentially serious and misleading consequences due to uncorrected measurement error. Psychological Methods, 19(2), 300–315. https://doi.org/10.1037/a0033805
  • Cronbach, L. J., & Gleser, G. C. (1964). The signal/noise ratio in the comparison of reliability coefficients. Educational and Psychological Measurement, 24(3), 467–480. https://doi.org/10.1177/0013164464024003
  • De Boeck, P., Pek, J., Walton, K. M., Wegener, D. T., Turner, B. M., Andeson, B. A., … Petty, R. E. (2023). Questioning psychological constructs: Current issues and proposed changes. Psychological Inquiry, 34(4), 291–297. https://doi.org/10.1080/1047840X.2023.2281023
  • Dette, H., Siburg, K. F., & Stoimenov, P. A. (2013). A copula-based non-parametric measure of regression dependence. Scandinavian Journal of Statistics, 40(1), 21–41. https://doi.org/10.1111/j.1467-9469.2011.00767.x
  • DeVellis, R., & Thorpe, C. (2021). Scale development: Theory and applications. SAGE Publications.
  • Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall.
  • Fox, J. (2015). Applied regression analysis and generalized linear models. SAGE Publications.
  • Gebelein, H. (1941). Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Zeitschrift für Angewandte Mathematik und Mechanik, 21(6), 364–379. https://doi.org/10.1002/zamm.19410210604
  • Geenens, G., & Lafaye de Micheaux, P. (2022). The Hellinger correlation. Journal of the American Statistical Association, 117(538), 639–653. https://doi.org/10.1080/01621459.2020.1791132
  • Haberman, S. J., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209–227. https://doi.org/10.1007/s11336-010-9158-4
  • Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, 19(4), 214–226. https://doi.org/10.1214/aoms/1177730150
  • Hoyle, R. H., Borsboom, D., & Tay, L. (2024). Measuring constructs. In D. T. Gilbert, S. T. Fiske, E. J. Finkel, & W. B. Mendes (Eds.), The handbook of social psychology (6th ed.). Situational Press.
  • Joe, H. (1989). Relative entropy measures of multivariate dependence. Journal of the American Statistical Association, 84(405), 157–164. https://doi.org/10.2307/2289859
  • Johnson, M. S., & Sinharay, S. (2020). The reliability of the posterior probability of skill attainment in diagnostic classification models. Journal of Educational and Behavioral Statistics, 45(1), 5–31. https://doi.org/10.3102/1076998619864550
  • Kim, S. (2012). A note on the reliability coefficients for item response model-based ability estimates. Psychometrika, 77(1), 153–162. https://doi.org/10.1007/s11336-011-9238-0
  • Kruskal, W. H. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53(284), 814–861. https://doi.org/10.2307/2281954
  • Linfoot, E. H. (1957). An informational measure of correlation. Information and Control, 1(1), 85–89. https://doi.org/10.1016/s0019-9958(57)90116-x
  • Liu, Y., & Pek, J. (in press). Summed versus estimated factor scores: Considering uncertainties when using observed scores. Psychological Methods. https://doi.org/10.1037/met0000644
  • Liu, Y., Pek, J., & Maydeu-Olivares, A. (2024). Understanding reliability from a regression perspective. https://arxiv.longhoe.net/abs/2404.16709
  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
  • Mardia, K., Kent, J., & Bibby, J. (1979). Multivariate analysis. Academic Press.
  • Markon, K. E. (2013). Information utility: Quantifying the total psychometric information provided by a measure. Psychological Methods, 18(1), 15–35. https://doi.org/10.1037/a0030638
  • Markon, K. E. (2023). Reliability as Lindley information. Multivariate Behavioral Research, 58(4), 815–842. https://doi.org/10.1080/00273171.2022.2136613
  • McDonald, R. P. (2011). Measuring latent quantities. Psychometrika, 76(4), 511–536. https://doi.org/10.1007/s11336-011-9223-7
  • Michaud, I. (2018). rmi: Mutual information estimators [Computer software manual]. R package version 0.1.1. https://CRAN.R-project.org/package=rmi
  • Nelsen, R. (2006). An introduction to copulas. Springer.
  • Pillai, K. S. (1955). Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics, 26(1), 117–121. https://doi.org/10.1214/aoms/1177728599
  • Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge.
  • Rényi, A. (1959). On measures of dependence. Acta Mathematica Hungarica, 10(3–4), 441–451. https://doi.org/10.1007/BF02024507
  • Schweizer, B., & Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. The Annals of Statistics, 9(4), 879–885. https://doi.org/10.1214/aos/1176345528
  • Sinharay, S., & Johnson, M. S. (2019). Measures of agreement: Reliability, classification accuracy, and classification consistency. In M. von Davier & Y.-S. Lee (Eds.), Handbook of diagnostic classification models: Models and model extensions, applications, software packages (pp. 359–377). Springer. https://doi.org/10.1007/978-3-030-05584-4_17
  • Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794. https://doi.org/10.1214/009053607000000505
  • Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.
  • Tjøstheim, D., Otneim, H., & Støve, B. (2022). Statistical dependence: Beyond Pearson's $\rho$. Statistical Science, 37(1), 90–109. https://doi.org/10.1214/21-sts823
  • van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge University Press.
  • Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473–492. https://doi.org/10.1177/014662168200600408
  • Wilks, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika, 24(3/4), 471–494. https://doi.org/10.2307/2331979
  • Wood, S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society Series B: Statistical Methodology, 65(1), 95–114.