
License: arXiv.org perpetual non-exclusive license
arXiv:2404.02093v1 [stat.ME] 02 Apr 2024

High-dimensional covariance regression with application to co-expression QTL detection

Rakheon Kim
Department of Statistical Science, Baylor University
and
Jingfei Zhang
Goizueta Business School, Emory University
Abstract

While covariance matrices have been widely studied in many scientific fields, relatively limited progress has been made on estimating conditional covariances that permit a large covariance matrix to vary with high-dimensional subject-level covariates. In this paper, we present a new sparse multivariate regression framework that models the covariance matrix as a function of subject-level covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can be used to determine if and how gene co-expressions vary with genetic variations. To accommodate high-dimensional responses and covariates, we stipulate a combined sparsity structure that encourages covariates with non-zero effects and edges that are modulated by these covariates to be simultaneously sparse. We approach parameter estimation with a blockwise coordinate descent algorithm and investigate the $\ell_2$ convergence rate of the estimated parameters. In addition, we propose a computationally efficient debiased inference procedure for uncertainty quantification. The efficacy of the proposed method is demonstrated through numerical experiments and an application to a gene co-expression network study of brain cancer patients.


Keywords: Covariance regression; subject-specific covariance matrix; multivariate linear regression; sparse group lasso; co-expression QTL.

1 Introduction

A covariance matrix measures the associations amongst a set of variables, and its estimation and analysis play an important role in a wide range of applications, such as genetics (Butte et al., 2000; Su et al., 2023), neuroscience (Zhang et al., 2020, 2023), finance (El Karoui et al., 2010; Xue et al., 2012) and climatology (Bickel et al., 2008a). For example, in genetics, the covariance matrix estimated from gene expressions across different biological samples, often referred to as a co-expression network, is routinely used to identify functional gene modules and dysregulated pathways in disease (Langfelder and Horvath, 2008; Su et al., 2023). Most co-expression analyses to date assume a common covariance matrix for all subjects. However, the structure and degree of covariance may depend on individuals' characteristics such as age, sex and genotype. We refer to these as individual-level covariates, or simply covariates when there is no ambiguity. For example, co-expressions among genes are known to be affected by individual genetic variants and by clinical and environmental factors (Van Der Wijst et al., 2018). In particular, a genetic variant that affects the co-expression between a pair of genes is termed a co-expression quantitative trait locus (QTL). Identifying co-expression QTLs is of great scientific interest and can be crucial in developing gene therapies that target specific gene or pathway disruptions (Van Der Wijst et al., 2018; Zhang and Zhao, 2023).

Although the literature on estimating large covariance matrices is steadily growing (Wu and Pourahmadi, 2003; Huang et al., 2006; Bickel et al., 2008a, b; Rothman et al., 2009; Lam and Fan, 2009; Bien and Tibshirani, 2011, and others), the majority of existing methods assume a homogeneous population obeying a common covariance model. Other work has modeled covariate-dependent covariance matrices. For example, Anderson (1973) modeled the covariance matrix $\boldsymbol{\Sigma}\in\mathbb{R}^{p\times p}$ as a linear combination of a given set of symmetric matrices. Chiu et al. (1996) modeled the elements of the matrix logarithm of $\boldsymbol{\Sigma}$, denoted $\log\boldsymbol{\Sigma}$, as a linear function of covariates $\mathbf{x}\in\mathbb{R}^{q}$. As noted by the authors, parameter interpretation for this model can be difficult. A submatrix of $\boldsymbol{\Sigma}$ is not in general the matrix exponential of the corresponding submatrix of $\log\boldsymbol{\Sigma}$, so the entries of $\log\boldsymbol{\Sigma}$ do not directly relate to the corresponding entries of $\boldsymbol{\Sigma}$. Pourahmadi (1999) modeled the elements of the Cholesky decomposition of $\boldsymbol{\Sigma}^{-1}$ as linear functions of $\mathbf{x}$, though this model is not invariant to reorderings of the response variables. Zou et al. (2017) related $\boldsymbol{\Sigma}$ to a linear combination of similarity matrices of covariates. However, the covariates considered in that work are variable-specific rather than individual-specific (e.g., covariates of genes rather than of individuals).
As such, the estimated covariance cannot account for individual-level heterogeneity due to clinical covariates and genotypes.

Notably, Hoff and Niu (2012) proposed to model $\boldsymbol{\Sigma}$ as a quadratic function of covariates $\mathbf{x}$, written as $\mathbf{B}\mathbf{x}\mathbf{x}^{\top}\mathbf{B}^{\top}$ with $\mathbf{B}\in\mathbb{R}^{p\times q}$. This model also admits a convenient random-effects representation, and estimation is carried out using the expectation–maximization (EM) algorithm or Markov chain Monte Carlo (MCMC) via Gibbs sampling. Their modeling framework was further extended in Fox and Dunson (2015) to non-linear effects, in Franks (2021) to high-dimensional response variables and in Alakus et al. (2022) to random forests. The computational costs of these extensions can be prohibitive when the response variables and the covariates are both high-dimensional. Moreover, because of the quadratic form $\mathbf{B}\mathbf{x}\mathbf{x}^{\top}\mathbf{B}^{\top}$, sparsity in the parameter $\mathbf{B}$ does not directly translate into sparse covariate effects, which may limit model interpretability. Zhao et al. (2021) and Park (2023) studied a principal regression approach that models $\boldsymbol{\gamma}^{\top}\boldsymbol{\Sigma}\boldsymbol{\gamma}$, where $\boldsymbol{\gamma}$ is an unknown rotation vector, as a generalized linear model of $\mathbf{x}$. Parameter interpretation for this model may not be straightforward, because the elements of $\boldsymbol{\Sigma}$ are not directly modeled as functions of $\mathbf{x}$.
It is also challenging to extend this approach to the high-dimensional setting.

To flexibly model large covariance matrices modulated by individual-level covariates, we propose a covariance regression model that allows the structure and degree of covariance to vary with high-dimensional discrete and continuous covariates. Specifically, the covariance matrix is modeled as a linear function of covariates with matrix-valued coefficients, subject to constraints that ensure positive definiteness. Our model does not require specific assumptions on the distribution of the response variables, such as the Gaussian assumption imposed in Hoff and Niu (2012). Using the method of moments, we formulate coefficient estimation as a multivariate linear regression problem. We then impose a sparse group lasso penalty that simultaneously encourages the effective covariates and their effects on the covariance matrix to be sparse. This combined sparsity assumption facilitates model estimability and interpretability and is closely connected with multi-task learning (Argyriou et al., 2008). However, it poses substantial challenges for our theoretical analysis. On the theoretical side, we investigate the convergence rate of the proposed estimator while allowing both the response variables and the covariates to be high-dimensional. Under our modeling framework, we further formulate a debiased inferential procedure that can also handle non-Gaussian and heteroskedastic errors. It is inspired by the recent literature on the debiased lasso (Javanmard and Montanari, 2014; Zhang and Zhang, 2014; Cai et al., 2022). We show that the $q+1$ coefficient matrices associated with the $q$ covariates (plus the intercept) can be debiased separately, which significantly reduces the computational cost. Although motivated by a biological application, our method provides a general framework for modeling covariance matrices with covariates and is broadly applicable to other scientific fields that involve covariance estimation.

The rest of the paper is organized as follows. Section 2 introduces the covariance regression model, and Section 3 discusses the sparse multivariate regression estimation. Section 4 investigates the convergence rate of the proposed estimator theoretically and proposes a debiased inferential procedure. Section 5 carries out comprehensive simulation studies, and Section 6 conducts a co-expression QTL analysis using a brain cancer genomics data set. A short discussion section concludes the paper.

2 Covariance Regression Models

We start with some notation. Write $[d]=\{1,2,\ldots,d\}$. Given a vector $\mathbf{x}=(x_1,\ldots,x_d)^{\top}$, we use $\|\mathbf{x}\|_1$, $\|\mathbf{x}\|_2$ and $\|\mathbf{x}\|_\infty$ to denote the vector $\ell_1$, $\ell_2$ and $\ell_\infty$ norms, respectively. For a matrix $\mathbf{X}\in\mathbb{R}^{d_1\times d_2}$, we let $\|\mathbf{X}\|_1=\sum_{ij}|X_{ij}|$, $\|\mathbf{X}\|_F=(\sum_{ij}X_{ij}^2)^{1/2}$ and $\|\mathbf{X}\|_\infty=\max_{ij}|X_{ij}|$ denote the element-wise $\ell_1$ norm, the Frobenius norm and the element-wise max norm, respectively. When $d_1=d_2$, we let $\mathrm{vech}(\mathbf{X})=(X_{11},X_{12},\ldots,X_{1d_1},\ldots,X_{d_1d_1})$ denote the vectorization of the upper triangular part of $\mathbf{X}$. We let $\mathrm{vec}(\mathbf{X})$ denote the concatenation of the columns of $\mathbf{X}$. We use $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ to denote the smallest and largest eigenvalues of a matrix, respectively.
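The notation above can be mirrored in a few lines of NumPy. This is a minimal sketch that assumes $\mathrm{vech}$ reads the upper triangle row by row, as in the definition $(X_{11},X_{12},\ldots,X_{1d_1},\ldots,X_{d_1d_1})$. All function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def vech(X):
    """Upper-triangular part of a square matrix, read row by row:
    (X_11, X_12, ..., X_1d, X_22, ..., X_dd)."""
    X = np.asarray(X)
    rows, cols = np.triu_indices(X.shape[0])  # row-major order, j <= k
    return X[rows, cols]

def vec(X):
    """Concatenate the columns of X (column-major flattening)."""
    return np.asarray(X).reshape(-1, order="F")

def l1_elementwise(X):
    """||X||_1 = sum_ij |X_ij|."""
    return np.abs(X).sum()

def frobenius(X):
    """||X||_F = (sum_ij X_ij^2)^(1/2)."""
    return np.sqrt(np.sum(np.asarray(X) ** 2))

def max_elementwise(X):
    """||X||_inf = max_ij |X_ij|."""
    return np.abs(X).max()

def lambda_min(S):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(S)[0]

X = np.array([[1.0, -2.0], [3.0, 4.0]])
S = np.array([[2.0, 1.0], [1.0, 2.0]])
```

For the $2\times 2$ example `X`, `vech` keeps only $(X_{11}, X_{12}, X_{22})$, so a $p\times p$ symmetric matrix is summarized by $p(p+1)/2$ numbers.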

Given a vector of $p$ response variables $\mathbf{y}=(y_1,\ldots,y_p)^{\top}$ and a vector of $q$ covariates $\mathbf{x}=(x_1,\ldots,x_q)^{\top}$ satisfying $x_h\in[0,1]$ for $h\in[q]$, we assume that $\mathbb{E}(\mathbf{y}\mid\mathbf{x})=\boldsymbol{\beta}_0+\boldsymbol{\Gamma}\mathbf{x}$, where $\boldsymbol{\beta}_0\in\mathbb{R}^{p}$, $\boldsymbol{\Gamma}\in\mathbb{R}^{p\times q}$, and

$$\mathrm{Cov}(\mathbf{y}\mid\mathbf{x})=\boldsymbol{\Sigma}(\mathbf{x})=\mathbf{B}_0+\sum_{h=1}^{q}x_h\mathbf{B}_h, \tag{1}$$

where $\mathbf{B}_0$ is a symmetric positive definite (PD) $p\times p$ matrix and $\mathbf{B}_1,\ldots,\mathbf{B}_q$ are symmetric $p\times p$ matrices. Here, $\mathbf{B}_0$ specifies the covariance at the population level and $\mathbf{B}_h$ represents the effect of covariate $x_h$ on the covariance matrix. Let $\mathbf{B}_h=\mathbf{Q}_h\boldsymbol{\Lambda}_h\mathbf{Q}_h^{\top}$ be the eigendecomposition of $\mathbf{B}_h$, and define the $p\times p$ diagonal matrices $\boldsymbol{\Lambda}_h^{+}$ and $\boldsymbol{\Lambda}_h^{-}$ such that $\Lambda_{h,jj}^{+}=\max(0,\Lambda_{h,jj})$ and
$\Lambda_{h,jj}^{-}=\min(0,\Lambda_{h,jj})$, so that $\boldsymbol{\Lambda}_h=\boldsymbol{\Lambda}_h^{+}+\boldsymbol{\Lambda}_h^{-}$. We assume that

$$\lambda_{\min}\Big(\mathbf{B}_0+\sum_{h=1}^{q}\mathbf{B}_h^{-}\Big)>0, \tag{2}$$

where $\mathbf{B}_h^{-}=\mathbf{Q}_h\boldsymbol{\Lambda}_h^{-}\mathbf{Q}_h^{\top}$. This is a sufficient condition for $\boldsymbol{\Sigma}(\mathbf{x})$ to be PD, because

$$\lambda_{\min}\{\boldsymbol{\Sigma}(\mathbf{x})\}=\lambda_{\min}\Big(\mathbf{B}_0+\sum_{h=1}^{q}x_h\mathbf{B}_h\Big)\geq\lambda_{\min}\Big(\mathbf{B}_0+\sum_{h=1}^{q}\mathbf{B}_h^{-}\Big),$$

where the last inequality holds because $x_h\in[0,1]$. Writing $x_h\mathbf{B}_h=x_h\mathbf{B}_h^{+}+x_h\mathbf{B}_h^{-}$ with $\mathbf{B}_h^{+}=\mathbf{Q}_h\boldsymbol{\Lambda}_h^{+}\mathbf{Q}_h^{\top}$, we have $x_h\mathbf{B}_h^{+}\succeq\mathbf{0}$ and $x_h\mathbf{B}_h^{-}\succeq\mathbf{B}_h^{-}$, so $\boldsymbol{\Sigma}(\mathbf{x})\succeq\mathbf{B}_0+\sum_{h=1}^{q}\mathbf{B}_h^{-}$. When $p=1$, we have $\mathbf{B}_h=b_h$, and condition (2) reduces to $b_0+\sum_{h=1}^{q}\min\{b_h,0\}>0$. This ensures that the variance is positive regardless of the values of the $x_h$'s, which is a reasonable assumption. For instance, in our motivating data example, subjects carrying a specific genetic variant may have a lower gene expression variance than others. In this case, condition (2) implies that these subjects still have a positive gene expression variance, as $b_0$ is dominant. We note that requiring the covariates to lie in $[0,1]$ is not restrictive. In our data example, the covariates are genetic variants, which are often coded as $\{0,1\}$. If a covariate takes negative values, one can add a shift so that the shifted covariate is nonnegative. If a covariate can exceed one, one can rescale it so that the scaled covariate is at most one. Such a transformation of $x_h$ does not affect the interpretability of $\mathbf{B}_h$; see details in Section 7.
To expose the key ideas, we assume that $\boldsymbol{\beta}_0$ and $\boldsymbol{\Gamma}$ are known in the ensuing development and focus on the estimation of $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$. Extensions with estimated $\boldsymbol{\beta}_0$ and $\boldsymbol{\Gamma}$ are straightforward but involve more notation.
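The positive-definiteness argument for condition (2) can be checked numerically. The sketch below computes $\mathbf{B}_h^{-}$ from an eigendecomposition, tests condition (2), and checks the bound $\lambda_{\min}\{\boldsymbol{\Sigma}(\mathbf{x})\}\geq\lambda_{\min}(\mathbf{B}_0+\sum_h\mathbf{B}_h^{-})$ on random $\mathbf{x}\in[0,1]^q$. The coefficient matrices are hypothetical random draws, and the helper names are illustrative, not from the paper.

```python
import numpy as np

def negative_part(B):
    """B^- = Q diag(min(0, lambda_j)) Q^T for symmetric B = Q diag(lambda) Q^T."""
    lam, Q = np.linalg.eigh(B)
    return (Q * np.minimum(lam, 0.0)) @ Q.T  # scales column j of Q by min(0, lambda_j)

def sigma_of_x(B0, Bs, x):
    """Sigma(x) = B_0 + sum_h x_h B_h, as in model (1)."""
    return B0 + sum(xh * Bh for xh, Bh in zip(x, Bs))

def satisfies_condition_2(B0, Bs):
    """Sufficient condition (2): lambda_min(B_0 + sum_h B_h^-) > 0."""
    M = B0 + sum(negative_part(Bh) for Bh in Bs)
    return np.linalg.eigvalsh(M)[0] > 0

# Hypothetical example: random symmetric B_h; the bound holds whether or not (2) does.
rng = np.random.default_rng(0)
p, q = 4, 3
B0 = 5.0 * np.eye(p)
Bs = [(A + A.T) / 2 for A in rng.normal(size=(q, p, p))]
bound = np.linalg.eigvalsh(B0 + sum(negative_part(B) for B in Bs))[0]
for _ in range(1000):
    x = rng.uniform(0.0, 1.0, size=q)
    assert np.linalg.eigvalsh(sigma_of_x(B0, Bs, x))[0] >= bound - 1e-10
```

In the scalar case ($p=1$), `satisfies_condition_2` reduces to checking $b_0+\sum_h\min\{b_h,0\}>0$. For example, $b_0=2$ with $b_1=-1.5$, $b_2=0.7$ passes, while $b_0=2$ with $b_1=-2.5$ fails.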

With $n$ independent observations $(\mathbf{y}_i,\mathbf{x}_i)\in\mathbb{R}^{p}\times\mathbb{R}^{q}$, $i\in[n]$, we aim to estimate $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$ via (1). This is a challenging task. Even in the simple Gaussian case, the log-likelihood function is, up to an additive constant and a factor of $1/2$,

$$-\sum_{i=1}^{n}\log\Big|\mathbf{B}_0+\sum_{h=1}^{q}x_{ih}\mathbf{B}_h\Big|-\sum_{i=1}^{n}\mathrm{tr}\Big\{\Big(\mathbf{B}_0+\sum_{h=1}^{q}x_{ih}\mathbf{B}_h\Big)^{-1}\mathbf{z}_i\mathbf{z}_i^{\top}\Big\},$$

where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix and $\mathbf{z}_i=\mathbf{y}_i-\mathbb{E}(\mathbf{y}_i\mid\mathbf{x}_i)$. The covariate-weighted sums of the $\mathbf{B}_h$'s appear inside the determinant and the matrix inverse. As a result, maximizing this log-likelihood is neither a convex nor a biconvex problem in $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$, and it cannot be directly optimized using iterative algorithms such as the EM and coordinate descent algorithms. To overcome this challenge, we consider a moment-based approach that is highly efficient to implement and does not require distributional assumptions on the $\mathbf{y}_i$'s.
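To make the difficulty concrete, the sketch below evaluates the Gaussian log-likelihood of the centered responses, $-\frac{1}{2}\sum_i\{\log|\boldsymbol{\Sigma}(\mathbf{x}_i)|+\mathbf{z}_i^{\top}\boldsymbol{\Sigma}(\mathbf{x}_i)^{-1}\mathbf{z}_i\}$, with the additive constant dropped. Each observation requires its own log-determinant and linear solve with a different $\boldsymbol{\Sigma}(\mathbf{x}_i)$. The function `gaussian_loglik` is a hypothetical helper for illustration and is not part of the proposed method.

```python
import numpy as np

def gaussian_loglik(B0, Bs, X, Z):
    """Gaussian log-likelihood of centered responses Z (n x p) under model (1),
    omitting the constant -n p log(2 pi) / 2. X is the n x q covariate matrix."""
    total = 0.0
    for x_i, z_i in zip(X, Z):
        S = B0 + sum(xh * Bh for xh, Bh in zip(x_i, Bs))  # Sigma(x_i)
        sign, logdet = np.linalg.slogdet(S)
        if sign <= 0:
            return -np.inf  # Sigma(x_i) is not positive definite
        # z_i^T Sigma^{-1} z_i equals tr{Sigma^{-1} z_i z_i^T}
        total += -0.5 * logdet - 0.5 * z_i @ np.linalg.solve(S, z_i)
    return total
```

Each evaluation costs on the order of $np^3$ operations, and the objective couples all $\mathbf{B}_h$ through the determinant and inverse. This is why the moment-based reformulation that follows is attractive.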

First, note that (1) implies $\mathbb{E}(z_{ij}z_{ik})=B_{0,jk}+\sum_{h=1}^{q}x_{ih}B_{h,jk}$, that is,

$$z_{ij}z_{ik}=B_{0,jk}+x_{i1}B_{1,jk}+\cdots+x_{iq}B_{q,jk}+\epsilon_{ijk}, \tag{3}$$

where $\mathbb{E}(\epsilon_{ijk})=0$ and $B_{h,jk}$ denotes the $(j,k)$th entry of $\mathbf{B}_h$. Let $\mathbf{V}$ be the $n\times p(p+1)/2$ matrix whose elements are $\{z_{ij}z_{ik}\}_{i=1,j=1,k=j}^{n,p,p}$, so that its $i$th row is $\mathrm{vech}(\mathbf{z}_i\mathbf{z}_i^{\top})$. Let $\mathbf{X}=\{X_{il}\}_{i=1,l=1}^{n,q+1}$ be the $n\times(q+1)$ design matrix (including the intercept term). Then (3) facilitates the estimation of $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$ via the following multivariate linear regression:

$$\mathbf{V}=\mathbf{X}\mathbf{D}+\mathbf{E}, \tag{4}$$

where $\mathbf{D}$ is the $(q+1)\times p(p+1)/2$ coefficient matrix whose elements are $\{B_{h,jk}\}_{h=0,j=1,k=j}^{q,p,p}$ and $\mathbf{E}$ is the $n\times p(p+1)/2$ matrix whose elements are $\{\epsilon_{ijk}\}_{i=1,j=1,k=j}^{n,p,p}$.
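The moment-based reformulation can be illustrated with a small simulation. The sketch assumes Gaussian errors and hand-picked coefficient matrices; these are hypothetical values that satisfy condition (2) and are not taken from the paper. It builds $\mathbf{V}$ and $\mathbf{X}$ from centered responses and fits (4) by unpenalized least squares. That is only sensible here because $n$ is much larger than $q+1$, and the high-dimensional setting calls for the sparsity penalty discussed below. The helper name `build_moment_regression` is hypothetical.

```python
import numpy as np

def build_moment_regression(Z, Xcov):
    """Assemble V and the design matrix X of regression (4).
    Z    : (n, p) centered responses z_i = y_i - E(y_i | x_i)
    Xcov : (n, q) covariates
    Row i of V is vech(z_i z_i^T), i.e. the products z_ij * z_ik for j <= k."""
    n, p = Z.shape
    rows, cols = np.triu_indices(p)
    V = Z[:, rows] * Z[:, cols]
    X = np.column_stack([np.ones(n), Xcov])  # intercept column first
    return V, X

# Hypothetical simulation: p = 3 responses, q = 2 binary covariates.
rng = np.random.default_rng(1)
n, p = 50_000, 3
B0 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.3], [0.0, 0.3, 2.0]])
B1 = np.array([[0.5, 0.4, 0.0], [0.4, 0.0, 0.0], [0.0, 0.0, 0.0]])
B2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -0.3], [0.0, -0.3, 0.0]])
Xcov = rng.integers(0, 2, size=(n, 2)).astype(float)

# Sigma(x_i) = B0 + x_i1 B1 + x_i2 B2 for every i; draw z_i ~ N(0, Sigma(x_i)).
Sig = B0 + Xcov[:, 0, None, None] * B1 + Xcov[:, 1, None, None] * B2
L = np.linalg.cholesky(Sig)  # batched lower Cholesky factors, shape (n, p, p)
Z = np.einsum("ijk,ik->ij", L, rng.standard_normal((n, p)))

V, X = build_moment_regression(Z, Xcov)
D_hat, *_ = np.linalg.lstsq(X, V, rcond=None)  # unpenalized OLS, illustration only

upper = np.triu_indices(p)
D_true = np.vstack([B0[upper], B1[upper], B2[upper]])  # row h+1 of D is vech(B_h)
```

Each column of `D_hat` is a separate regression of one product $z_{ij}z_{ik}$ on the covariates. This is the multi-task structure that the group penalty later exploits.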

When both $p$ and $q$ are large, we assume that $\mathbf{D}$ is sparse to ensure estimability and facilitate its interpretation. In particular, we assume that $\mathbf{D}$ is group sparse, with one group per covariate, so that only a subset of the covariates (termed effective covariates) may impact the edges. We further assume that $\mathbf{D}$ is element-wise sparse, so that each effective covariate may influence only a subset of the edges. These simultaneous sparsity assumptions are well supported by genetic studies (Gardner et al., 2003; Vierstra et al., 2020). They also improve model interpretability compared with using group sparsity or element-wise sparsity alone. To encourage simultaneous sparsity, we consider the following penalty:

$$\mathcal{P}_{\lambda,\lambda_g}(\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q)=\lambda\Big(\sum_{h=1}^{q}\sum_{j\leq k}|B_{h,jk}|+\sum_{j<k}|B_{0,jk}|\Big)+\lambda_g\sum_{h=1}^{q}\|\mathrm{vech}(\mathbf{B}_h)\|_2, \tag{5}$$

where $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$ are as defined in model (1), and $\lambda$ and $\lambda_g$ are tuning parameters.
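To make the two components of (5) concrete, here is a minimal NumPy sketch that evaluates the penalty from the matrices $\mathbf{B}_0,\ldots,\mathbf{B}_q$. The function names `vech` and `penalty` are hypothetical and used only for illustration. The lasso part uses the strictly upper triangle of $\mathbf{B}_0$ and the full upper triangle (diagonal included) of each $\mathbf{B}_h$. The group part uses the Euclidean norm of each $\mathrm{vech}(\mathbf{B}_h)$.

```python
import numpy as np

def vech(A):
    """Stack the upper-triangular entries (j <= k) of a symmetric matrix."""
    j, k = np.triu_indices(A.shape[0])      # includes the diagonal
    return A[j, k]

def penalty(B0, Bs, lam, lam_g):
    """Sparse group penalty P_{lam, lam_g}(B0, B1, ..., Bq) of (5).

    B0 : (p, p) baseline matrix; only its off-diagonal (j < k) entries get the lasso term.
    Bs : list of q (p, p) covariate-effect matrices B1, ..., Bq.
    """
    p = B0.shape[0]
    j, k = np.triu_indices(p, k=1)          # strictly upper triangle: j < k
    l1 = np.abs(B0[j, k]).sum()             # sum_{j<k} |B_{0,jk}|
    l1 += sum(np.abs(vech(B)).sum() for B in Bs)        # sum_h sum_{j<=k} |B_{h,jk}|
    group = sum(np.linalg.norm(vech(B)) for B in Bs)    # sum_h ||vech(B_h)||_2
    return lam * l1 + lam_g * group
```

For example, with $\mathbf{B}_0=\begin{pmatrix}5&1\\1&5\end{pmatrix}$, $\mathbf{B}_1=\mathrm{diag}(3,4)$, $\lambda=1$ and $\lambda_g=2$, the lasso part is $1+7=8$ and the group part is $2\times 5=10$, so the penalty equals 18.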

The term $\sum_{h=1}^{q}\sum_{j\leq k}|B_{h,jk}|+\sum_{j<k}|B_{0,jk}|$ is a lasso penalty. It encourages the effects of the effective covariates, as well as the off-diagonal entries of $\mathbf{B}_0$, to be sparse. We exclude $\mathbf{B}_0$ from the group sparsity penalty (but not from the element-wise sparsity penalty), as it determines the population-level network. In addition, the diagonal elements of $\mathbf{B}_0$ are excluded from the element-wise sparsity penalty. This ensures that the response variables have non-zero variances at the population level.

The term $\sum_{h=1}^{q}\|\mathrm{vech}(\mathbf{B}_h)\|_2$ is a group lasso penalty (Yuan and Lin, 2006) that encourages the set of effective covariates to be sparse. It does so by regularizing each $\mathbf{B}_h$ jointly across the $p(p+1)/2$ regression tasks in (3). This penalty therefore facilitates multi-task learning (Argyriou et al., 2008). The penalty in (5) is similar to the sparse group lasso considered in Simon et al. (2013) and Li et al. (2015), but it is not identical. Some parameters (the off-diagonal entries of $\mathbf{B}_0$) are included in the element-wise sparsity penalty but not in the group sparsity penalty. This adds complexity to both the estimation procedure and the theoretical analysis.

3 Estimation

Given $\mathbf{V}$ and $\mathbf{X}$, we estimate $\mathbf{D}$ by solving the following optimization problem:

$$\min_{\mathbf{D}}\ \frac{1}{2n}\|\mathbf{V}-\mathbf{X}\mathbf{D}\|_F^2+\mathcal{P}_{\lambda,\lambda_g}(\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q)\quad\text{s.t.}\quad\lambda_{\min}\Big(\mathbf{B}_0+\sum_{h=1}^{q}\mathbf{B}_h^{-}\Big)>0, \tag{6}$$

where $\mathcal{P}_{\lambda,\lambda_g}(\cdot)$ is the penalty in (5). When there are no covariates, the unconstrained version of (6) reduces to the standard sparse covariance estimation problem (Rothman et al., 2009):

$$\sum_{j=1}^{p}\sum_{k=j}^{p}\bigg\{\frac{1}{2n}\sum_{i=1}^{n}\big(z_{ij}z_{ik}-B_{0,jk}\big)^2+\lambda\,\mathbb{1}(j<k)\,|B_{0,jk}|\bigg\}.$$

This objective is minimized at $\hat{B}_{0,jk}=S_{\lambda}\big(\sum_{i=1}^{n}z_{ij}z_{ik}/n\big)$ for $j<k$ and at $\hat{B}_{0,jj}=\sum_{i=1}^{n}z_{ij}^2/n$. Here $S_{\lambda}(a)=\mathrm{sign}(a)\max(|a|-\lambda,0)$ is the soft-thresholding operator at $\lambda$ (Bickel et al., 2008a; Rothman et al., 2009).

The optimization problem in (6) is nontrivial because the constraint set is nonconvex. To tackle this challenge, we propose an efficient two-step procedure. It first solves the unconstrained problem and then maps the resulting estimator into the constraint set. This sequential procedure greatly simplifies computation and yields consistent estimators. Similar sequential procedures are common in statistical learning and optimization; see, e.g., Li et al. (2010) and Zhang et al. (2020).
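A minimal NumPy sketch of this no-covariate closed form follows. It assumes the rows of `Z` are the vectors $\mathbf{z}_i$ whose products $z_{ij}z_{ik}$ serve as responses, and `sparse_cov_no_covariates` is a hypothetical name. In line with penalty (5), which penalizes only the $j<k$ entries of $\mathbf{B}_0$, the diagonal is left unthresholded.

```python
import numpy as np

def soft_threshold(a, lam):
    """Element-wise soft-thresholding S_lam(a) = sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cov_no_covariates(Z, lam):
    """Closed-form minimizer of the no-covariate problem.

    Z : (n, p) matrix whose i-th row is z_i.
    Off-diagonal entries of the second-moment matrix are soft-thresholded;
    diagonal entries stay unpenalized.
    """
    n = Z.shape[0]
    S = Z.T @ Z / n                     # entries sum_i z_ij z_ik / n
    B0 = soft_threshold(S, lam)
    np.fill_diagonal(B0, np.diag(S))    # restore the unpenalized diagonal
    return B0
```

For instance, with rows $(1,2)$ and $(3,-1)$ the second-moment matrix has off-diagonal entry $-0.5$. At $\lambda=0.2$ this entry shrinks to $-0.3$, while the diagonal entries $5$ and $2.5$ are kept.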

Algorithm 1 Sparse covariance regression with multivariate sparse group lasso
  Input: Tuning parameters $\lambda$ and $\lambda_g$, and an initial estimator $\widetilde{\mathbf{D}}$ of $\mathbf{D}$ in (4).
  repeat
    for $l=1,\ldots,q+1$:
            Step 1: Compute the partial residual $\widetilde{\mathbf{R}}_l=[\widetilde{\mathbf{R}}_l^{(\text{diag})},\widetilde{\mathbf{R}}_l^{(\text{off})}]$ as
$$\widetilde{\mathbf{R}}_l^{(\text{diag})}=\mathbf{V}^{(\text{diag})}-\mathbf{X}_{\bullet[-l]}\widetilde{\mathbf{D}}_{[-l]\bullet}^{(\text{diag})},\qquad\widetilde{\mathbf{R}}_l^{(\text{off})}=\mathbf{V}^{(\text{off})}-\mathbf{X}_{\bullet[-l]}\widetilde{\mathbf{D}}_{[-l]\bullet}^{(\text{off})}.$$
            Step 2: If $l=1$, update $\widetilde{\mathbf{D}}_{l\bullet}=(\widetilde{\mathbf{D}}_{l\bullet}^{(\text{diag})},\widetilde{\mathbf{D}}_{l\bullet}^{(\text{off})})$ by
$$\widetilde{\mathbf{D}}_{l\bullet}^{(\text{diag})}=\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}\Big)^{-1}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\widetilde{\mathbf{R}}_l^{(\text{diag})}\Big),$$
$$\widetilde{\mathbf{D}}_{l\bullet}^{(\text{off})}=\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}\Big)^{-1}S_{\lambda}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\widetilde{\mathbf{R}}_l^{(\text{off})}\Big).$$
            Step 3: If $l\neq 1$, check the condition
$$\Big\|S_{\lambda}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\widetilde{\mathbf{R}}_l\Big)\Big\|_2<\lambda_g.$$
                             If the condition holds, set $\widetilde{\mathbf{D}}_{l\bullet}=\mathbf{0}$. Otherwise, update $\widetilde{\mathbf{D}}_{l\bullet}$ by
$$\widetilde{\mathbf{D}}_{l\bullet}=\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}+\frac{\lambda_g}{\|\widetilde{\mathbf{D}}_{l\bullet}\|_2}\Big)^{-1}S_{\lambda}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\widetilde{\mathbf{R}}_l\Big).$$
  until convergence.
  Step 4: Compute $\widehat{\mathbf{B}}_0,\widehat{\mathbf{B}}_1,\ldots,\widehat{\mathbf{B}}_q$ as in (7).
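The following NumPy sketch illustrates Steps 1-3 of Algorithm 1 under several assumptions.

- `V` and `D` use the diagonal-first column ordering $[\cdot^{(\text{diag})},\cdot^{(\text{off})}]$, and `p` is the number of diagonal columns.
- Row 0 of `D` (0-based) plays the role of $l=1$, i.e. $\mathbf{B}_0$.
- Zeros serve as the default initial $\widetilde{\mathbf{D}}$.
- Convergence is declared when the largest coefficient change falls below `tol`.

Because $\mathbf{X}_{\bullet l}$ is a single column, $a_l=\frac1n\mathbf{X}_{\bullet l}^\top\mathbf{X}_{\bullet l}$ is a scalar. The sketch therefore replaces the implicit Step 3 update by its explicit solution $a_l^{-1}(1-\lambda_g/\|\mathbf{s}_l\|_2)_+\mathbf{s}_l$ with $\mathbf{s}_l=S_\lambda(\frac1n\mathbf{X}_{\bullet l}^\top\widetilde{\mathbf{R}}_l)$. This is our simplification, not part of the stated algorithm. The name `covariance_regression_bcd` is hypothetical.

```python
import numpy as np

def soft_threshold(a, lam):
    """Element-wise soft-thresholding S_lam(a)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def covariance_regression_bcd(V, X, p, lam, lam_g, D_init=None, max_iter=500, tol=1e-8):
    """Blockwise coordinate descent for the unconstrained problem (Steps 1-3).

    V : (n, m) responses, m = p(p+1)/2, with the p diagonal columns first.
    X : (n, q+1) design; row 0 of D corresponds to B0 (l = 1 in the text).
    """
    n, m = V.shape
    D = np.zeros((X.shape[1], m)) if D_init is None else D_init.copy()
    for _ in range(max_iter):
        D_old = D.copy()
        for l in range(X.shape[1]):
            x_l = X[:, l]
            a = x_l @ x_l / n                       # scalar (1/n) X_l^T X_l
            # Step 1: partial residual R_l = V - X_[-l] D_[-l], using the current D
            R = V - X @ D + np.outer(x_l, D[l])
            c = x_l @ R / n                         # (1/n) X_l^T R_l, length m
            if l == 0:
                # Step 2: least squares on diagonals, lasso on off-diagonals of B0
                D[l, :p] = c[:p] / a
                D[l, p:] = soft_threshold(c[p:], lam) / a
            else:
                # Step 3: sparse group lasso update of the whole row
                s = soft_threshold(c, lam)
                norm_s = np.linalg.norm(s)
                if norm_s <= lam_g:                 # KKT condition: row is zero
                    D[l] = 0.0
                else:
                    D[l] = (1.0 - lam_g / norm_s) * s / a
        if np.max(np.abs(D - D_old)) < tol:
            break
    return D
```

With `lam = lam_g = 0`, the iterations reduce to cyclic coordinate descent for ordinary least squares, which gives a quick sanity check against `np.linalg.lstsq`.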

To facilitate estimation, we reorder the columns of $\mathbf{D}$ such that $\mathbf{D}=[\mathbf{D}^{(\text{diag})},\mathbf{D}^{(\text{off})}]$. Here $\mathbf{D}^{(\text{diag})}$ is a $(q+1)\times p$ matrix collecting the diagonals of $\mathbf{B}_0,\mathbf{B}_1,\ldots,\mathbf{B}_q$, and $\mathbf{D}^{(\text{off})}$ is a $(q+1)\times p(p-1)/2$ matrix collecting their off-diagonals. Similarly, we reorder the columns of $\mathbf{V}$ in (4) such that $\mathbf{V}=[\mathbf{V}^{(\text{diag})},\mathbf{V}^{(\text{off})}]$.
For $l\in[q+1]$, let $\mathbf{D}_{l\bullet}$, $\mathbf{D}_{l\bullet}^{(\text{diag})}$ and $\mathbf{D}_{l\bullet}^{(\text{off})}$ be the $l$th rows of $\mathbf{D}$, $\mathbf{D}^{(\text{diag})}$ and $\mathbf{D}^{(\text{off})}$, respectively, and let $[-l]=[q+1]\setminus\{l\}$. Write $\mathbf{R}_l=\mathbf{V}-\mathbf{X}_{\bullet[-l]}\mathbf{D}_{[-l]\bullet}$ for the $n\times p(p+1)/2$ partial residual matrix. Here $\mathbf{X}_{\bullet[-l]}$ and $\mathbf{D}_{[-l]\bullet}$ denote the columns of $\mathbf{X}$ and the rows of $\mathbf{D}$ indexed by $[-l]$. We first summarize our optimization procedure in Algorithm 1 and then describe each step in detail.
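As a concrete illustration of this column ordering, the sketch below builds $\mathbf{V}$ from a data matrix `Z` and forms the partial residual $\mathbf{R}_l$. It assumes, as in the no-covariate reduction of (6), that the entries of $\mathbf{V}$ are the products $z_{ij}z_{ik}$ with $j\le k$. The row-major order of the off-diagonal pairs inside $\mathbf{V}^{(\text{off})}$ is our own choice, since the text does not fix it. `build_V` and `partial_residual` are hypothetical names, and indices are 0-based.

```python
import numpy as np

def build_V(Z):
    """Responses z_ij z_ik (j <= k), columns ordered as [diag, off].

    Z : (n, p) matrix with rows z_i.
    Returns V of shape (n, p(p+1)/2) and the (j, k) pair labelling each column.
    """
    n, p = Z.shape
    pairs = [(j, j) for j in range(p)]                                 # diagonal block first
    pairs += [(j, k) for j in range(p) for k in range(j + 1, p)]       # then off-diagonals
    V = np.column_stack([Z[:, j] * Z[:, k] for j, k in pairs])
    return V, pairs

def partial_residual(V, X, D, l):
    """R_l = V - X_[-l] D_[-l]: residual leaving out the l-th column/row."""
    keep = [h for h in range(X.shape[1]) if h != l]
    return V - X[:, keep] @ D[keep, :]
```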

We solve the unconstrained problem by blockwise coordinate descent, as summarized in Algorithm 1. For $l=1$, the update of $\mathbf{D}_{l\bullet}^{(\text{diag})}$ is a least squares estimator, because the diagonal elements of $\mathbf{B}_0$ are not penalized. The update of $\mathbf{D}_{l\bullet}^{(\text{off})}$ is a lasso estimator, because the off-diagonal elements of $\mathbf{B}_0$ are not penalized by the group lasso penalty. For $l=2,\ldots,q+1$, the update of $\mathbf{D}_{l\bullet}$ is a sparse group lasso estimator. In particular, it can be shown that the Karush-Kuhn-Tucker condition for the sparse group lasso (Simon et al., 2013) is satisfied with $\mathbf{D}_{l\bullet}=\mathbf{0}$ if

$$\Big\|S_{\lambda}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{R}_l\Big)\Big\|_2<\lambda_g,$$

where $\mathbf{X}_{\bullet l}$ is the $l$th column of $\mathbf{X}$ in (4), so $\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}$ is a scalar. $S_{\lambda}(\mathbf{a})$ is the element-wise soft-thresholding operator at $\lambda$, that is, $\{S_{\lambda}(\mathbf{a})\}_j=\mathrm{sign}(a_j)\max(|a_j|-\lambda,0)$. When $\mathbf{D}_{l\bullet}\neq\mathbf{0}$, $\mathbf{D}_{l\bullet}$ is updated by

$$\widetilde{\mathbf{D}}_{l\bullet}=\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}+\frac{\lambda_g}{\|\mathbf{D}_{l\bullet}\|_2}\Big)^{-1}S_{\lambda}\Big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{R}_l\Big).$$
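Because $\mathbf{X}_{\bullet l}$ is a single column, this fixed-point relation can be solved explicitly. The short derivation below is our own worked step. It introduces the shorthand $a_l=\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{X}_{\bullet l}>0$ and $\mathbf{s}_l=S_{\lambda}\big(\frac{1}{n}\mathbf{X}_{\bullet l}^{\top}\mathbf{R}_l\big)$, and takes $\ell_2$ norms on both sides of $\mathbf{D}_{l\bullet}=\mathbf{s}_l/(a_l+\lambda_g/\|\mathbf{D}_{l\bullet}\|_2)$:

$$\begin{aligned}
\|\mathbf{D}_{l\bullet}\|_2\Big(a_l+\frac{\lambda_g}{\|\mathbf{D}_{l\bullet}\|_2}\Big)=\|\mathbf{s}_l\|_2
\;&\Longrightarrow\;\|\mathbf{D}_{l\bullet}\|_2=\frac{\|\mathbf{s}_l\|_2-\lambda_g}{a_l},\\
a_l+\frac{\lambda_g}{\|\mathbf{D}_{l\bullet}\|_2}=\frac{a_l\|\mathbf{s}_l\|_2}{\|\mathbf{s}_l\|_2-\lambda_g}
\;&\Longrightarrow\;\mathbf{D}_{l\bullet}=\frac{1}{a_l}\Big(1-\frac{\lambda_g}{\|\mathbf{s}_l\|_2}\Big)_{+}\mathbf{s}_l.
\end{aligned}$$

The positive part $(\cdot)_+$ also covers the zero solution, since it vanishes exactly when $\|\mathbf{s}_l\|_2\le\lambda_g$, matching the Karush-Kuhn-Tucker condition above.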

Steps 1-3 in Algorithm 1 solve the unconstrained version of problem (6). In Step 4, given the estimators $\widetilde{\mathbf{B}}_0,\widetilde{\mathbf{B}}_1,\ldots,\widetilde{\mathbf{B}}_q$ from Steps 1-3, we set

$$\widehat{\mathbf{B}}_0=(1+\delta)^{-1}\widetilde{\mathbf{B}}_0+\frac{\delta}{1+\delta}\mathbf{I}_p,\qquad\widehat{\mathbf{B}}_h=(1+\delta)^{-1}\widetilde{\mathbf{B}}_h,\quad h\in[q],$$

where $\delta=\max\big[0,-\lambda_{\min}\big(\widetilde{\mathbf{B}}_0+\sum_{h=1}^{q}\widetilde{\mathbf{B}}_h^{-}\big)\big]$. These estimators $\widehat{\mathbf{B}}_0,\widehat{\mathbf{B}}_1,\ldots,\widehat{\mathbf{B}}_q$ ensure that $\boldsymbol{\Sigma}(\mathbf{x})$ is positive semi-definite. They yield a Ledoit-Wolf type shrinkage estimator (Ledoit and Wolf, 2004),

$$\widehat{\boldsymbol{\Sigma}}(\mathbf{x})=\frac{1}{1+\delta}\Big(\widetilde{\mathbf{B}}_0+\sum_{h=1}^{q}x_h\widetilde{\mathbf{B}}_h\Big)+\frac{\delta}{1+\delta}\mathbf{I}_p. \tag{7}$$
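A minimal NumPy sketch of Step 4 follows. The matrices $\widetilde{\mathbf{B}}_h^{-}$ that enter the constraint are defined earlier in the paper. Here they are simply supplied by the caller as the hypothetical argument `B_minus`. `psd_shrink` and `sigma_hat` are also hypothetical names. The sketch computes $\delta$ from the smallest eigenvalue and returns the shrunk matrices together with $\widehat{\boldsymbol{\Sigma}}(\mathbf{x})$ in (7).

```python
import numpy as np

def psd_shrink(B_tilde, B_minus):
    """Step 4: Ledoit-Wolf type shrinkage toward the identity.

    B_tilde : list [B0~, B1~, ..., Bq~] of (p, p) unconstrained estimates.
    B_minus : list of the (p, p) matrices Bh~^- entering the constraint in (6),
              supplied by the caller (their construction is defined elsewhere).
    Returns delta and the list [B0^, B1^, ..., Bq^].
    """
    p = B_tilde[0].shape[0]
    M = B_tilde[0] + sum(B_minus, np.zeros((p, p)))
    delta = max(0.0, -np.linalg.eigvalsh(M)[0])     # eigvalsh: ascending eigenvalues
    scale = 1.0 / (1.0 + delta)
    B_hat = [scale * B_tilde[0] + delta * scale * np.eye(p)]
    B_hat += [scale * B for B in B_tilde[1:]]
    return delta, B_hat

def sigma_hat(x, B_hat):
    """Estimated conditional covariance (7): B0^ + sum_h x_h Bh^."""
    return B_hat[0] + sum(xh * B for xh, B in zip(x, B_hat[1:]))
```

If the constraint already holds ($\delta=0$), the estimates are returned unchanged. Otherwise every matrix is shrunk by $(1+\delta)^{-1}$, and the baseline is pulled toward $\mathbf{I}_p$.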

As $n$ increases, Theorem 2 implies that the estimators $\widetilde{\mathbf{B}}_0,\widetilde{\mathbf{B}}_1,\ldots,\widetilde{\mathbf{B}}_q$ from Steps 1-3 are consistent and satisfy the constraint in (6) with high probability. Consequently, $\delta$ in (7) converges to zero as $n$ increases. See the discussion after Theorem 2.

The two tuning parameters $\lambda$ and $\lambda_g$ in (6) are selected jointly via $L$-fold cross-validation. We use $L=5$ in our simulation studies and real data analysis.
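The joint selection can be sketched as a grid search with NumPy. This is only an illustration under assumptions. The text does not state the validation criterion, so we assume the held-out squared error $\|\mathbf{V}-\mathbf{X}\widehat{\mathbf{D}}\|_F^2/(2n_{\text{val}})$. `fit` is a hypothetical user-supplied solver for (6), such as a blockwise coordinate descent routine, and `cv_select` is likewise a hypothetical name.

```python
import itertools
import numpy as np

def cv_select(V, X, lam_grid, lam_g_grid, fit, L=5, seed=0):
    """Joint L-fold cross-validation over (lam, lam_g).

    fit(V_train, X_train, lam, lam_g) -> D is any solver for (6).
    The validation loss (held-out squared error) is an assumption.
    """
    n = V.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), L)
    best, best_loss = None, np.inf
    for lam, lam_g in itertools.product(lam_grid, lam_g_grid):
        loss = 0.0
        for val in folds:
            train = np.setdiff1d(np.arange(n), val)
            D = fit(V[train], X[train], lam, lam_g)
            resid = V[val] - X[val] @ D
            loss += (resid ** 2).sum() / (2 * len(val))
        if loss / L < best_loss:                     # average loss over folds
            best, best_loss = (lam, lam_g), loss / L
    return best, best_loss
```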

4 Theoretical Properties

In this section, we investigate the convergence rate of the multivariate sparse group lasso estimator from (6). Theoretical properties of the sparse group lasso estimator in multivariate linear regression have been studied in Li et al. (2015). However, they assumed a Gaussian distribution. This can be restrictive when modeling $y_{ij}y_{ik}$, the product of the expression levels of genes $j$ and $k$ in sample $i$. Moreover, the penalty in (5) is more involved than the sparse group lasso penalty in Li et al. (2015). Here $\mathbf{B}_0$ is excluded from the group sparsity penalty, and the diagonal elements of $\mathbf{B}_0$ are excluded from the element-wise sparsity penalty.

We first introduce notation. After rearranging the columns of $\mathbf{D}$ such that $\mathbf{D}=[\mathbf{D}^{(\text{diag})},\mathbf{D}^{(\text{off})}]$ as in Section 3, define the index set $\mathcal{J}_0$ as

$$\mathcal{J}_0=\{(l,m):l\in[q+1],\,m\in[p(p+1)/2]\}\setminus\{(1,1),\ldots,(1,p)\}, \tag{8}$$

which contains the indices of the entries of $\mathbf{D}$ that are included in the $\ell_1$ penalty in (5). For a matrix $\mathbf{D}\in\mathbb{R}^{(q+1)\times p(p+1)/2}$, define the following index sets:

- $\mathcal{J}_1(\mathbf{D})=\{(l,m):D_{lm}\neq 0\}$, the index set of non-zero elements of $\mathbf{D}$;
- $\mathcal{J}_2(\mathbf{D})=\{l:\mathbf{D}_{l\bullet}\neq\mathbf{0},\,l\neq 1\}$, the index set of non-zero rows excluding the first row;
- $\mathcal{J}_2^c(\mathbf{D})=[q+1]\setminus\{1\}\setminus\mathcal{J}_2(\mathbf{D})$.
For an index set $\mathcal{J}\subseteq\{(l,m):l\in[q+1],\,m\in[p(p+1)/2]\}$, define $\mathbf{D}_{\mathcal{J}}$ as the matrix whose $(l,m)$th element equals the $(l,m)$th element of $\mathbf{D}$ if $(l,m)\in\mathcal{J}$ and is 0 otherwise. For an index set $\mathcal{J}'\subseteq\{2,\ldots,q+1\}$, define $\mathbf{D}_{\mathcal{J}'}$ as the matrix whose $l$th row equals the $l$th row of $\mathbf{D}$ if $l\in\mathcal{J}'$ and is $\mathbf{0}$ otherwise.
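The index sets can be represented directly as boolean masks and row lists. The NumPy sketch below assumes `D` has shape $(q+1)\times p(p+1)/2$ with the $p$ diagonal columns first, and it uses 0-based indices, so the text's first row ($l=1$) is row 0. `index_sets` and `restrict` are hypothetical names.

```python
import numpy as np

def index_sets(D, p):
    """Masks/lists for J0, J1(D), J2(D) and J2^c(D) (0-based rows).

    D : (q+1, p(p+1)/2) matrix with the p diagonal columns first.
    """
    J0 = np.ones(D.shape, dtype=bool)
    J0[0, :p] = False                  # drop (1,1), ..., (1,p): unpenalized diagonals of B0
    J1 = D != 0                        # support of D
    rows = range(1, D.shape[0])        # exclude the first row (B0)
    J2 = [l for l in rows if np.any(D[l] != 0)]
    J2c = [l for l in rows if l not in J2]
    return J0, J1, J2, J2c

def restrict(D, mask):
    """D_J: keep the entries indexed by J, set all others to zero."""
    return np.where(mask, D, 0.0)
```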

The unconstrained objective function in (6) can be rewritten as

$$\frac{1}{2n}\|\mathbf{V}-\mathbf{X}\mathbf{D}\|_F^2+\lambda\|\mathbf{D}_{\mathcal{J}_0}\|_1+\lambda_g\sum_{l=2}^{q+1}\|\mathbf{D}_{l\bullet}\|_2. \tag{9}$$
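The sketch below evaluates (9) and minimizes it with plain proximal gradient descent (ISTA). This solver is swapped in for illustration only; it is not the blockwise coordinate descent algorithm used in the paper. Because $\mathcal{J}_0$ is defined through (5), outside this excerpt, it enters as a hypothetical boolean mask `l1_mask`. Row 0 corresponds to the paper's first row, which carries no group penalty. The proximal step soft-thresholds the entries in $\mathcal{J}_0$ and then shrinks each remaining row as a group. This two-step composition is the exact proximal map of the combined penalty, because group shrinkage only rescales a row by a non-negative factor and so preserves the $\ell_1$ subgradient.

```python
import numpy as np

def objective(V, X, D, lam, lam_g, l1_mask):
    """Objective (9); l1_mask is a boolean array marking J_0 (hypothetical input)."""
    n = X.shape[0]
    loss = np.sum((V - X @ D) ** 2) / (2 * n)
    l1 = lam * np.abs(D[l1_mask]).sum()
    group = lam_g * np.linalg.norm(D[1:], axis=1).sum()  # rows l = 2, ..., q+1
    return loss + l1 + group

def prox(Z, t, lam, lam_g, l1_mask):
    """Proximal map of t * (lam * ||D_{J0}||_1 + lam_g * sum_{l>=2} ||D_l.||_2):
    soft-threshold the entries in J_0, then shrink each row l >= 2 as a group."""
    Y = np.sign(Z) * np.maximum(np.abs(Z) - t * lam * l1_mask, 0.0)
    norms = np.linalg.norm(Y[1:], axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    Y[1:] *= np.where(norms > 0, np.maximum(0.0, 1.0 - t * lam_g / safe), 0.0)
    return Y

def prox_grad(V, X, lam, lam_g, l1_mask, n_iter=500):
    """ISTA with step 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant."""
    n = X.shape[0]
    t = n / np.linalg.norm(X, 2) ** 2
    D = np.zeros((X.shape[1], V.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ D - V) / n  # gradient of the squared-error term in (9)
        D = prox(D - t * grad, t, lam, lam_g, l1_mask)
    return D
```

In the check below, the mask leaves the first row unpenalized purely for illustration.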

In our theoretical analysis, we focus on the minimizer of (9), which is not subject to the PD constraint in (6). As shown in the remark after Theorem 2, there exists a sufficiently large $n_0$ such that, for all $n\geq n_0$, the constrained estimator from (6) coincides with the unconstrained estimator from (9).

4.1 Convergence rate

Let $\mathbf{B}_0^\ast,\mathbf{B}_1^\ast,\ldots,\mathbf{B}_q^\ast$ be the true coefficient matrices in (1), and let $\mathbf{D}^\ast$ be the coefficient matrix in (4) calculated from $\mathbf{B}_0^\ast,\mathbf{B}_1^\ast,\ldots,\mathbf{B}_q^\ast$. Let $|\cdot|$ denote the cardinality of a set, and let $s=|\mathcal{J}_1(\mathbf{D}^\ast)|$ and $r=|\mathcal{J}_2(\mathbf{D}^\ast)|$ denote the number of non-zero entries and non-zero groups in $\mathbf{D}^\ast$, respectively. We first state the regularity conditions.

Assumption 1

Assume $\lambda_{\min}(\mathbf{B}_0^\ast+\sum_{h=1}^q{\mathbf{B}_h^\ast}^{-})>0$ and $x_{ih}\in[0,1]$ for $i\in[n]$, $h\in[q]$.

Assumption 2

There exists a constant $\kappa>0$ such that, for any non-zero matrix $\boldsymbol{\Delta}\in\mathbb{R}^{(q+1)\times p(p+1)/2}$ that satisfies

$$\|\boldsymbol{\Delta}_{\mathcal{J}^c_1(\mathbf{D}^\ast)}\|_1+2\sqrt{s/r}\sum_{l\in\mathcal{J}^c_2(\mathbf{D}^\ast)}\|\boldsymbol{\Delta}_{l\bullet}\|_2\leq 3\|\boldsymbol{\Delta}_{\mathcal{J}_1(\mathbf{D}^\ast)}\|_1+2\sqrt{s/r}\sum_{l\in\mathcal{J}_2(\mathbf{D}^\ast)}\|\boldsymbol{\Delta}_{l\bullet}\|_2,$$

we have

$$\frac{\|\mathbf{X}\boldsymbol{\Delta}\|_F^2}{n\|\boldsymbol{\Delta}\|_F^2}>\kappa.$$

Assumption 2 is a restricted eigenvalue condition. It ensures that the curvature of the squared-error loss, governed by $\mathbf{X}^\top\mathbf{X}/n$, is bounded away from zero over the restricted set of matrices $\boldsymbol{\Delta}$ specified in Assumption 2, even when $\mathbf{X}^\top\mathbf{X}/n$ itself is singular. Conditions of this type are commonly employed in the literature (Hastie et al., 2015).
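To make Assumption 2 concrete, the sketch below checks whether a given $\boldsymbol{\Delta}$ lies in the restricted set defined by the inequality above and computes the ratio that must exceed $\kappa$. It only evaluates single candidates; it does not certify the assumption, which would require a minimum over the whole set. The function names are hypothetical, and indices are 0-based. The factor 3 multiplies only the $\ell_1$ term, exactly as the inequality is written, and $r\geq 1$ is assumed so that $\sqrt{s/r}$ is defined.

```python
import numpy as np

def in_cone(Delta, D_star):
    """Check the restricted-set inequality of Assumption 2 for one candidate Delta.
    Row 0 (the paper's first row) belongs to neither J2(D*) nor J2^c(D*)."""
    Delta = np.asarray(Delta, dtype=float)
    D_star = np.asarray(D_star, dtype=float)
    supp = D_star != 0                 # J1(D*)
    s = int(supp.sum())
    grp = supp.any(axis=1)             # non-zero rows of D*
    grp[0] = False                     # J2(D*)
    grp_c = ~grp
    grp_c[0] = False                   # J2^c(D*)
    r = int(grp.sum())                 # assumed >= 1
    w = 2.0 * np.sqrt(s / r)
    row_norms = np.linalg.norm(Delta, axis=1)
    lhs = np.abs(Delta[~supp]).sum() + w * row_norms[grp_c].sum()
    rhs = 3.0 * np.abs(Delta[supp]).sum() + w * row_norms[grp].sum()
    return bool(lhs <= rhs)

def re_ratio(X, Delta):
    """||X Delta||_F^2 / (n ||Delta||_F^2); Assumption 2 requires this to exceed kappa."""
    n = X.shape[0]
    return float(np.sum((X @ Delta) ** 2) / (n * np.sum(Delta ** 2)))
```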

We first derive a deterministic upper bound for the prediction error of $\widehat{\mathbf{D}}$ obtained from minimizing (9).

Theorem 1

Suppose Assumptions 1-2 hold, $\lambda\geq(2/n)\|\mathbf{X}^\top\mathbf{E}\|_\infty$, and $\lambda_g=C\lambda\sqrt{s/r}$ for some constant $C>0$. Then the solution $\widehat{\mathbf{D}}$ that minimizes (9) satisfies

$$\frac{1}{n}\|\mathbf{X}(\widehat{\mathbf{D}}-\mathbf{D}^\ast)\|_F^2\leq\frac{(3+2C)^2\lambda^2 s}{\kappa},$$

where $s=|\mathcal{J}_1(\mathbf{D}^\ast)|$ is the number of non-zero entries in $\mathbf{D}^\ast$ and $\kappa$ is the constant in Assumption 2.

Theorem 1 shows that the mean squared prediction error is bounded by a multiple of $\lambda^2$, namely $(3+2C)^2\lambda^2 s/\kappa$. The bound grows with the number $s$ of non-zero entries in the true model and shrinks as the restricted eigenvalue constant $\kappa$ increases.

Theorem 2

Suppose Assumptions 1-2 hold. Assume the $\epsilon_{ijk}$'s are zero-mean sub-exponential random variables, and $\epsilon_{ijk}$ and $\epsilon_{i'j'k'}$ are independent for $i\neq i'$. Let $\|\epsilon_{ijk}\|_{\psi_1}=\sup_{d\geq 1}d^{-1}(\mathbb{E}|\epsilon_{ijk}|^d)^{1/d}$ be the sub-exponential norm of $\epsilon_{ijk}$, and let $K=\max_{ijk}\|\epsilon_{ijk}\|_{\psi_1}$. For constants $c>0$, $C>0$ and $A>c^{-0.5}$, let

$$\lambda=2KA\sqrt{\frac{\log\{p(p+1)(q+1)\}}{n}}\quad\text{and}\quad\lambda_g=C\lambda\sqrt{\frac{s}{r}}.$$

Then, with probability at least $1-\{p(p+1)(q+1)\}^{1-cA^2}$, it holds that

$$\|\widehat{\mathbf{D}}-\mathbf{D}^\ast\|_F\leq\frac{2(3+2C)KA}{\kappa}\sqrt{\frac{s\log\{p(p+1)(q+1)\}}{n}}.$$

Theorem 2 shows that the Frobenius norm of the estimation error is of order $\sqrt{s\max(\log p,\log q)/n}$. Here, we do not assume that $\epsilon_{ijk}$ and $\epsilon_{ij'k'}$ are independent, as $z_{ij}z_{ik}$ and $z_{ij'}z_{ik'}$ from gene pairs $(j,k)$ and $(j',k')$ can be correlated. Hence, under our setting, the error terms across element-wise regression tasks can be correlated. Comparable convergence rates, up to a logarithmic factor, have been derived for sparse group lasso estimators in univariate regressions (Cai et al., 2022; Zhang and Li, 2023).
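The tuning parameters and bounds in Theorems 1 and 2 are explicit functions of $(n,p,q,s,r)$ and the constants $K$, $A$, $C$, $\kappa$, $c$. The hypothetical helper below evaluates them so that their scaling can be inspected. These constants are generally unknown in practice, so any numbers it produces are illustrative only. With $\lambda$ chosen as in Theorem 2, the squared Frobenius bound equals the Theorem 1 prediction bound divided by $\kappa$, which is consistent with combining Theorem 1 with Assumption 2.

```python
import math

def theorem2_quantities(n, p, q, s, r, K, A, C, kappa, c):
    """Tuning parameters and bounds from Theorems 1-2; all constants are user inputs."""
    if A <= c ** -0.5:
        raise ValueError("Theorem 2 requires A > c^(-1/2)")
    N = p * (p + 1) * (q + 1)                  # p(p+1)(q+1)
    lam = 2 * K * A * math.sqrt(math.log(N) / n)
    lam_g = C * lam * math.sqrt(s / r)
    return {
        "lam": lam,
        "lam_g": lam_g,
        "prob": 1 - N ** (1 - c * A ** 2),     # probability level in Theorem 2
        "pred_bound": (3 + 2 * C) ** 2 * lam ** 2 * s / kappa,  # Theorem 1
        "frob_bound": 2 * (3 + 2 * C) * K * A / kappa * math.sqrt(s * math.log(N) / n),
    }
```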

Theorem 2 also suggests that the estimators $\widetilde{\mathbf{B}}_0,\widetilde{\mathbf{B}}_1,\ldots,\widetilde{\mathbf{B}}_q$ from minimizing (9) satisfy the PD constraint in (6) as $n$ increases. Specifically, it holds by Weyl's inequality that

$$\lambda_{\min}\Big(\mathbf{B}_0^\ast+\sum_{h=1}^q{\mathbf{B}_h^\ast}^{-}\Big)\leq\lambda_{\min}\Big(\widetilde{\mathbf{B}}_0+\sum_{h=1}^q\widetilde{\mathbf{B}}_h^{-}\Big)-\lambda_{\min}\Big(\widetilde{\mathbf{B}}_0+\sum_{h=1}^q\widetilde{\mathbf{B}}_h^{-}-\mathbf{B}_0^\ast-\sum_{h=1}^q{\mathbf{B}_h^\ast}^{-}\Big).$$

Since $\lambda_{\min}(\widetilde{\mathbf{B}}_0+\sum_{h=1}^q\widetilde{\mathbf{B}}_h^{-}-\mathbf{B}_0^\ast-\sum_{h=1}^q{\mathbf{B}_h^\ast}^{-})$ converges to zero as $n$ increases and $\lambda_{\min}(\mathbf{B}_0^\ast+\sum_{h=1}^q{\mathbf{B}_h^\ast}^{-})>0$ by Assumption 1, there exists a sufficiently large sample size $n_0$ such that, for all $n\geq n_0$,

$$\lambda_{\min}\Big(\widetilde{\mathbf{B}}_0+\sum_{h=1}^q\widetilde{\mathbf{B}}_h^{-}\Big)>0.$$

That is, the unconstrained minimizer of (9) satisfies the PD constraint in (6) for any $n\geq n_0$.
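The Weyl step above can be checked numerically. In the sketch below, `S_star` stands in for $\mathbf{B}_0^\ast+\sum_{h}{\mathbf{B}_h^\ast}^{-}$ and `S_tilde` for its estimate. The matrices are random and purely illustrative, and the sketch does not construct $\mathbf{B}_h^{-}$ itself, which is defined earlier in the paper. When the perturbation is small relative to $\lambda_{\min}$ of the truth, the lower bound stays positive, so `S_tilde` is positive definite. This is the mechanism behind the existence of $n_0$.

```python
import numpy as np

def lam_min(S):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh returns ascending order)."""
    return np.linalg.eigvalsh(S)[0]

def weyl_lower_bound(S_tilde, S_star):
    """Weyl: lam_min(S_tilde) >= lam_min(S_star) + lam_min(S_tilde - S_star)."""
    return lam_min(S_star) + lam_min(S_tilde - S_star)

rng = np.random.default_rng(1)
p = 5
A = rng.standard_normal((p, p))
S_star = A @ A.T + np.eye(p)        # positive definite "truth", lam_min >= 1
E = rng.standard_normal((p, p))
S_tilde = S_star + 0.01 * (E + E.T)  # small symmetric estimation error
```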

4.2 Statistical Inference via Debiasing

In this section, we consider inference for $\mathbf{D}^\ast$ under the proposed multivariate sparse group lasso regression. Inspired by recent advances on debiasing in high-dimensional linear regression (Javanmard and Montanari, 2014; Zhang and Zhang, 2014; Cai et al., 2022), we design a debiased lasso estimator based on (9) that debiases each column of the proposed estimator $\widehat{\mathbf{D}}$ separately and uses the result to make inference on the true parameter matrix $\mathbf{D}^\ast$. Since $\widehat{\mathbf{D}}$ is of dimension $(q+1)\times p(p+1)/2$, debiasing column by column, with each column of dimension $q+1$, significantly reduces the computational cost. The price of this computational gain is a potential loss of efficiency compared to debiasing all columns of $\widehat{\mathbf{D}}$ simultaneously.
We also note that debiasing all columns of $\widehat{\mathbf{D}}$ simultaneously may not be feasible under our framework, as it requires estimating $\text{Cov}(z_{ij}z_{ik},z_{ij'}z_{ik'})$, whose analytical form is difficult to derive without distributional assumptions on $\mathbf{z}_i$. Besides non-Gaussian errors, our procedure also faces the challenge of heteroskedasticity, as $\text{Var}(\epsilon_{ijk})$ may depend on $\mathbf{x}_i$. Next, we detail our approach.

Denote $\widehat{\boldsymbol{\Theta}}=\mathbf{X}^\top\mathbf{X}/n$. Consider the following debiased estimator $\widehat{\mathbf{D}}^u$:

$$\widehat{\mathbf{D}}^u=\widehat{\mathbf{D}}+\frac{1}{n}\mathbf{M}\mathbf{X}^\top(\mathbf{V}-\mathbf{X}\widehat{\mathbf{D}}), \tag{10}$$

where $\mathbf{M}=[\mathbf{m}_1,\ldots,\mathbf{m}_{q+1}]^\top\in\mathbb{R}^{(q+1)\times(q+1)}$ and, for each $l\in[q+1]$, $\mathbf{m}_l$ is a solution to

$$\begin{aligned}
\mathbf{m}_l=\arg\min_{\mathbf{m}}\quad&\mathbf{m}^\top\widehat{\boldsymbol{\Theta}}\mathbf{m}\\
\text{subject to}\quad&\|\widehat{\boldsymbol{\Theta}}\mathbf{m}-\mathbf{e}_l\|_\infty\leq\mu,\\
&\|\mathbf{X}\mathbf{m}\|_\infty\leq n^{\beta},\quad\text{for any fixed } 1/4<\beta<1/2,
\end{aligned}\tag{11}$$

where $\mu$ is specified later and $\mathbf{e}_l$ is the $l$th vector of the canonical basis of $\mathbb{R}^{q+1}$. By construction, $\mathbf{M}$ is a function of the design matrix $\mathbf{X}$ only, and can be seen as an approximation to the inverse of $\mathbf{X}^\top\mathbf{X}/n$. Assuming a random design in which the $\mathbf{x}_i$'s are i.i.d. sub-Gaussian and $\text{Var}(\mathbf{x}_i)$ has bounded eigenvalues and diagonal elements, Javanmard and Montanari (2014) showed that the optimization problems in (11) are feasible with high probability.
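A minimal NumPy sketch of the debiased estimator (10) follows. Solving (11) in general requires a quadratic-programming solver. As a simplifying assumption, which is sensible only when $q+1$ is much smaller than $n$ and $\widehat{\boldsymbol{\Theta}}$ is well conditioned, the sketch substitutes $\mathbf{M}=\widehat{\boldsymbol{\Theta}}^{-1}$. This choice satisfies the first constraint of (11) with $\mu=0$. The sketch also reports $\max_l\|\mathbf{X}\mathbf{m}_l\|_\infty$ so the caller can check the second constraint against $n^\beta$. The function names are hypothetical.

```python
import numpy as np

def debias(V, X, D_hat, M):
    """Debiased estimator (10): D_hat + M X^T (V - X D_hat) / n."""
    n = X.shape[0]
    return D_hat + M @ X.T @ (V - X @ D_hat) / n

def inverse_standin(X):
    """Low-dimensional stand-in for (11): M = Theta_hat^{-1} (first constraint holds
    with mu = 0). Returns max_l ||X m_l||_inf for comparison with n**beta."""
    n = X.shape[0]
    M = np.linalg.inv(X.T @ X / n)
    return M, float(np.abs(X @ M.T).max())  # rows of M are m_l, so X @ M.T has columns X m_l
```

With this exact inverse, (10) returns the least-squares fit $(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}$ regardless of $\widehat{\mathbf{D}}$. The relaxed program (11) is therefore what matters when $\widehat{\boldsymbol{\Theta}}$ is singular, for example when $q+1>n$.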

Define an index map $T:[p(p+1)/2]\rightarrow[p]\times[p]$ such that, for $m\in[p(p+1)/2]$ and a $p\times p$ symmetric matrix $\mathbf{A}$, $T(m)=(j,k)$ if $[\text{vech}(\mathbf{A})]_m=A_{jk}$. Next, we state our result on debiasing $\widehat{\mathbf{D}}$ for inference.
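A common convention for vech stacks the lower triangle of a matrix, including the diagonal, column by column. The excerpt does not fix the ordering, so the sketch below adopts that convention as an assumption and tabulates $T$ for it. Indices are 0-based, and the function names are hypothetical.

```python
def index_map(p):
    """T as a list: T[m] = (j, k), 0-based, for column-major vech (so j >= k)."""
    return [(j, k) for k in range(p) for j in range(k, p)]

def vech(A):
    """Stack the lower triangle (including the diagonal) of A column by column."""
    return [A[j][k] for j, k in index_map(len(A))]
```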

Theorem 3

Suppose the conditions in Theorem 2 are satisfied and the optimization problems in (11) are feasible. Assume $\mathbb{E}(\epsilon_{ijk}^2)=\sigma_{ijk}^2$ and $\mathbb{E}(|\epsilon_{ijk}|^{2+a})<C_2\sigma_{ijk}^{2+a}$ for some $a>0$ and $C_2>0$.

  • (1)

    With probability at least $1-\{p(p+1)(q+1)\}^{1-cA^2}$, $\widehat{\mathbf{D}}^u$ can be decomposed as

    $$\sqrt{n}(\widehat{\mathbf{D}}^u-\mathbf{D}^\ast)=\boldsymbol{\Lambda}+\mathbf{W},$$

    where $\boldsymbol{\Lambda}=n^{1/2}(\mathbf{M}\widehat{\boldsymbol{\Theta}}-\mathbf{I}_{q+1})(\mathbf{D}^\ast-\widehat{\mathbf{D}})$ is such that

    $$\|\boldsymbol{\Lambda}\|_\infty\leq\frac{8(2+C)^2KA}{\kappa}\mu s\sqrt{\log\{p(p+1)(q+1)\}},$$

    and $\mathbf{W}=n^{-1/2}\mathbf{M}\mathbf{X}^\top\mathbf{E}$ is such that $W_{lm}$ with $T(m)=(j,k)$ is asymptotically normal with mean zero and variance $n^{-1}\sum_{i=1}^n(\mathbf{M}\mathbf{X}^\top)_{li}^2\sigma_{ijk}^2$.

  • (2)

    When $\mu=\sqrt{\log\{p(p+1)(q+1)\}/n}$ and $s\log\{p(p+1)(q+1)\}/\sqrt{n}=o(1)$, an asymptotic two-sided $100(1-\alpha)\%$ confidence interval for $D_{lm}^\ast$, where $T(m)=(j,k)$, is

    $$\widehat{D}_{lm}^u\pm\Phi^{-1}(1-\alpha/2)\,n^{-1}\sqrt{\sum_{i=1}^n(\mathbf{M}\mathbf{X}^\top)_{li}^2\sigma_{ijk}^2}.$$
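Part (1) rests on an algebraic identity. Assuming the regression model is written in matrix form as $\mathbf{V}=\mathbf{X}\mathbf{D}^\ast+\mathbf{E}$, which is what the definition of $\mathbf{W}$ presupposes, substituting into (10) and using $\widehat{\boldsymbol\Theta}=\mathbf{X}^\top\mathbf{X}/n$ gives the following short derivation (a sketch, not the paper's proof):

```latex
\begin{aligned}
\widehat{\mathbf D}^u-\mathbf D^\ast
&=\widehat{\mathbf D}-\mathbf D^\ast
  +\tfrac{1}{n}\mathbf M\mathbf X^\top\{\mathbf X(\mathbf D^\ast-\widehat{\mathbf D})+\mathbf E\}\\
&=(\mathbf M\widehat{\boldsymbol\Theta}-\mathbf I_{q+1})(\mathbf D^\ast-\widehat{\mathbf D})
  +\tfrac{1}{n}\mathbf M\mathbf X^\top\mathbf E .
\end{aligned}
```

Multiplying by $\sqrt{n}$ gives $\boldsymbol\Lambda+\mathbf W$. Because $\widehat{\boldsymbol\Theta}$ is symmetric, the $l$th row of $\mathbf M\widehat{\boldsymbol\Theta}-\mathbf I_{q+1}$ is $(\widehat{\boldsymbol\Theta}\mathbf m_l-\mathbf e_l)^\top$. Hölder's inequality and the first constraint in (11) then give $|\Lambda_{lm}|\leq\sqrt{n}\,\mu\,\|(\widehat{\mathbf D}-\mathbf D^\ast)_{\bullet m}\|_1$, which explains why the bound on $\|\boldsymbol\Lambda\|_\infty$ is proportional to $\mu$.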

In Theorem 3, the errors are not assumed to be Gaussian and they can be heteroskedastic. Calculating the above confidence interval requires estimating the variance of Wlmsubscript𝑊𝑙𝑚W_{lm}italic_W start_POSTSUBSCRIPT italic_l italic_m end_POSTSUBSCRIPT, where T(m)=(j,k)𝑇𝑚𝑗𝑘T(m)=(j,k)italic_T ( italic_m ) = ( italic_j , italic_k ). This can be consistently estimated by the empirical variance (Bühlmann and Van de Geer, 2015)

$$\frac{1}{n}\sum_{i=1}^{n}\bigg\{(\mathbf{M}\mathbf{X}^{\top})_{li}\,\hat{\epsilon}_{ijk}-\frac{1}{n}\sum_{r=1}^{n}(\mathbf{M}\mathbf{X}^{\top})_{lr}\,\hat{\epsilon}_{rjk}\bigg\}^{2}, \tag{12}$$

where $\hat{\epsilon}_{ijk}=(\mathbf{V}-\mathbf{X}\widehat{\mathbf{D}}^{u})_{im}$ with $T(m)=(j,k)$.
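To make the plug-in interval concrete, here is a minimal NumPy sketch. It assumes $\mathbf{X}$ is $n\times(q+1)$, $\mathbf{V}$ is $n\times p(p+1)/2$ with column $m$ matching the pair $T(m)=(j,k)$, $\widehat{\mathbf{D}}^{u}$ is the $(q+1)\times p(p+1)/2$ debiased estimate, and $\mathbf{M}$ is $(q+1)\times(q+1)$. It also assumes that $\sum_i(\mathbf{M}\mathbf{X}^{\top})_{li}^2\sigma_{ijk}^2$ is replaced by $n$ times the empirical variance (12). The function name `debiased_ci` is hypothetical and is not the authors' code.

```python
import numpy as np
from statistics import NormalDist


def debiased_ci(M, X, V, D_u, l, m, alpha=0.05):
    """Plug-in two-sided 100(1 - alpha)% interval for D*_{lm} (illustrative sketch).

    Assumed shapes: X is n x (q+1), V is n x p(p+1)/2,
    D_u is (q+1) x p(p+1)/2, and M is (q+1) x (q+1).
    """
    n = X.shape[0]
    a = (M @ X.T)[l]                  # (M X^T)_{li}, i = 1, ..., n
    eps = (V - X @ D_u)[:, m]         # residuals hat-eps_{ijk}, with T(m) = (j, k)
    prod = a * eps
    var_hat = np.mean((prod - prod.mean()) ** 2)   # empirical variance (12)
    z = NormalDist().inv_cdf(1 - alpha / 2)        # Phi^{-1}(1 - alpha/2)
    # Assumption: sum_i (M X^T)_{li}^2 sigma_{ijk}^2 is replaced by n * var_hat,
    # so the half-width n^{-1} * sqrt(n * var_hat) simplifies to sqrt(var_hat / n).
    half_width = z * np.sqrt(var_hat / n)
    return D_u[l, m] - half_width, D_u[l, m] + half_width
```

The standard-library `NormalDist` supplies the normal quantile, so no SciPy dependency is needed.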

5 Simulation Studies

In this section, we investigate the finite-sample performance of our proposed method, referred to as SparseCovReg, and compare it with four alternative methods:
• DenseSample: the standard sample covariance estimator $\mathbf{S}=\sum_{i=1}^{n}\mathbf{y}_{i}\mathbf{y}_{i}^{\top}/n$,
• SparseSample: the soft-thresholding sample covariance estimator $S_{\lambda}(\mathbf{S})$, where $S_{\lambda}(\cdot)$ is the element-wise soft-thresholding operator at $\lambda$ (Rothman et al., 2009),
• CovReg: the quadratic covariance regression estimator of Hoff and Niu (2012),
• DenseCovReg: the multivariate regression estimator without sparsity, i.e., $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{V}$.
The tuning parameters in SparseCovReg and SparseSample are selected using 5-fold cross-validation.
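Three of the baselines have simple closed forms. The sketch below assumes a hypothetical layout for $\mathbf{V}$: row $i$ stacks the products $y_{ij}y_{ik}$ for all pairs $j\le k$. It also assumes that DenseCovReg regresses on an intercept plus the covariates. Soft-thresholding is applied to every element, as the text describes; some implementations leave the diagonal untouched instead. CovReg is omitted because it requires the iterative fit of Hoff and Niu (2012).

```python
import numpy as np


def dense_sample(Y):
    """DenseSample: S = sum_i y_i y_i^T / n for an n x p response matrix Y."""
    return Y.T @ Y / Y.shape[0]


def soft_threshold(S, lam):
    """SparseSample: element-wise soft-thresholding operator S_lambda(.)."""
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)


def pair_products(Y):
    """Hypothetical layout of V: row i holds y_ij * y_ik for all pairs j <= k."""
    j, k = np.triu_indices(Y.shape[1])
    return Y[:, j] * Y[:, k]


def dense_cov_reg(X, Y):
    """DenseCovReg: least-squares fit of V on an intercept plus covariates X.

    np.linalg.lstsq returns (X1^T X1)^{-1} X1^T V when X1 has full column rank.
    """
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    V = pair_products(Y)
    D, *_ = np.linalg.lstsq(X1, V, rcond=None)
    return D  # shape (q + 1) x p(p + 1)/2
```

Using `lstsq` rather than forming $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ explicitly is a numerical-stability choice. It gives the same estimator whenever the design has full column rank.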

We simulate $n$ samples $\{(\mathbf{y}_{i},\mathbf{x}_{i}),\ i\in[n]\}$, where the response $\mathbf{y}_{i}$ is of dimension $p$ (e.g., genes) and the covariate $\mathbf{x}_{i}$ is of dimension $q$ (e.g., genetic variants). For the $\mathbf{x}_{i}$'s, we consider two simulation settings. In Setting 1, we consider continuous covariates drawn independently from $\text{Uniform}(0,1)$, and in Setting 2, we consider discrete covariates drawn independently from $\text{Bernoulli}(0.5)$. Given $\mathbf{x}_{i}$, we simulate $\mathbf{y}_{i}$ from $N_{p}(\mathbf{0},\boldsymbol{\Sigma}(\mathbf{x}_{i}))$, where

$$\Sigma_{jk}(\mathbf{x})=\begin{cases}0.5+0.5x_{1}, & \text{if } j=k,\\ 0.5x_{1}, & \text{if } |j-k|=1,\\ 0, & \text{if } |j-k|>1,\end{cases} \tag{13}$$

and $\Sigma_{jk}(\mathbf{x})$ is the $(j,k)$th element of $\boldsymbol{\Sigma}(\mathbf{x})$. The above MA(1) covariance structure has been commonly considered by others (Rothman et al., 2009; Qiu and Liyanage, 2019). We consider $n=200,500$, $p=50$ and $q=30,100$. For each simulation configuration, we generate 100 independent data sets.
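The data-generating process can be sketched in a few lines of NumPy. Only $x_1$ enters (13). The tridiagonal matrix has smallest eigenvalue $0.5+0.5x_1-x_1\cos\{\pi/(p+1)\}>0.5-0.5x_1\ge 0$, so it is positive definite for $x_1\in[0,1]$ and a Cholesky factor always exists. The helper names `sigma_of_x` and `simulate` are illustrative and are not taken from the authors' code.

```python
import numpy as np


def sigma_of_x(x1, p):
    """MA(1) covariance in (13): 0.5 + 0.5*x1 on the diagonal, 0.5*x1 on the first off-diagonals."""
    off = np.full(p - 1, 0.5 * x1)
    return np.diag(np.full(p, 0.5 + 0.5 * x1)) + np.diag(off, 1) + np.diag(off, -1)


def simulate(n, p, q, setting, rng):
    """Draw (X, Y) with Uniform(0,1) (Setting 1) or Bernoulli(0.5) (Setting 2) covariates."""
    if setting == 1:
        X = rng.uniform(0.0, 1.0, size=(n, q))
    else:
        X = rng.binomial(1, 0.5, size=(n, q)).astype(float)
    Y = np.empty((n, p))
    for i in range(n):
        # The Cholesky factor maps N(0, I) draws to N_p(0, Sigma(x_i)).
        L = np.linalg.cholesky(sigma_of_x(X[i, 0], p))
        Y[i] = L @ rng.standard_normal(p)
    return X, Y
```

For example, `simulate(500, 50, 30, 1, np.random.default_rng(0))` produces one data set for the $n=500$, $q=30$ configuration of Setting 1.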

Let $\boldsymbol{\Sigma}_{i}^{\ast}$ denote the true covariance matrix for the $i$th observation, and let $\widehat{\boldsymbol{\Sigma}}_{i}$ denote the estimate of $\boldsymbol{\Sigma}_{i}^{\ast}$ from a given method. For illustration, Figure 1 plots the estimated $\widehat{\Sigma}_{i,12}$ against $\Sigma_{i,12}^{\ast}$ for $i\in[n]$ from 5 data replicates. We do not include the scatter plot for SparseSample, as it is very similar to that of DenseSample. Because DenseSample does not account for the effect of covariates, it estimates $\Sigma_{i,12}^{\ast}$ as constant across all subjects, which appears as five horizontal lines from the 5 data replicates. The CovReg method of Hoff and Niu (2012) cannot estimate the covariance well, as the true covariance is not a quadratic function of the covariates. DenseCovReg gives reasonable agreement between the estimated and true covariances (slopes are all roughly 1), though the variability of its estimates is very high. In contrast, the proposed SparseCovReg estimates the covariance well (slopes are all roughly 1) with much lower variability.

Figure 1: Comparison of the true non-zero covariance $\Sigma_{i,12}^{\ast}$ (x-axis) and the estimated covariance $\widehat{\Sigma}_{i,12}$ (y-axis) for five simulated datasets under Setting 1 (continuous covariates), with $q=30$ covariates and sample size $n=500$.
| $n$ | $q$ | Method | Setting 1 | Setting 2 |
|---|---|---|---|---|
| 200 | 30 | DenseSample | 3.22 (0.08) | 4.15 (0.07) |
| 200 | 30 | SparseSample | 2.01 (0.06) | 3.23 (0.06) |
| 200 | 30 | CovReg | 7.75 (0.31) | 8.60 (0.43) |
| 200 | 30 | DenseCovReg | 15.09 (0.34) | 15.64 (0.46) |
| 200 | 30 | SparseCovReg | 1.84 (0.06) | 2.17 (0.08) |
| 200 | 100 | DenseSample | 3.21 (0.08) | 4.13 (0.08) |
| 200 | 100 | SparseSample | 2.01 (0.06) | 3.22 (0.06) |
| 200 | 100 | CovReg | 14.64 (1.74) | 15.97 (2.20) |
| 200 | 100 | DenseCovReg | 26.92 (0.60) | 27.64 (0.79) |
| 200 | 100 | SparseCovReg | 1.84 (0.08) | 2.18 (0.10) |
| 500 | 30 | DenseSample | 2.41 (0.04) | 3.53 (0.03) |
| 500 | 30 | SparseSample | 1.75 (0.04) | 3.12 (0.03) |
| 500 | 30 | CovReg | 4.35 (0.14) | 5.18 (0.12) |
| 500 | 30 | DenseCovReg | 9.55 (0.14) | 9.93 (0.18) |
| 500 | 30 | SparseCovReg | 1.28 (0.05) | 1.42 (0.07) |
| 500 | 100 | DenseSample | 2.39 (0.04) | 3.52 (0.03) |
| 500 | 100 | SparseSample | 1.75 (0.04) | 3.11 (0.03) |
| 500 | 100 | CovReg | 11.06 (0.34) | 12.21 (0.43) |
| 500 | 100 | DenseCovReg | 17.26 (0.28) | 17.86 (0.36) |
| 500 | 100 | SparseCovReg | 1.29 (0.06) | 1.45 (0.07) |

Table 1: Average estimation errors measured via $\|\widehat{\boldsymbol{\Sigma}}_{i}-\boldsymbol{\Sigma}_{i}^{\ast}\|_{F}$ over 100 simulations, with standard errors shown in parentheses.

Next, we compare the estimation errors, calculated as $n^{-1}\sum_{i=1}^{n}\|\widehat{\boldsymbol{\Sigma}}_{i}-\boldsymbol{\Sigma}_{i}^{\ast}\|_{F}$. Table 1 reports the average estimation errors with standard errors in parentheses. The proposed SparseCovReg outperforms the alternative methods for all $n$ and $q$. The estimation errors of SparseCovReg increase only slightly with $q$ and decrease with $n$, which is consistent with Theorem 2. In Table 2, we also report the selection accuracy of SparseCovReg, measuring the true positive rate and the false positive rate as

$$\text{True positive rate}=\frac{\#\{(h,j,k):\widehat{B}_{h,jk}\neq 0,\ B_{h,jk}^{\ast}\neq 0\}}{\#\{(h,j,k):B_{h,jk}^{\ast}\neq 0\}},$$
$$\text{False positive rate}=\frac{\#\{(h,j,k):\widehat{B}_{h,jk}\neq 0,\ B_{h,jk}^{\ast}=0\}}{\#\{(h,j,k):B_{h,jk}^{\ast}=0\}}.$$

Note that selection accuracy cannot be fairly evaluated for the other methods: DenseSample, CovReg and DenseCovReg are all dense estimators, and SparseSample does not estimate $B_{h,jk}$ for $h\in[q]$.
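Both evaluation metrics reduce to a few array operations. This sketch assumes that the per-subject covariance estimates are stacked into arrays of shape $(n,p,p)$. It also assumes that the coefficients are passed as arrays holding exactly the $(h,j,k)$ entries to be scored. This section does not state whether $h=0$ or the diagonal entries are included, so the caller decides.

```python
import numpy as np


def avg_frobenius_error(Sigma_hat, Sigma_true):
    """n^{-1} sum_i ||Sigma_hat_i - Sigma*_i||_F for stacked arrays of shape (n, p, p)."""
    diff = Sigma_hat - Sigma_true
    return np.mean(np.sqrt(np.sum(diff ** 2, axis=(1, 2))))


def selection_rates(B_hat, B_true):
    """True and false positive rates over all (h, j, k) entries of equally shaped arrays."""
    selected = B_hat != 0
    active = B_true != 0
    tpr = np.sum(selected & active) / np.sum(active)
    fpr = np.sum(selected & ~active) / np.sum(~active)
    return tpr, fpr
```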

| $n$ | $q$ | Selection rate | Setting 1 | Setting 2 |
|---|---|---|---|---|
| 200 | 30 | TPR | 0.8880 | 0.9960 |
| 200 | 30 | FPR | 0.0145 | 0.0147 |
| 200 | 100 | TPR | 0.8749 | 0.9960 |
| 200 | 100 | FPR | 0.0042 | 0.0042 |
| 500 | 30 | TPR | 0.9919 | 0.9999 |
| 500 | 30 | FPR | 0.0155 | 0.0155 |
| 500 | 100 | TPR | 0.9926 | 1.0000 |
| 500 | 100 | FPR | 0.0044 | 0.0042 |

Table 2: True positive rate (TPR) and false positive rate (FPR) of SparseCovReg.
[Figure 2 image; x-axis: Index; y-axis: Coefficient]
Figure 2: 95% confidence intervals for elements of $\mathbf{D}^{\ast}$ from one data replicate under Setting 2 with $q=100$, $n=500$. True parameter values are shown as $\circ$ and debiased parameter estimates are shown as $\bullet$.
| $n$ | $q$ | Variance | Setting 1: $\mathbf{D}^{\ast}$ | Setting 1: $\mathcal{J}_{1}(\mathbf{D}^{\ast})$ | Setting 1: $\mathcal{J}_{1}^{c}(\mathbf{D}^{\ast})$ | Setting 2: $\mathbf{D}^{\ast}$ | Setting 2: $\mathcal{J}_{1}(\mathbf{D}^{\ast})$ | Setting 2: $\mathcal{J}_{1}^{c}(\mathbf{D}^{\ast})$ |
|---|---|---|---|---|---|---|---|---|
| 200 | 30 | $\hat{\sigma}_{ijk}$ | 0.938 | 0.928 | 0.938 | 0.927 | 0.910 | 0.927 |
| 200 | 30 | $\sigma_{ijk}^{\ast}$ | 0.950 | 0.948 | 0.950 | 0.950 | 0.949 | 0.950 |
| 200 | 100 | $\hat{\sigma}_{ijk}$ | 0.875 | 0.853 | 0.875 | 0.863 | 0.828 | 0.863 |
| 200 | 100 | $\sigma_{ijk}^{\ast}$ | 0.951 | 0.950 | 0.951 | 0.951 | 0.935 | 0.951 |
| 500 | 30 | $\hat{\sigma}_{ijk}$ | 0.943 | 0.937 | 0.943 | 0.942 | 0.939 | 0.942 |
| 500 | 30 | $\sigma_{ijk}^{\ast}$ | 0.951 | 0.948 | 0.951 | 0.951 | 0.953 | 0.951 |
| 500 | 100 | $\hat{\sigma}_{ijk}$ | 0.925 | 0.912 | 0.925 | 0.921 | 0.909 | 0.921 |
| 500 | 100 | $\sigma_{ijk}^{\ast}$ | 0.951 | 0.947 | 0.951 | 0.951 | 0.951 | 0.951 |

Table 3: Average coverage probabilities of the 95% confidence intervals, with the variance of $W_{lm}$ estimated by (12) (rows $\hat{\sigma}_{ijk}$) and with the true variance of $W_{lm}$ (rows $\sigma_{ijk}^{\ast}$). Columns $\mathbf{D}^{\ast}$, $\mathcal{J}_{1}(\mathbf{D}^{\ast})$ and $\mathcal{J}_{1}^{c}(\mathbf{D}^{\ast})$ show average coverage probabilities of all off-diagonal parameters, non-zero off-diagonal parameters and zero off-diagonal parameters in $\mathbf{D}^{\ast}$, respectively.

Next, we evaluate the efficacy of the statistical inference procedure from Section 4.2. In Figure 2, we plot 95% confidence intervals for entries of $\mathbf{D}^{\ast}$ from one data replicate under Setting 2 with $q=100$, $n=500$. Specifically, the first 49 confidence intervals in Figure 2 are for $B_{1,jk}^{\ast}$ with $|j-k|=1$; from (13), $B_{1,jk}^{\ast}=0.5$ for $|j-k|=1$. The last 48 confidence intervals are for $B_{1,jk}^{\ast}$ with $|j-k|=2$; from (13), $B_{1,jk}^{\ast}=0$ for $|j-k|=2$. The 95% confidence intervals show good coverage. Finally, in Table 3, we evaluate the coverage probabilities for the off-diagonal parameters in $\mathbf{D}^{\ast}$, $\mathcal{J}_{1}(\mathbf{D}^{\ast})$ and $\mathcal{J}_{1}^{c}(\mathbf{D}^{\ast})$.
Under each $(n,q)$ setting, we report the average coverage probabilities of confidence intervals calculated using the true variances of the $W_{lm}$'s and using the empirical variances estimated by (12). The intervals based on the true variances attain coverage close to the nominal 95% in all settings. The intervals based on empirical variances show some undercoverage, most notably for $n=200$, $q=100$ (0.875 and 0.863 in Settings 1 and 2). Their coverage approaches 95% as the sample size increases.
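The coverage summaries in Table 3 amount to averaging an indicator over replicates and parameters. This sketch assumes the interval endpoints are stored as $(R, K)$ arrays for $R$ replicates and $K$ off-diagonal parameters, alongside the $K$ true values. The split into non-zero and zero true values mirrors the $\mathcal{J}_{1}(\mathbf{D}^{\ast})$ and $\mathcal{J}_{1}^{c}(\mathbf{D}^{\ast})$ columns; the function name is hypothetical.

```python
import numpy as np


def coverage_by_group(lower, upper, D_true):
    """Average coverage over all, non-zero, and zero true parameters.

    lower, upper: (R, K) interval endpoints over R replicates and K parameters.
    D_true: (K,) true parameter values.
    """
    covered = (lower <= D_true) & (D_true <= upper)
    nonzero = D_true != 0
    return covered.mean(), covered[:, nonzero].mean(), covered[:, ~nonzero].mean()
```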

6 Real Data Analysis

We apply our proposed method SparseCovReg to the REMBRANDT study (GSE108476) that collects data on 178 patients with glioblastoma multiforme (GBM), the most common malignant form of brain tumor in adults and one of the most lethal of all cancers (Akhavan et al., 2010). These 178 patients had undergone microarray and single-nucleotide polymorphism (SNP) chip profiling, with both gene expression and SNP data available for analysis. The raw data were pre-processed and normalized using standard pipelines; see Gusev et al. (2018) for more details. The main objectives of our analysis are to identify co-expression QTLs and recover both the population-level and individual-level covariance matrices of gene expressions.

For response variables, we consider the expression levels of 73 genes that belong to the human glioma pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto, 2000). As covariates, we consider local SNPs (i.e., SNPs that fall within 2kb upstream and 0.5kb downstream of a gene) residing near those 73 genes, resulting in a total of 118 SNPs. SNPs are coded with “0” indicating homozygous in the major allele and “1” otherwise. Our analysis also includes age (continuous) and sex as covariates, bringing the total to 120 covariates and 326,821 parameters in model (4). Tuning parameters were selected by 5-fold cross-validation.
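The parameter count can be checked by direct arithmetic. Assume each of $\mathbf{B}_0,\ldots,\mathbf{B}_q$ is a symmetric $p\times p$ matrix contributing its $p(p+1)/2$ distinct entries. With $p=73$ genes and $q=120$ covariates:

```latex
\underbrace{\frac{p(p+1)}{2}}_{\text{distinct entries per } \mathbf{B}_h}
\times \underbrace{(q+1)}_{\mathbf{B}_0,\ldots,\mathbf{B}_q}
= \frac{73 \times 74}{2} \times 121
= 2701 \times 121
= 326{,}821.
```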

[Figure 3 panels: left, “soft-thresholding estimator”; right, “Population network from SparseCovReg”. Gene groups marked along the axis: p53, calcium, PI3K/MTOR, Ras-Raf-MEK-ERK.]

[Color legend: -0.4 (blue), 0.0, 0.4 (red)]

Figure 3: Heatmaps of the population-level covariance estimates from SparseSample (left) and SparseCovReg (right). Positive values are shown in red and negative values are shown in blue.

We first investigate the population-level co-expression matrix. In Figure 3, we compare the soft-thresholding covariance estimator (Rothman et al., 2009) with the population-level covariance $\mathbf{B}_{0}$ obtained from SparseCovReg. The soft-thresholding estimator shares some common patterns with SparseCovReg but is considerably noisier. Using the population covariance matrix from SparseCovReg, we can identify high correlations between PIK3CA and genes in the calcium signaling pathway, including CALML5, CALM1, CAMK1D and CAMK2B. This is reasonable, as mutations in PIK3CA have been reported in multiple tumor types, and PIK3CA is part of the PI3K/AKT/MTOR signaling pathway, one of the core pathways in human GBM (Network et al., 2008). The calcium signaling pathway also plays diverse roles in the progression of brain cancers (Maklad et al., 2019). Figure 3 also shows negative correlations between PIK3CA and genes in the p53 signaling pathway, another core pathway in human GBM (Network et al., 2008) and a potential target for inhibition in GBM treatments (Schröder and McDonald, 2015; Yin et al., 2021).

[Figure 4 panels: left, “Covariate effect (rs10509346)”; right, “Covariate effect (rs6701524)”. Gene groups marked along the axis: p53, calcium, PI3K/MTOR, Ras-Raf-MEK-ERK.]

[Color legend: -0.4 (blue), 0.0, 0.4 (red)]

Figure 4: Heatmaps of identified nonzero covariate effects. Positive values are shown in red and negative values are shown in blue.

Next, we examine the covariate effects on the covariance matrix. Non-zero effects have been identified for six SNPs: rs6701524, rs10509346, rs10519201, rs1347069, rs503314 and rs306098. The non-zero effects of rs10509346 and rs6701524 are shown in Figure 4, and their network effects identified by the debiased inference procedure are shown in Figure 5. Interestingly, these covariate effects are not easily observable from the soft-thresholding estimator in Figure 3. This suggests that fitting model (1) can reveal covariate-modulated co-expression patterns that would otherwise be overlooked.

The left plot of Figure 5 shows that rs10509346, residing in CAMK2G, notably affects co-expressions among genes in the Ras-Raf-MEK-ERK signaling pathway, including EGF, SHC4, RAF1 and MAPK3. Their co-expressions with CAMK2D and CALML5 in the calcium signaling pathway are also affected by rs10509346. This agrees with findings that the Ras-Raf-MEK-ERK pathway is modulated by Ca$^{2+}$ and calmodulin (Agell et al., 2002; Zhang and Li, 2023). Furthermore, this SNP affects the co-expressions of MTOR, part of the PI3K/AKT/MTOR pathway, with genes in the Ras-Raf-MEK-ERK pathway. This result is interesting because MTOR is a key mediator of PI3K/AKT/MTOR signaling. MTOR is also known to cooperate with alterations in other signaling pathways that are commonly activated in GBM patients, such as the Ras-Raf-MEK-ERK pathway (Akhavan et al., 2010).

The right plot of Figure 5 shows that rs6701524, residing in MTOR, affects co-expressions of genes in the PI3K/MTOR pathway. In particular, this SNP affects co-expressions of PIK3CD (and PIK3CB) with other genes. This is an interesting finding, as PI3K/MTOR is a key pathway in the development and progression of GBM. In addition, PI3K/MTOR inhibition in IDH1-mutant glioma has been associated with increased survival (Batsios et al., 2019). Co-expressions affected by other SNPs are also worth noting. For example, rs306098 is found to affect co-expressions of SHC2 with CDK4/6 and MTOR. This is interesting because combined CDK4/6 and MTOR inhibition has been investigated as a potential therapeutic strategy in GBM (Olmez et al., 2017).

[Figure 5 panels: network graphs for rs10509346 (left) and rs6701524 (right)]



Figure 5: Network effects of rs10509346 (left) and rs6701524 (right) identified using the debiased inference procedure in Section 4.2, with significance level $\alpha=1\%$ and Bonferroni-corrected confidence level $1-\alpha/\{p(p-1)/2\}$. Positive effects are shown in red and negative effects are shown in blue. Edge weights in the network graphs are proportional to the effect size.
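With $p=73$ genes, the Bonferroni correction divides $\alpha$ by $p(p-1)/2=2628$ off-diagonal pairs. This sketch computes the resulting two-sided normal critical value. It also applies a hypothetical decision rule: an edge is flagged when the corrected interval $\widehat{D}^{u}_{lm}\pm z\cdot\text{se}$ excludes zero, where `se` stands for the standard error $n^{-1}\sqrt{\sum_i(\mathbf{M}\mathbf{X}^{\top})_{li}^{2}\sigma_{ijk}^{2}}$. The caption does not spell out this rule, so treat it as an assumption.

```python
from statistics import NormalDist


def bonferroni_critical_value(alpha, p):
    """Two-sided normal critical value at the corrected level 1 - alpha / {p(p-1)/2}."""
    m = p * (p - 1) // 2                       # number of off-diagonal gene pairs
    return NormalDist().inv_cdf(1 - alpha / (2 * m))


def interval_excludes_zero(d_u, se, alpha, p):
    """Hypothetical rule: flag an edge if d_u +/- z * se does not contain 0."""
    return abs(d_u) > bonferroni_critical_value(alpha, p) * se
```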

7 Discussion

In our current approach, we did not consider a hierarchy between population-level effects and covariate-specific effects. However, in some applications, it might be reasonable to assume that a covariate-specific effect is non-zero only when the corresponding population-level effect is non-zero. Our multivariate regression framework can be extended to accommodate such a hierarchical structure by modifying the penalty term (Tibshirani and Friedman, 2020; Kim et al., 2021).

Next, we comment on model interpretability after scaling the covariates to lie in $[0,1]$. Given bounded covariates $x_{h}\in\mathbb{R}$ for all $h\in[q]$, let $LB(x_{h})$ and $UB(x_{h})$ denote the lower and upper bounds of $x_{h}$, respectively, and consider the covariance regression model

$$\boldsymbol{\Sigma}(\mathbf{x})=\mathbf{B}_{0}+\sum_{h=1}^{q}\frac{x_{h}-LB(x_{h})}{UB(x_{h})-LB(x_{h})}\mathbf{B}_{h}.$$

The above equation can be rewritten as

$$\boldsymbol{\Sigma}(\mathbf{x})=\bar{\mathbf{B}}_{0}+\sum_{h=1}^{q}x_{h}\bar{\mathbf{B}}_{h},$$

where $\bar{\mathbf{B}}_{0}=\mathbf{B}_{0}-\sum_{h=1}^{q}\frac{LB(x_{h})}{UB(x_{h})-LB(x_{h})}\mathbf{B}_{h}$ and $\bar{\mathbf{B}}_{h}=\mathbf{B}_{h}/\{UB(x_{h})-LB(x_{h})\}$. Note that $\bar{\mathbf{B}}_{h}$ and $\mathbf{B}_{h}$ differ only by a positive scalar, so they share the same sparsity pattern. Hence, parameter estimates can be interpreted in terms of the covariates before the transformation.
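The reparameterization can be checked numerically. This sketch maps $(\mathbf{B}_0,\mathbf{B}_h)$ to $(\bar{\mathbf{B}}_0,\bar{\mathbf{B}}_h)$ using the formulas above. It then confirms that both forms give the same $\boldsymbol{\Sigma}(\mathbf{x})$ and that the sparsity pattern is preserved. The coefficient matrices are stacked as a $(q,p,p)$ array, and all function names are illustrative.

```python
import numpy as np


def sigma_scaled(x, B0, B, lb, ub):
    """Sigma(x) with each covariate rescaled to [0, 1] by its bounds (lb, ub)."""
    w = (x - lb) / (ub - lb)
    return B0 + np.tensordot(w, B, axes=1)     # B0 + sum_h w_h B_h


def unscaled_coefficients(B0, B, lb, ub):
    """Map (B_0, B_h) to (bar B_0, bar B_h) for covariates on the original scale."""
    scale = ub - lb
    B_bar = B / scale[:, None, None]
    B0_bar = B0 - np.tensordot(lb / scale, B, axes=1)
    return B0_bar, B_bar


def sigma_unscaled(x, B0_bar, B_bar):
    """Sigma(x) = bar B_0 + sum_h x_h bar B_h."""
    return B0_bar + np.tensordot(x, B_bar, axes=1)
```

Because each $\bar{\mathbf{B}}_h$ is $\mathbf{B}_h$ divided by a positive number, a zero entry stays zero after the mapping.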

References

  • Agell, N., Bachs, O., Rocamora, N., and Villalonga, P. (2002), “Modulation of the Ras/Raf/MEK/ERK pathway by Ca2+, and calmodulin,” Cellular Signalling, 14, 649–654.
  • Akhavan, D., Cloughesy, T. F., and Mischel, P. S. (2010), “mTOR signaling in glioblastoma: lessons learned from bench to bedside,” Neuro-Oncology, 12, 882–889.
  • Alakus, C., Larocque, D., and Labbe, A. (2022), “Covariance regression with random forests,” arXiv preprint arXiv:2209.08173.
  • Anderson, T. W. (1973), “Asymptotically efficient estimation of covariance matrices with linear structure,” The Annals of Statistics, 1, 135–141.
  • Argyriou, A., Evgeniou, T., and Pontil, M. (2008), “Convex multi-task feature learning,” Machine Learning, 73, 243–272.
  • Batsios, G., Viswanath, P., Subramani, E., Najac, C., Gillespie, A. M., Santos, R. D., Molloy, A. R., Pieper, R. O., and Ronen, S. M. (2019), “PI3K/mTOR inhibition of IDH1 mutant glioma leads to reduced 2HG production that is associated with increased survival,” Scientific Reports, 9, 10521.
  • Bickel, P. J. and Levina, E. (2008a), “Covariance regularization by thresholding,” The Annals of Statistics, 36, 2577–2604.
  • Bickel, P. J. and Levina, E. (2008b), “Regularized estimation of large covariance matrices,” The Annals of Statistics, 36, 199–227.
  • Bien, J. and Tibshirani, R. J. (2011), “Sparse estimation of a covariance matrix,” Biometrika, 98, 807–820.
  • Bühlmann, P. and Van de Geer, S. (2015), “High-dimensional inference in misspecified linear models,” Electronic Journal of Statistics, 9, 1449–1473.
  • Butte, A. J., Tamayo, P., Slonim, D., Golub, T. R., and Kohane, I. S. (2000), “Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks,” Proceedings of the National Academy of Sciences, 97, 12182–12186.
  • Cai, T. T., Zhang, A. R., and Zhou, Y. (2022), “Sparse group lasso: Optimal sample complexity, convergence rate, and statistical inference,” IEEE Transactions on Information Theory, 68, 5975–6002.
  • Chiu et al. (1996) Chiu, T. Y., Leonard, T., and Tsui, K.-W. (1996), “The matrix-logarithmic covariance model,” Journal of the American Statistical Association, 91, 198–210.
  • El Karoui et al. (2010) El Karoui, N. et al. (2010), “High-dimensionality effects in the Markowitz problem and other quadratic programs with linear constraints: Risk underestimation,” The Annals of Statistics, 38, 3487–3566.
  • Fox and Dunson (2015) Fox, E. B. and Dunson, D. B. (2015), “Bayesian nonparametric covariance regression,” The Journal of Machine Learning Research, 16, 2501–2542.
  • Franks (2021) Franks, A. M. (2021), “Reducing subspace models for large-scale covariance regression,” Biometrics.
  • Gardner et al. (2003) Gardner, T. S., Di Bernardo, D., Lorenz, D., and Collins, J. J. (2003), “Inferring genetic networks and identifying compound mode of action via expression profiling,” Science, 301, 102–105.
  • Gusev et al. (2018) Gusev, Y., Bhuvaneshwar, K., Song, L., Zenklusen, J.-C., Fine, H., and Madhavan, S. (2018), “The REMBRANDT study, a large collection of genomic data from brain cancer patients,” Scientific data, 5, 1–9.
  • Hastie et al. (2015) Hastie, T., Tibshirani, R., and Wainwright, M. (2015), “Statistical learning with sparsity,” Monographs on statistics and applied probability, 143, 143.
  • Hoff and Niu (2012) Hoff, P. D. and Niu, X. (2012), “A covariance regression model,” Statistica Sinica, 729–753.
  • Huang et al. (2006) Huang, J. Z., Liu, N., Pourahmadi, M., and Liu, L. (2006), “Covariance matrix selection and estimation via penalised normal likelihood,” Biometrika, 93, 85–98.
  • Javanmard and Montanari (2014) Javanmard, A. and Montanari, A. (2014), “Confidence intervals and hypothesis testing for high-dimensional regression,” The Journal of Machine Learning Research, 15, 2869–2909.
  • Kanehisa and Goto (2000) Kanehisa, M. and Goto, S. (2000), “KEGG: kyoto encyclopedia of genes and genomes,” Nucleic acids research, 28, 27–30.
  • Kim et al. (2021) Kim, R., Mueller, S., and Garcia, T. P. (2021), “svReg: Structural varying-coefficient regression to differentiate how regional brain atrophy affects motor impairment for Huntington disease severity groups,” Biometrical Journal, 63, 1254–1271.
  • Lam and Fan (2009) Lam, C. and Fan, J. (2009), “Sparsistency and rates of convergence in large covariance matrix estimation,” Annals of statistics, 37, 4254.
  • Langfelder and Horvath (2008) Langfelder, P. and Horvath, S. (2008), “WGCNA: an R package for weighted correlation network analysis,” BMC bioinformatics, 9, 1–13.
  • Ledoit and Wolf (2004) Ledoit, O. and Wolf, M. (2004), “A well-conditioned estimator for large-dimensional covariance matrices,” Journal of multivariate analysis, 88, 365–411.
  • Li et al. (2015) Li, Y., Nan, B., and Zhu, J. (2015), ‘‘Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure,” Biometrics, 71, 354–363.
  • Li et al. (2010) Li, Y., Wang, N., and Carroll, R. J. (2010), “Generalized functional linear models with semiparametric single-index interactions,” Journal of the American Statistical Association, 105, 621–633.
  • Maklad et al. (2019) Maklad, A., Sharma, A., and Azimi, I. (2019), “Calcium signaling in brain cancers: roles and therapeutic targeting,” Cancers, 11, 145.
  • Network et al. (2008) Network et al. (2008), “Comprehensive genomic characterization defines human glioblastoma genes and core pathways,” Nature, 455, 1061–1068.
  • Olmez et al. (2017) Olmez, I., Brenneman, B., Xiao, A., Serbulea, V., Benamar, M., Zhang, Y., Manigat, L., Abbas, T., Lee, J., Nakano, I., et al. (2017), “Combined CDK4/6 and mTOR inhibition is synergistic against glioblastoma via multiple mechanisms,” Clinical Cancer Research, 23, 6958–6968.
  • Park (2023) Park, H. G. (2023), “Bayesian estimation of covariate assisted principal regression for brain functional connectivity,” arXiv preprint arXiv:2306.07181.
  • Pourahmadi (1999) Pourahmadi, M. (1999), “Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation,” Biometrika, 86, 677–690.
  • Qiu and Liyanage (2019) Qiu, Y. and Liyanage, J. S. (2019), “Threshold selection for covariance estimation,” Biometrics, 75, 895–905.
  • Rothman et al. (2009) Rothman, A. J., Levina, E., and Zhu, J. (2009), “Generalized thresholding of large covariance matrices,” Journal of the American Statistical Association, 104, 177–186.
  • Schröder and McDonald (2015) Schröder, L. B. and McDonald, K. L. (2015), “CDK4/6 inhibitor PD0332991 in glioblastoma treatment: does it have a future?” Frontiers in oncology, 5, 259.
  • Simon et al. (2013) Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013), “A sparse-group lasso,” Journal of Computational and Graphical Statistics, 22, 231–245.
  • Su et al. (2023) Su, C., Xu, Z., Shan, X., Cai, B., Zhao, H., and Zhang, J. (2023), “Cell-type-specific co-expression inference from single cell RNA-sequencing data,” Nature Communications, 14, 4846.
  • Tibshirani and Friedman (2020) Tibshirani, R. and Friedman, J. (2020), “A pliable lasso,” Journal of Computational and Graphical Statistics, 29, 215–225.
  • Van Der Wijst et al. (2018) Van Der Wijst, M. G., de Vries, D. H., Brugge, H., Westra, H.-J., and Franke, L. (2018), “An integrative approach for building personalized gene regulatory networks for precision medicine,” Genome medicine, 10, 1–15.
  • Vierstra et al. (2020) Vierstra, J., Lazar, J., Sandstrom, R., Halow, J., Lee, K., Bates, D., Diegel, M., Dunn, D., Neri, F., Haugen, E., et al. (2020), ‘‘Global reference map** of human transcription factor footprints,” Nature, 583, 729–736.
  • Wu and Pourahmadi (2003) Wu, W. B. and Pourahmadi, M. (2003), “Nonparametric estimation of large covariance matrices of longitudinal data,” Biometrika, 90, 831–844.
  • Xue et al. (2012) Xue, L., Ma, S., and Zou, H. (2012), “Positive-definite \ellroman_ℓ1-penalized estimation of large covariance matrices,” Journal of the American Statistical Association, 107, 1480–1491.
  • Yin et al. (2021) Yin, L., Yao, Z., Wang, Y., Huang, J., Mazuranic, M., and Yin, A. (2021), “In Preclinical evaluation of novel CDK4/6 inhibitor GLR2007 in glioblastoma models,” J. Clin. Oncol, 39, e14023.
  • Yuan and Lin (2006) Yuan, M. and Lin, Y. (2006), “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 49–67.
  • Zhang and Zhang (2014) Zhang, C.-H. and Zhang, S. S. (2014), ‘‘Confidence intervals for low dimensional parameters in high dimensional linear models,” Journal of the Royal Statistical Society: Series B: Statistical Methodology, 217–242.
  • Zhang and Li (2023) Zhang, J. and Li, Y. (2023), “High-dimensional Gaussian graphical regression models with covariates,” Journal of the American Statistical Association, 118, 2088–2100.
  • Zhang et al. (2020) Zhang, J., Sun, W. W., and Li, L. (2020), “Mixed-effect time-varying network model and application in brain connectivity analysis,” Journal of the American Statistical Association, 115, 2022–2036.
  • Zhang et al. (2023) Zhang, J., Sun, W. W., and Li, L. (2023), “Generalized connectivity matrix response regression with applications in brain connectivity studies,” Journal of Computational and Graphical Statistics, 32, 252–262.
  • Zhang and Zhao (2023) Zhang, J. and Zhao, H. (2023), “eQTL studies: from bulk tissues to single cells,” Journal of Genetics and Genomics, 50, 925–933.
  • Zhao et al. (2021) Zhao, Y., Wang, B., Mostofsky, S. H., Caffo, B. S., and Luo, X. (2021), “Covariate assisted principal regression for covariance matrix outcomes,” Biostatistics, 22, 629–645.
  • Zou et al. (2017) Zou, T., Lan, W., Wang, H., and Tsai, C.-L. (2017), “Covariance regression analysis,” Journal of the American Statistical Association, 112, 266–281.