
doi 10.1214/18-AOS1716

Estimation of Large Covariance and Precision Matrices from Temporally Dependent Observations

Abstract: We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range so with longer memory than those considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_1$ minimization and the $\ell_1$ penalized likelihood estimation of precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study the brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.

Submitted 18 July, 2017; v1 submitted 16 December, 2014; originally announced December 2014.

Comments: The result for banding estimator of covariance matrix is given in the version 2 of this article. See arXiv:1412.5059v2

Journal ref: The Annals of Statistics, 2019, 47(3): 1321-1350

arXiv:1307.8369 [pdf, ps, other]

Estimating mean survival time: when is it possible?

Authors: Ying Ding, Bin Nan

Abstract: For right censored survival data, it is well known that the mean survival time can be consistently estimated when the support of the censoring time contains the support of the survival time. In practice, however, this condition can be easily violated because the follow-up of a study is usually within a finite window. In this article we show that the mean survival time is still estimable from a linear model when the support of some covariate(s) with nonzero coefficient(s) is unbounded regardless of the length of follow-up. This implies that the mean survival time can be well estimated when the covariate range is wide in practice. The theoretical finding is further verified for finite samples by simulation studies. Simulations also show that, when both models are correctly specified, the linear model yields reasonable mean square prediction errors and outperforms the Cox model, particularly with heavy censoring and short follow-up time.

Submitted 31 July, 2013; originally announced July 2013.

Comments: 31 pages, 3 Postscript figures

arXiv:1204.2579 [pdf, ps, other]

A general semiparametric Z-estimation approach for case-cohort studies

Authors: Bin Nan, Jon A. Wellner

Abstract: Case-cohort design, an outcome-dependent sampling design for censored survival data, is increasingly used in biomedical research. The development of asymptotic theory for a case-cohort design in the current literature primarily relies on counting process stochastic integrals. Such an approach, however, is rather limited and lacks theoretical justification for outcome-dependent weighted methods due to non-predictability. Instead of stochastic integrals, we derive asymptotic properties for case-cohort studies based on a general Z-estimation theory for semiparametric models with bundled parameters using modern empirical processes. Both the Cox model and the additive hazards model with time-dependent covariates are considered.

Submitted 11 April, 2012; originally announced April 2012.

Comments: 25 pages

arXiv:1204.1992 [pdf, ps, other]

Non-asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso

Authors: Shengchun Kong, Bin Nan

Abstract: We consider the finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulty caused by the lack of iid and Lipschitz property.

Submitted 9 April, 2012; originally announced April 2012.

Comments: 18 pages

arXiv:1203.2470 [pdf, ps, other]

doi 10.1214/11-AOS934

A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data

Authors: Ying Ding, Bin Nan

Abstract: In many semiparametric models that are parameterized by two types of parameters---a Euclidean parameter of interest and an infinite-dimensional nuisance parameter---the two parameters are bundled together, that is, the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model for censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by developing an efficient estimating method for the regression parameters, we propose a general sieve M-theorem for bundled parameters and apply the theorem to deriving the asymptotic theory for the sieve maximum likelihood estimation in the linear regression model for censored survival data. The numerical implementation of the proposed estimating method can be achieved through the conventional gradient-based search algorithms such as the Newton--Raphson algorithm. We show that the proposed estimator is consistent and asymptotically normal and achieves the semiparametric efficiency bound. Simulation studies demonstrate that the proposed method performs well in practical settings and yields more efficient estimates than existing estimating equation based methods. Illustration with a real data example is also provided.

Submitted 12 March, 2012; originally announced March 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOS934 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS934

Journal ref: Annals of Statistics 2011, Vol. 39, No. 6, 3032-3061

arXiv:0908.3135 [pdf, ps, other]

doi 10.1214/08-AOS657

Asymptotic theory for the semiparametric accelerated failure time model with missing data

Authors: Bin Nan, John D. Kalbfleisch, Menggang Yu

Abstract: We consider a class of doubly weighted rank-based estimating methods for the transformation (or accelerated failure time) model with missing data as arise, for example, in case-cohort studies. The weights considered may not be predictable as required in a martingale stochastic process formulation. We treat the general problem as a semiparametric estimating equation problem and provide proofs of asymptotic properties for the weighted estimators, with either true weights or estimated weights, by using empirical process theory where martingale theory may fail. Simulations show that the outcome-dependent weighted method works well for finite samples in case-cohort studies and improves efficiency compared to methods based on predictable weights. Further, it is seen that the method is even more efficient when estimated weights are used, as is commonly the case in the missing data literature. The Gehan censored data Wilcoxon weights are found to be surprisingly efficient in a wide class of problems.

Submitted 21 August, 2009; originally announced August 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOS657 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS657

MSC Class: 62E20; 62N01 (Primary) 62D05 (Secondary)

Journal ref: Annals of Statistics 2009, Vol. 37, No. 5A, 2351-2376

arXiv:math/0406452 [pdf, ps, other]

doi 10.1214/009053604000000157

Information bounds for Cox regression models with missing data

Authors: Bin Nan, Mary J. Emond, Jon A. Wellner

Abstract: We derive information bounds for the regression parameters in Cox models when data are missing at random. These calculations are of interest for understanding the behavior of efficient estimation in case-cohort designs, a type of two-phase design often used in cohort studies. The derivations make use of key lemmas appearing in Robins, Rotnitzky and Zhao [J. Amer. Statist. Assoc. 89 (1994) 846-866] and Robins, Hsieh and Newey [J. Roy. Statist. Soc. Ser. B 57 (1995) 409-424], but in a form suited for our purposes here. We begin by summarizing the results of Robins, Rotnitzky and Zhao in a form that leads directly to the projection method which will be of use for our model of interest. We then proceed to derive new information bounds for the regression parameters of the Cox model with data Missing At Random (MAR). In the final section we exemplify our calculations with several models of interest in cohort studies, including an i.i.d. version of the classical case-cohort design of Prentice [Biometrika 73 (1986) 1-11].

Submitted 23 June, 2004; originally announced June 2004.

Report number: IMS-AOS-AOS191

MSC Class: 62E17 (Primary) 65D20 (Secondary)

Journal ref: Annals of Statistics 2004, Vol. 32, No. 2, 723-753

Showing 1–7 of 7 results for author: Nan, B