Optimal Functional Bilinear Regression with Two-way Functional Covariates via Reproducing Kernel Hilbert Space

Authors: Dan Yang, Jianlong Shao, Haipeng Shen, Dong Wang, Hongtu Zhu

Abstract: Traditional functional linear regression usually takes a one-dimensional functional predictor as input and estimates the continuous coefficient function. Modern applications often generate two-dimensional covariates, which become matrices when observed at grid points. To avoid the inefficiency of the classical method involving estimation of a two-dimensional coefficient function, we propose a functional bilinear regression model, and introduce an innovative three-term penalty to impose roughness penalty in the estimation. The proposed estimator exhibits minimax optimal property for prediction under the framework of reproducing kernel Hilbert space. An iterative generalized cross-validation approach is developed to choose tuning parameters, which significantly improves the computational efficiency over the traditional cross-validation approach. The statistical and computational advantages of the proposed method over existing methods are further demonstrated via simulated experiments, the Canadian weather data, and a biochemical long-range infrared light detection and ranging data.

Submitted 21 November, 2023; originally announced November 2023.

Comments: 48 pages, 19 figures

arXiv:2306.10213 [pdf, other]

A General Form of Covariate Adjustment in Randomized Clinical Trials

Authors: Marlena S. Bannick, Jun Shao, Jingyi Liu, Yu Du, Yanyao Yi, Ting Ye

Abstract: In randomized clinical trials, adjusting for baseline covariates can improve credibility and efficiency for demonstrating and quantifying treatment effects. This article studies the augmented inverse propensity weighted (AIPW) estimator, which is a general form of covariate adjustment that uses linear, generalized linear, and non-parametric or machine learning models for the conditional mean of the response given covariates. Under covariate-adaptive randomization, we establish general theorems that show a complete picture of the asymptotic normality, efficiency gain, and applicability of AIPW estimators. In particular, we provide for the first time a rigorous theoretical justification of using machine learning methods with cross-fitting for dependent data under covariate-adaptive randomization. Based on the general theorems, we offer insights on the conditions for guaranteed efficiency gain and universal applicability under different randomization schemes, which also motivate a joint calibration strategy using some constructed covariates after applying AIPW. Our methods are implemented in the R package RobinCar.

Submitted 25 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2302.10404 [pdf, ps, other]

Robust Variance Estimation for Covariate-Adjusted Unconditional Treatment Effect in Randomized Clinical Trials with Binary Outcomes

Authors: Ting Ye, Marlena Bannick, Yanyao Yi, Jun Shao

Abstract: To improve precision of estimation and power of testing hypothesis for an unconditional treatment effect in randomized clinical trials with binary outcomes, researchers and regulatory agencies recommend using g-computation as a reliable method of covariate adjustment. However, the practical application of g-computation is hindered by the lack of an explicit robust variance formula that can be used for different unconditional treatment effects of interest. To fill this gap, we provide explicit and robust variance estimators for g-computation estimators and demonstrate through simulations that the variance estimators can be reliably applied in practice.

Submitted 27 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2201.11948 [pdf, other]

Covariate-Adjusted Log-Rank Test: Guaranteed Efficiency Gain and Universal Applicability

Authors: Ting Ye, Jun Shao, Yanyao Yi

Abstract: Nonparametric covariate adjustment is considered for log-rank type tests of treatment effect with right-censored time-to-event data from clinical trials applying covariate-adaptive randomization. Our proposed covariate-adjusted log-rank test has a simple explicit formula and a guaranteed efficiency gain over the unadjusted test. We also show that our proposed test achieves universal applicability in the sense that the same formula of test can be universally applied to simple randomization and all commonly used covariate-adaptive randomization schemes such as the stratified permuted block and Pocock and Simon's minimization, which is not a property enjoyed by the unadjusted log-rank test. Our method is supported by novel asymptotic theory and empirical results for type I error and power of tests.

Submitted 19 January, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2108.13009 [pdf, ps, other]

Communication-Computation Efficient Device-Edge Co-Inference via AutoML

Authors: Xinjie Zhang, Jiawei Shao, Yuyi Mao, Jun Zhang

Abstract: Device-edge co-inference, which partitions a deep neural network between a resource-constrained mobile device and an edge server, has recently emerged as a promising paradigm to support intelligent mobile applications. To accelerate the inference process, on-device model sparsification and intermediate feature compression are regarded as two prominent techniques. However, as the on-device model sparsity level and intermediate feature compression ratio have direct impacts on computation workload and communication overhead respectively, and both of them affect the inference accuracy, finding the optimal values of these hyper-parameters brings a major challenge due to the large search space. In this paper, we endeavor to develop an efficient algorithm to determine these hyper-parameters. By selecting a suitable model split point and a pair of encoder/decoder for the intermediate feature vector, this problem is cast as a sequential decision problem, for which a novel automated machine learning (AutoML) framework is proposed based on deep reinforcement learning (DRL). Experiment results on an image classification task demonstrate the effectiveness of the proposed framework in achieving a better communication-computation trade-off and significant inference speedup against various baseline schemes.

Submitted 31 August, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

arXiv:2107.05545 [pdf, other]

Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

Authors: Kaixin Wang, Kuangqi Zhou, Qixin Zhang, Jie Shao, Bryan Hooi, Jiashi Feng

Abstract: The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.

Submitted 12 July, 2021; originally announced July 2021.

Comments: ICML 2021

arXiv:2010.10814 [pdf, other]

Improving Generalization in Reinforcement Learning with Mixture Regularization

Authors: Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng

Abstract: Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout and random convolution) are previously explored to increase the data diversity. However, we find these approaches only locally perturb the observations regardless of the training environments, showing limited effectiveness on enhancing the data diversity and the generalization performance. In this work, we introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments and imposes linearity constraints on the observation interpolations and the supervision (e.g. associated reward) interpolations. Mixreg increases the data diversity more effectively and helps learn smoother policies. We verify its effectiveness on improving generalization by conducting extensive experiments on the large-scale Procgen benchmark. Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin. Mixreg is simple, effective and general. It can be applied to both policy-based and value-based RL algorithms. Code is available at https://github.com/kaixin96/mixreg.

Submitted 21 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020

arXiv:2009.11828 [pdf, other]

Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials

Authors: Ting Ye, Jun Shao, Yanyao Yi, Qingyuan Zhao

Abstract: In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three considerations for better practice when model-assisted inference is applied to adjust for covariates under simple or covariate-adaptive randomized trials: (1) guaranteed efficiency gain: a model-assisted method should often gain but never hurt efficiency; (2) wide applicability: a valid procedure should be applicable, and preferably universally applicable, to all commonly used randomization schemes; (3) robust standard error: variance estimation should be robust to model misspecification and heteroscedasticity. To achieve these, we recommend a model-assisted estimator under an analysis of heterogeneous covariance working model including all covariates utilized in randomization. Our conclusions are based on an asymptotic theory that provides a clear picture of how covariate-adaptive randomization and regression adjustment alter statistical efficiency. Our theory is more general than the existing ones in terms of studying arbitrary functions of response means (including linear contrasts, ratios, and odds ratios), multiple arms, guaranteed efficiency gain, optimality, and universal applicability.

Submitted 13 July, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

arXiv:2007.09576 [pdf, other]

Inference on Average Treatment Effect under Minimization and Other Covariate-Adaptive Randomization Methods

Authors: Ting Ye, Yanyao Yi, Jun Shao

Abstract: Covariate-adaptive randomization schemes such as the minimization and stratified permuted blocks are often applied in clinical trials to balance treatment assignments across prognostic factors. The existing theoretical developments on inference after covariate-adaptive randomization are mostly limited to situations where a correct model between the response and covariates can be specified or the randomization method has well-understood properties. Based on stratification with covariate levels utilized in randomization and a further adjusting for covariates not used in randomization, in this article we propose several estimators for model free inference on average treatment effect defined as the difference between response means under two treatments. We establish asymptotic normality of the proposed estimators under all popular covariate-adaptive randomization schemes including the minimization whose theoretical property is unclear, and we show that the asymptotic distributions are invariant with respect to covariate-adaptive randomization methods. Consistent variance estimators are constructed for asymptotic inference. Asymptotic relative efficiencies and finite sample properties of estimators are also studied. We recommend using one of our proposed estimators for valid and model free inference after covariate-adaptive randomization.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2006.02166 [pdf, ps, other]

Communication-Computation Trade-Off in Resource-Constrained Edge Inference

Authors: Jiawei Shao, Jun Zhang

Abstract: The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. Particularly, edge AI has been envisioned as a major application scenario to provide DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off among the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for the effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly reduces the inference latency than baseline methods.

Submitted 14 October, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2002.09843 [pdf, other]

An Accuracy-Lossless Perturbation Method for Defending Privacy Attacks in Federated Learning

Authors: Xue Yang, Yan Feng, Weijun Fang, Jun Shao, Xiaohu Tang, Shu-Tao Xia, Rongxing Lu

Abstract: Although federated learning improves privacy of training data by exchanging local gradients or parameters rather than raw data, the adversary still can leverage local gradients and parameters to obtain local training data by launching reconstruction and membership inference attacks. To defend against such privacy attacks, many noise perturbation methods (like differential privacy or CountSketch matrix) have been widely designed. However, the strong defence ability and high learning accuracy of these schemes cannot be ensured at the same time, which will impede the wide application of FL in practice (especially for medical or financial institutions that require both high accuracy and strong privacy guarantee). To overcome this issue, in this paper, we propose an efficient model perturbation method for federated learning to defend against reconstruction and membership inference attacks launched by curious clients. On the one hand, similar to the differential privacy, our method also selects random numbers as perturbed noises added to the global model parameters, and thus it is very efficient and easy to be integrated in practice. Meanwhile, the randomly selected noises are positive real numbers and the corresponding value can be arbitrarily large, and thus the strong defence ability can be ensured. On the other hand, unlike differential privacy or other perturbation methods that cannot eliminate the added noises, our method allows the server to recover the true gradients by eliminating the added noises. Therefore, our method does not hinder learning accuracy at all.

Submitted 15 August, 2021; v1 submitted 23 February, 2020; originally announced February 2020.

arXiv:1911.09802 [pdf, other]

Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization

Authors: Ting Ye, Jun Shao, Hyunseung Kang

Abstract: Mendelian randomization (MR) has become a popular approach to study the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables. A challenge in MR is that each genetic variant explains a relatively small proportion of variance in the exposure and there are many such variants, a setting known as many weak instruments. To this end, we provide a theoretical characterization of the statistical properties of two popular estimators in MR, the inverse-variance weighted (IVW) estimator and the IVW estimator with screened instruments using an independent selection dataset, under many weak instruments. We then propose a debiased IVW estimator, a simple modification of the IVW estimator, that is robust to many weak instruments and doesn't require screening. Additionally, we present two instrument selection methods to improve the efficiency of the new estimator when a selection dataset is available. An extension of the debiased IVW estimator to handle balanced horizontal pleiotropy is also discussed. We conclude by demonstrating our results in simulated and real datasets.

Submitted 10 October, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

arXiv:1911.09290 [pdf, other]

Large-scale Multi-view Subspace Clustering in Linear Time

Authors: Zhao Kang, Wangtao Zhou, Zhitong Zhao, Junming Shao, Meng Han, Zenglin Xu

Abstract: A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.

Submitted 21 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI 2020

arXiv:1910.14315 [pdf, ps, other]

BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems

Authors: Jiawei Shao, Jun Zhang

Abstract: The emergence of various intelligent mobile applications demands the deployment of powerful deep learning models at resource-constrained mobile devices. The device-edge co-inference framework provides a promising solution by splitting a neural network at a mobile device and an edge computing server. In order to balance the on-device computation and the communication overhead, the splitting point needs to be carefully picked, while the intermediate feature needs to be compressed before transmission. Existing studies decoupled the design of model splitting, feature compression, and communication, which may lead to excessive resource consumption of the mobile device. In this paper, we introduce an end-to-end architecture, named BottleNet++, that consists of an encoder, a non-trainable channel layer, and a decoder for more efficient feature compression and transmission. The encoder and decoder essentially implement joint source-channel coding via convolutional neural networks (CNNs), while explicitly considering the effect of channel noise. By exploiting the strong sparsity and the fault-tolerant property of the intermediate feature in a deep neural network (DNN), BottleNet++ achieves a much higher compression ratio than existing methods. Furthermore, by providing the channel condition to the encoder as an input, our method enjoys a strong generalization ability in different channel conditions.
Compared with merely transmitting intermediate data without feature compression, BottleNet++ achieves up to 64x bandwidth reduction over the additive white Gaussian noise channel and up to 256x bit compression ratio in the binary erasure channel, with less than 2% reduction in accuracy. With a higher compression ratio, BottleNet++ enables splitting a DNN at earlier layers, which leads to up to 3x reduction in on-device computation compared with other compression methods.

Submitted 5 June, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

arXiv:1909.03585 [pdf, other]

Learning to Sample: an Active Learning Framework

Authors: Jingyu Shao, Qing Wang, Fangbing Liu

Abstract: Meta-learning algorithms for active learning are emerging as a promising paradigm for learning the "best" active learning strategy. However, current learning-based active learning approaches still require sufficient training data so as to generalize meta-learning models for active learning. This is contrary to the nature of active learning which typically starts with a small number of labeled samples. The unavailability of large amounts of labeled samples for training meta-learning models would inevitably lead to poor performance (e.g., instabilities and overfitting). In our paper, we tackle these issues by proposing a novel learning-based active learning framework, called Learning To Sample (LTS). This framework has two key components: a sampling model and a boosting model, which can mutually learn from each other in iterations to improve the performance of each other. Within this framework, the sampling model incorporates uncertainty sampling and diversity sampling into a unified process for optimization, enabling us to actively select the most representative and informative samples based on an optimized integration of uncertainty and diversity. To evaluate the effectiveness of the LTS framework, we have conducted extensive experiments on three different classification tasks: image classification, salary level prediction, and entity resolution. The experimental results show that our LTS framework significantly outperforms all the baselines when the label budget is limited, especially for datasets with highly imbalanced classes.
In addition to this, our LTS framework can effectively tackle the cold start problem occurring in many existing active learning approaches.

Submitted 8 September, 2019; originally announced September 2019.

Comments: Accepted by ICDM'19

arXiv:1806.01845 [pdf, other]

Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex

Authors: Hongyang Zhang, Junru Shao, Ruslan Salakhutdinov

Abstract: Several recently proposed architectures of neural networks such as ResNeXt, Inception, Xception, SqueezeNet and Wide ResNet are based on the designing idea of having multiple branches and have demonstrated improved performance in many applications. We show that one cause for such success is due to the fact that the multi-branch architecture is less non-convex in terms of duality gap. The duality gap measures the degree of intrinsic non-convexity of an optimization problem: smaller gap in relative value implies lower degree of intrinsic non-convexity. The challenge is to quantitatively measure the duality gap of highly non-convex problems such as deep neural networks. In this work, we provide strong guarantees of this quantity for two classes of network architectures. For the neural networks with arbitrary activation functions, multi-branch architecture and a variant of hinge loss, we show that the duality gap of both population and empirical risks shrinks to zero as the number of branches increases. This result sheds light on better understanding the power of over-parametrization where increasing the network width tends to make the loss surface less non-convex. For the neural networks with linear activation function and $\ell_2$ loss, we show that the duality gap of empirical risk is zero. Our two results work for arbitrary depths and adversarial data, while the analytical techniques might be of independent interest to non-convex optimization more broadly. Experiments on both synthetic and real-world datasets validate our results.

Submitted 21 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 26 pages, 6 figures, 3 tables; v2 fixes some typos. arXiv admin note: text overlap with arXiv:1712.08559 by other authors

arXiv:1805.03353 [pdf, other]

Nonparametric Estimation of Conditional Expectation with Auxiliary Information and Dimension Reduction

Authors: Bingying Xie, Jun Shao

Abstract: Nonparametric estimation of the conditional expectation $E(Y | U)$ of an outcome $Y$ given a covariate vector $U$ is of primary importance in many statistical applications such as prediction and personalized medicine. In some problems, there is an additional auxiliary variable $Z$ in the training dataset used to construct estimators, but $Z$ is not available for future prediction or selecting patient treatment in personalized medicine. For example, in the training dataset longitudinal outcomes are observed, but only the last outcome $Y$ is concerned in the future prediction or analysis. The longitudinal outcomes other than the last point are then the variable $Z$ that is observed and related with both $Y$ and $U$. Previous work on how to make use of $Z$ in the estimation of $E(Y|U)$ mainly focused on using $Z$ in the construction of a linear function of $U$ to reduce covariate dimension for better estimation. Using $E(Y|U) = E\{E(Y|U, Z)| U\}$, we propose a two-step estimation of inner and outer expectations, respectively, with sufficient dimension reduction for kernel estimation in both steps. The information from $Z$ is utilized not only in dimension reduction, but also directly in the estimation. Because of the existence of different ways for dimension reduction, we construct two estimators that may improve the estimator without using $Z$. The improvements are shown in the convergence rate of estimators as the sample size increases to infinity as well as in the finite sample simulation performance.
A real data analysis about the selection of mammography intervention is presented for illustration.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1802.09667 [pdf, other]

Sufficient variable screening via directional regression with censored response

Authors: Menghao Xu, Zhou Yu, Jun Shao

Abstract: In this paper, we propose a directional-regression-based approach for ultrahigh dimensional sufficient variable screening with censored responses. The new method is designed in a model-free manner and thus can be adapted to various complex model structures. Under some commonly used assumptions, we show that the proposed method enjoys the sure screening property when the dimension p diverges at an exponential rate of the sample size n. To improve the marginal screening method, the corresponding iterative screening algorithm and stability screening algorithm are further provided. We demonstrate the effectiveness of the proposed method through simulation studies and a real data analysis.

Submitted 26 February, 2018; originally announced February 2018.

arXiv:1607.00448 [pdf, other]

Estimation and prediction of credit risk based on rating transition systems

Authors: Jinghai Shao, Siming Li, Yong Li

Abstract: Risk management is an important practice in the banking industry. In this paper we develop a new methodology to estimate and predict the probability of default (PD) based on the rating transition matrices, which relates the rating transition matrices to the macroeconomic variables. Our method can overcome the shortcomings of the framework of Belkin et al. (1998), and is especially useful in predicting the PD and doing stress testing. A simulation study at the end shows that our method can provide more accurate estimates than those obtained by the method of Belkin et al. (1998).

Submitted 27 March, 2018; v1 submitted 1 July, 2016; originally announced July 2016.

Comments: 15 pages

arXiv:0903.0481 [pdf, ps, other]

doi 10.1214/07-AOS578

A pseudo empirical likelihood approach for stratified samples with nonresponse

Authors: Fang Fang, Quan Hong, Jun Shao

Abstract: Nonresponse is common in surveys. When the response probability of a survey variable $Y$ depends on $Y$ through an observed auxiliary categorical variable $Z$ (i.e., the response probability of $Y$ is conditionally independent of $Y$ given $Z$), a simple method often used in practice is to use $Z$ categories as imputation cells and construct estimators by imputing nonrespondents or reweighting respondents within each imputation cell. This simple method, however, is inefficient when some $Z$ categories have small sizes and ad hoc methods are often applied to collapse small imputation cells. Assuming a parametric model on the conditional probability of $Z$ given $Y$ and a nonparametric model on the distribution of $Y$, we develop a pseudo empirical likelihood method to provide more efficient survey estimators. Our method avoids any ad hoc collapsing of small $Z$ categories, since reweighting or imputation is done across $Z$ categories. Asymptotic distributions for estimators of population means based on the pseudo empirical likelihood method are derived. For variance estimation, we consider a bootstrap procedure and its consistency is established. Some simulation results are provided to assess the finite sample performance of the proposed estimators.

Submitted 3 March, 2009; originally announced March 2009.

Comments: Published at http://dx.doi.org/10.1214/07-AOS578 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS578

MSC Class: 62D05 (Primary) 62G20; 62G99 (Secondary)

Journal ref: Annals of Statistics 2009, Vol. 37, No. 1, 371-393

Showing 1–20 of 20 results for author: Shao, J