Error Controlled Feature Selection for Ultrahigh Dimensional and Highly Correlated Feature Space Using Deep Learning
Authors:
Arkaprabha Ganguli,
David Todem,
Tapabrata Maiti
Abstract:
In recent years, deep learning has been at the center of analytics due to its impressive empirical success in analyzing complex data objects. Despite this success, most of the existing tools behave like black-box machines, thus the increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning has emerged as a promising tool in this realm. However, the recent developments do not accommodate ultra-high dimensional and highly correlated features, in addition to the high noise level. In this article, we propose a novel screening and cleaning method with the aid of deep learning for a data-adaptive multi-resolutional discovery of highly correlated predictors with a controlled error rate. Extensive empirical evaluations over a wide range of simulated scenarios and several real datasets demonstrate the effectiveness of the proposed method in achieving high power while keeping the false discovery rate at a minimum.
Submitted 31 October, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.
Nonparametric Scanning For Nonrandom Missing Data With Continuous Instrumental Variables
Authors:
Arkaprabha Ganguli,
David Todem
Abstract:
This article introduces a new instrumental variable approach for estimating unknown population parameters with data having nonrandom missing values. With coarse and discrete instruments, Shao and Wang (2016) proposed a semiparametric method that uses the added information to identify the tilting parameter from the missing data propensity model. A naive application of this idea to continuous instruments through arbitrary discretizations is apt to be inefficient, and may be questionable in some settings. We propose a nonparametric method that does not require arbitrary discretizations but instead scans over continuous dichotomizations of the instrument and combines the scan statistics to estimate the unknown parameters via weighted integration.
We establish the asymptotic normality of the proposed integrated estimator and that of the underlying scan processes uniformly across the instrument sample space. Simulation studies and the analysis of a real data set demonstrate the gains of the methodology over procedures that rely either on arbitrary discretizations or moments of the instrument.
Submitted 17 November, 2021;
originally announced November 2021.