Benchmarking mortality risk prediction from electrocardiograms

Authors: Platon Lukyanenko, Joshua Mayourian, Mingxuan Liu, John K. Triedman, Sunil J. Ghelani, William G. La Cava

Abstract: Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These datasets now provide an excellent resource for a broader audience to explore ECG survival modeling. Here, we benchmark survival model performance on Code-15 and MIMIC-IV with two neural network architectures, compare four deep survival modeling approaches to Cox regressions trained on classifier outputs, and evaluate performance at one to ten years. Our results yield AUROC and concordance scores comparable to past work (circa 0.8) and reasonable AUPRC scores (MIMIC-IV: 0.4-0.5, Code-15: 0.05-0.13) considering the fraction of ECG samples linked to a mortality (MIMIC-IV: 27%, Code-15: 4%). When evaluating models on the opposite dataset, AUROC and concordance values drop by 0.1-0.15, which may be due to cohort differences. All code and results are made public.

Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 9 pages plus appendix, 2 figures

arXiv:2406.10917 [pdf, other]

Bayesian Intervention Optimization for Causal Discovery

Authors: Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

Abstract: Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is not directly related to hypothesis testing. We propose a novel Bayesian optimization-based method inspired by Bayes factors that aims to maximize the probability of obtaining decisive and correct evidence. Our approach uses observational data to estimate causal models under different hypotheses, evaluates potential interventions pre-experimentally, and iteratively updates priors to refine interventions. We demonstrate the effectiveness of our method through various experiments. Our contributions provide a robust framework for efficient causal discovery through active interventions, enhancing the practical application of theoretical advancements.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2405.19231 [pdf, other]

Covariate Shift Corrected Conditional Randomization Test

Authors: Bowen Xu, Yiwen Huang, Chuan Hong, Shuangning Li, Molei Liu

Abstract: Conditional independence tests are crucial across various disciplines in determining the independence of an outcome variable $Y$ from a treatment variable $X$, conditioning on a set of confounders $Z$. The Conditional Randomization Test (CRT) offers a powerful framework for such testing by assuming known distributions of $X \mid Z$; it controls the Type-I error exactly, allowing for the use of flexible, black-box test statistics. In practice, testing for conditional independence often involves using data from a source population to draw conclusions about a target population. This can be challenging due to covariate shift -- differences in the distribution of $X$, $Z$, and surrogate variables, which can affect the conditional distribution of $Y \mid X, Z$ -- rendering traditional CRT approaches invalid. To address this issue, we propose a novel Covariate Shift Corrected Pearson Chi-squared Conditional Randomization (csPCR) test. This test adapts to covariate shifts by integrating importance weights and employing the control variates method to reduce variance in the test statistics and thus enhance power. Theoretically, we establish that the csPCR test controls the Type-I error asymptotically. Empirically, through simulation studies, we demonstrate that our method not only maintains control over Type-I errors but also exhibits superior power, confirming its efficacy and practical utility in real-world scenarios where covariate shifts are prevalent. Finally, we apply our methodology to a real-world dataset to assess the impact of a COVID-19 treatment on the 90-day mortality rate among patients.

Submitted 29 May, 2024; originally announced May 2024.
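
Illustrative note: the randomization idea behind the CRT in the abstract above can be sketched in a few lines of Python. This is a minimal sketch of the plain CRT only; it does not implement the csPCR test, importance weighting, or control variates. The function names (`crt_pvalue`, `sample_x_given_z`, `statistic`), the assumed law $X \mid Z = z \sim N(z, 1)$, and the toy data are all hypothetical choices made for illustration.

```python
import random


def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=200, seed=0):
    """Plain CRT p-value: compare the observed statistic with statistics
    computed on copies of X redrawn from the (assumed known) law of X | Z."""
    rng = random.Random(seed)
    t_obs = statistic(x, y, z)
    exceed = 0
    for _ in range(n_resamples):
        x_null = sample_x_given_z(z, rng)  # X redrawn without looking at Y
        if statistic(x_null, y, z) >= t_obs:
            exceed += 1
    # The +1 terms count the observed draw itself, which keeps the test valid.
    return (1 + exceed) / (n_resamples + 1)


def sample_x_given_z(z, rng):
    # Assumption for this toy example: X | Z = z is Normal(z, 1).
    return [rng.gauss(zi, 1.0) for zi in z]


def statistic(x, y, z):
    # Hypothetical statistic: |mean of Y * (X - E[X | Z])|.
    n = len(x)
    return abs(sum(yi * (xi - zi) for xi, yi, zi in zip(x, y, z)) / n)


# Toy data: y_dep depends on X given Z, y_null does not.
data_rng = random.Random(1)
z = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
x = [data_rng.gauss(zi, 1.0) for zi in z]
y_dep = [2.0 * xi + zi + data_rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
y_null = [zi + data_rng.gauss(0.0, 1.0) for zi in z]

p_dep = crt_pvalue(x, y_dep, z, sample_x_given_z, statistic)
p_null = crt_pvalue(x, y_null, z, sample_x_given_z, statistic)
print(p_dep, p_null)
```

Any test statistic can be passed in because validity comes from the resampling step rather than from the form of the statistic, which is the flexibility the abstract refers to.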

arXiv:2405.18722 [pdf, other]

Adaptive and Efficient Learning with Blockwise Missing and Semi-Supervised Data

Authors: Yiming Li, Xuehan Yang, Ying Wei, Molei Liu

Abstract: Data fusion is an important way to realize powerful and generalizable analyses across multiple sources. However, differing data-collection capabilities across the sources have become a prominent issue in practice. This can result in blockwise missingness (BM) of covariates, which is troublesome for integration. Meanwhile, the high cost of obtaining gold-standard labels can cause the missingness of response on a large proportion of samples, known as the semi-supervised (SS) problem. In this paper, we consider a challenging scenario confronting both the BM and SS issues, and propose a novel Data-adaptive projecting Estimation approach for data FUsion in the SEmi-supervised setting (DEFUSE). Starting with a complete-data-only estimator, it involves two successive projection steps to reduce its variance without incurring bias. Compared to existing approaches, DEFUSE achieves a two-fold improvement. First, it leverages the BM labeled sample more efficiently through a novel data-adaptive projection approach robust to model misspecification on the missing covariates, leading to better variance reduction. Second, our method further incorporates the large unlabeled sample to enhance the estimation efficiency through imputation and projection. Compared to the previous SS setting with complete covariates, our work reveals a more essential role of the unlabeled sample in the BM setting. These advantages are justified in asymptotic and simulation studies. We also apply DEFUSE for the risk modeling and inference of heart diseases with the MIMIC-III electronic medical record (EMR) data.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.15920 [pdf, other]

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies the provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.

Submitted 24 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.16173

arXiv:2405.02881 [pdf, other]

FedConPE: Efficient Federated Conversational Bandits with Heterogeneous Clients

Authors: Zhuohua Li, Maoli Liu, John C. S. Lui

Abstract: Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences more efficiently. Nonetheless, most existing algorithms adopt a centralized approach. In this paper, we introduce FedConPE, a phase elimination-based federated conversational bandit algorithm, where $M$ agents collaboratively solve a global contextual linear bandit problem with the help of a central server while ensuring secure data management. To effectively coordinate all the clients and aggregate their collected data, FedConPE uses an adaptive approach to construct key terms that minimize uncertainty across all dimensions in the feature space. Furthermore, compared with existing federated linear bandit algorithms, FedConPE offers improved computational and communication efficiency as well as enhanced privacy protections. Our theoretical analysis shows that FedConPE is minimax near-optimal in terms of cumulative regret. We also establish upper bounds for communication costs and conversation frequency. Comprehensive evaluations demonstrate that FedConPE outperforms existing conversational bandit algorithms while using fewer conversations.

Submitted 20 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

arXiv:2404.01191 [pdf, other]

A Semiparametric Approach for Robust and Efficient Learning with Biobank Data

Authors: Molei Liu, Xinyi Wang, Chuan Hong

Abstract: With the increasing availability of electronic health records (EHR) linked with biobank data for translational research, a critical step in realizing its potential is to accurately classify phenotypes for patients. Existing approaches to achieve this goal are based on error-prone EHR surrogate outcomes, assisted and validated by a small set of labels obtained via medical chart review, which may also be subject to misclassification. Ignoring the noise in these outcomes can induce severe estimation and validation bias to both EHR phenotyping and risk modeling with biomarkers collected in the biobank. To overcome this challenge, we propose a novel unsupervised and semiparametric approach to jointly model multiple noisy EHR outcomes with their linked biobank features. Our approach primarily aims at disease risk modeling with the baseline biomarkers, and is also able to produce a predictive EHR phenotyping model and validate its performance without observations of the true disease outcome. It consists of composite and nonparametric regression steps free of any parametric model specification, followed by a parametric projection step to reduce the uncertainty and improve the estimation efficiency. We show that our method is robust to violations of the parametric assumptions while attaining the desirable root-$n$ convergence rates on risk modeling. Our developed method outperforms existing methods in extensive simulation studies, as well as a real-world application in phenotyping and genetic risk modeling of type II diabetes.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.05281 [pdf, other]

An Efficient Quasi-Random Sampling for Copulas

Authors: Sumin Wang, Chenxian Huang, Yongdao Zhou, Min-Qian Liu

Abstract: This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like conditional distribution methods (CDM), have limitations when dealing with high-dimensional or implicit copulas, which refer to those that cannot be accurately represented by existing parametric copulas. Instead, this paper proposes the use of generative models, such as Generative Adversarial Networks (GANs), to generate quasi-random samples for any copula. GANs are a type of implicit generative model used to learn the distribution of complex data, thus facilitating easy sampling. In our study, GANs are employed to learn the mapping from a uniform distribution to copulas. Once this mapping is learned, obtaining quasi-random samples from the copula only requires inputting quasi-random samples from the uniform distribution. This approach offers a more flexible method for any copula. Additionally, we provide theoretical analysis of quasi-Monte Carlo estimators based on quasi-random samples of copulas. Through simulated and practical applications, particularly in the field of risk management, we validate the proposed method and demonstrate its superiority over various existing methods.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 42 pages, 5 figures

arXiv:2401.04900 [pdf, other]

SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation

Authors: Mengmeng Zhang, Fan Wu, Yude Bu, Shanshan Li, Zhenping Yi, Meng Liu, Xiaoming Kong

Abstract: The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlapping isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel framework, Spectral Transformer (SPT), to predict the age and mass of red giants aligned with asteroseismology from their spectra. A key component of SPT, the Multi-head Hadamard Self-Attention mechanism, designed specifically for spectra, can capture complex relationships across different wavelengths. Further, we introduced a Mahalanobis distance-based loss function to address scale imbalance and interaction mode loss, and incorporated Monte Carlo dropout for quantitative analysis of prediction uncertainty. Trained and tested on 3,880 red giant spectra from LAMOST, the SPT achieved remarkable age and mass estimations with average percentage errors of 17.64% and 6.61%, respectively, and provided uncertainties for each corresponding prediction. The results significantly outperform those of traditional machine learning algorithms and demonstrate a high level of consistency with asteroseismology methods and isochrone fitting techniques. In the future, our work will leverage datasets from the Chinese Space Station Telescope and the Large Synoptic Survey Telescope to enhance the precision of the model and broaden its applicability in the field of astronomy and astrophysics.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by A&A

arXiv:2311.15031 [pdf, other]

Robust and Efficient Semi-supervised Learning for Ising Model

Authors: Daiqing Wu, Molei Liu

Abstract: In biomedical studies, it is often desirable to characterize the interactive mode of multiple disease outcomes beyond their marginal risk. The Ising model is one of the most popular choices for this purpose. Nevertheless, the learning efficiency of Ising models can be impeded by the scarcity of accurate disease labels, which is a prominent problem in contemporary studies driven by electronic health records (EHR). Semi-supervised learning (SSL) leverages the large unlabeled sample with auxiliary EHR features to assist the learning with labeled data only and is a potential solution to this issue. In this paper, we develop a novel SSL method for efficient inference of the Ising model. Our method first models the outcomes against the auxiliary features, then uses it to project the score function of the supervised estimator onto the EHR features, and incorporates the unlabeled sample to augment the supervised estimator for variance reduction without introducing bias. For the key step of conditional modeling, we propose strategies that can effectively leverage the auxiliary EHR information while maintaining moderate model complexity. In addition, we introduce approaches including intrinsic efficient updates and ensemble, to overcome the potential misspecification of the conditional model that may cause efficiency loss. Our method is justified by asymptotic theory and shown to outperform existing SSL methods through simulation studies. We also illustrate its utility in a real example about several key phenotypes related to frequent ICU admission on the MIMIC-III data set.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2310.11724 [pdf, other]

Simultaneous Nonparametric Inference of M-regression under Complex Temporal Dynamics

Authors: Miaoshiqi Liu, Zhou Zhou

Abstract: The paper considers simultaneous nonparametric inference for a wide class of M-regression models with time-varying coefficients. The covariates and errors of the regression model are tackled as a general class of nonstationary time series and are allowed to be cross-dependent. We construct $\sqrt{n}$-consistent inference for the cumulative regression function, whose limiting properties are disclosed using Bahadur representation and Gaussian approximation theory. A simple and unified self-convolved bootstrap procedure is proposed. With only one tuning parameter, the bootstrap consistently simulates the desired limiting behavior of the M-estimators under complex temporal dynamics, even under the possible presence of breakpoints in time series. Our methodology leads to a unified framework to conduct general classes of Exact Function Tests, Lack-of-fit Tests, and Qualitative Tests for the time-varying coefficients under complex temporal dynamics. These tests enable one to, among many others, conduct variable selection procedures, check for constancy and linearity, as well as verify shape assumptions, including monotonicity and convexity. As applications, our method is utilized to study the time-varying properties of global climate data and Microsoft stock return, respectively.

Submitted 26 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2309.17283 [pdf, other]

The Blessings of Multiple Treatments and Outcomes in Treatment Effect Estimation

Authors: Yong Wu, Mingzhou Liu, **g Yan, Yanwei Fu, Shouyan Wang, Yizhou Wang, Xinwei Sun

Abstract: Assessing causal effects in the presence of unobserved confounding is a challenging problem. Existing studies leveraged proxy variables or multiple treatments to adjust for the confounding bias. In particular, the latter approach attributes the impact on a single outcome to multiple treatments, allowing estimating latent variables for confounding control. Nevertheless, these methods primarily focus on a single outcome, whereas in many real-world scenarios, there is greater interest in studying the effects on multiple outcomes. Besides, these outcomes are often coupled with multiple treatments. Examples include the intensive care unit (ICU), where health providers evaluate the effectiveness of therapies on multiple health indicators. To accommodate these scenarios, we consider a new setting dubbed multiple treatments and multiple outcomes. We then show that parallel studies of multiple outcomes involved in this setting can assist each other in causal identification, in the sense that we can exploit other treatments and outcomes as proxies for each treatment effect under study. We proceed with a causal discovery method that can effectively identify such proxies for causal estimation. The utility of our method is demonstrated in synthetic data and sepsis disease.

Submitted 14 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: Preprint, under review

arXiv:2309.08923 [pdf, ps, other]

Fast Approximation of the Shapley Values Based on Order-of-Addition Experimental Designs

Authors: Liuqing Yang, Yongdao Zhou, Haoda Fu, Min-Qian Liu, Wei Zheng

Abstract: Shapley value is originally a concept in econometrics to fairly distribute both gains and costs to players in a coalition game. In recent decades, its application has been extended to other areas such as marketing, engineering and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation for interpretable machine learning, node importance in social networks, attribution models, etc. However, its heavy computational burden has been long recognized but rarely investigated. Specifically, in a $d$-player coalition game, calculating a Shapley value requires the evaluation of $d!$ or $2^d$ marginal contribution values, depending on whether we are taking the permutation or combination formulation of the Shapley value. Hence it becomes infeasible to calculate the Shapley value when $d$ is reasonably large. A common remedy is to take a random sample of the permutations to surrogate for the complete list of permutations. We find an advanced sampling scheme can be designed to yield much more accurate estimation of the Shapley value than the simple random sampling (SRS). Our sampling scheme is based on combinatorial structures in the field of design of experiments (DOE), particularly the order-of-addition experimental designs for the study of how the orderings of components would affect the output. We show that the obtained estimates are unbiased, and can sometimes deterministically recover the original Shapley value. Both theoretical and simulation results show that our DOE-based sampling scheme outperforms SRS in terms of estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real data analysis is conducted for the C. elegans nervous system and the 9/11 terrorist network.

Submitted 16 September, 2023; originally announced September 2023.
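
Illustrative note: the permutation formulation and the simple random sampling (SRS) baseline mentioned in the abstract above can be sketched as follows. This is a minimal Python sketch of the exact $d!$ computation and of SRS over permutations only; it does not implement the paper's order-of-addition (DOE-based) sampling scheme. The function names and the toy game `toy_value` are hypothetical, chosen purely for illustration.

```python
import itertools
import random
from math import factorial


def marginal_contributions(value, order, phi):
    """Add each player's marginal contribution along one ordering to phi."""
    coalition = frozenset()
    prev = value(coalition)
    for player in order:
        coalition = coalition | {player}
        cur = value(coalition)
        phi[player] += cur - prev
        prev = cur


def exact_shapley(value, d):
    """Exact Shapley values: average over all d! orderings (small d only)."""
    phi = [0.0] * d
    for order in itertools.permutations(range(d)):
        marginal_contributions(value, order, phi)
    return [p / factorial(d) for p in phi]


def srs_shapley(value, d, n_perms, seed=0):
    """SRS estimate: average over n_perms uniformly random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * d
    players = list(range(d))
    for _ in range(n_perms):
        rng.shuffle(players)
        marginal_contributions(value, players, phi)
    return [p / n_perms for p in phi]


def toy_value(coalition):
    # Hypothetical game: additive weights plus a bonus of 2 when
    # players 0 and 1 are both in the coalition.
    weights = [1.0, 2.0, 3.0, 4.0]
    bonus = 2.0 if {0, 1} <= coalition else 0.0
    return sum(weights[i] for i in coalition) + bonus


exact = exact_shapley(toy_value, 4)
approx = srs_shapley(toy_value, 4, n_perms=50)
print(exact)
print(approx)
```

In this toy game the bonus is split evenly between players 0 and 1, so the exact values are 2, 3, 3 and 4. The SRS estimate for players 0 and 1 varies with the sampled orderings, which is the estimation error a better sampling scheme aims to reduce.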

arXiv:2307.01389 [pdf, other]

Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that begins with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2305.19802 [pdf, other]

Neuro-Causal Factor Analysis

Authors: Alex Markham, Mingyu Liu, Bryon Aragam, Liam Solus

Abstract: Factor analysis (FA) is a statistical tool for studying how observed variables with some mutual dependences can be expressed as functions of mutually independent unobserved factors, and it is widely applied throughout the psychological, biological, and physical sciences. We revisit this classic method from the comparatively new perspective given by advancements in causal discovery and deep learning, introducing a framework for Neuro-Causal Factor Analysis (NCFA). Our approach is fully nonparametric: it identifies factors via latent causal discovery methods and then uses a variational autoencoder (VAE) that is constrained to abide by the Markov factorization of the distribution with respect to the learned graph. We evaluate NCFA on real and synthetic data sets, finding that it performs comparably to standard VAEs on data reconstruction tasks but with the advantages of sparser architecture, lower model complexity, and causal interpretability. Unlike traditional FA methods, our proposed NCFA method allows learning and reasoning about the latent factors underlying observed data from a justifiably causal perspective, even when the relations between factors and measurements are highly nonlinear.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 23 pages, 13 figures

arXiv:2305.15759 [pdf, other]

Differentially Private Latent Diffusion Models

Authors: Saiyue Lyu, Michael F. Liu, Margarita Vinaroz, Mijung Park

Abstract: Diffusion models (DMs) are widely used for generating high-quality high-dimensional images in a non-differentially private manner. To address this challenge, recent papers suggest pre-training DMs with public data, then fine-tuning them with private data using DP-SGD for a relatively short period. In this paper, we further improve the current state of DMs with DP by adopting Latent Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map the high-dimensional pixels into lower-dimensional latent representations, in which DMs are trained, yielding more efficient and faster training of DMs. In our algorithm, DP-LDMs, rather than fine-tuning the entire DMs, we fine-tune only the attention modules of LDMs at varying layers with privacy-sensitive data, reducing the number of trainable parameters by roughly 90% and achieving better accuracy compared to fine-tuning the entire DMs. The smaller parameter space to fine-tune with DP-SGD helps our algorithm to achieve new state-of-the-art results in several public-private benchmark data pairs. Our approach also allows us to generate more realistic, high-dimensional images (256x256) and those conditioned on text prompts with differential privacy, which have not been attempted before us, to the best of our knowledge. Our approach provides a promising direction for training more powerful, yet training-efficient differentially private DMs, producing high-quality high-dimensional DP images.

Submitted 15 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.06584 [pdf, other]

Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

Authors: Mo Liu, Paul Grigas, Heyuan Liu, Zuo-Jun Max Shen

Abstract: We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters, which is referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion utilizing the concept of distance to degeneracy and minimizes a tractable surrogate of the SPO loss on the collected data. In particular, we develop an efficient active learning algorithm with both hard and soft rejection variants, each with theoretical excess risk (i.e., generalization) guarantees. We further derive bounds on the label complexity, which refers to the number of samples whose labels are acquired to achieve a desired small level of SPO risk. Under some natural low-noise conditions, we show that these bounds can be better than the naive supervised learning approach that labels all samples. Furthermore, when using the SPO+ loss function, a specialized surrogate of the SPO loss, we derive a significantly smaller label complexity under separability conditions. We also present numerical evidence showing the practical value of our proposed algorithms in the settings of personalized pricing and the shortest path problem.

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.05281 [pdf, other]

Causal Discovery via Conditional Independence Testing with Proxy Variables

Authors: Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang

Abstract: Distinguishing causal connections from correlations is important in many scenarios. However, the presence of unobserved variables, such as the latent confounder, can introduce bias in conditional independence testing commonly employed in constraint-based causal discovery for identifying causal relations. To address this issue, existing methods introduced proxy variables to adjust for the bias caused by unobserveness. However, these methods were either limited to categorical variables or relied on strong parametric assumptions for identification. In this paper, we propose a novel hypothesis-testing procedure that can effectively examine the existence of the causal relationship over continuous variables, without any parametric constraint. Our procedure is based on discretization, which under completeness conditions, is able to asymptotically establish a linear equation whose coefficient vector is identifiable under the causal null hypothesis. Based on this, we introduce our test statistic and demonstrate its asymptotic level and power. We validate the effectiveness of our procedure using both synthetic and real-world data.

Submitted 1 May, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: ICML 2024

arXiv:2305.05276 [pdf, other]

Causal Discovery from Subsampled Time Series with Proxy Variables

Authors: Mingzhou Liu, Xinwei Sun, Lingjing Hu, Yizhou Wang

Abstract: Inferring causal structures from time series data is the central interest of many scientific inquiries. A major barrier to such inference is the problem of subsampling, i.e., the frequency of measurement is much lower than that of causal influence. To overcome this problem, numerous methods have been proposed, yet each was either limited to the linear case or failed to achieve identifiability. In this paper, we propose a constraint-based algorithm that can identify the entire causal structure from subsampled time series, without any parametric constraint. Our observation is that the challenge of subsampling arises mainly from hidden variables at the unobserved time steps. Meanwhile, every hidden variable has an observed proxy, which is essentially itself at some observable time in the future, benefiting from the temporal structure. Based on these, we can leverage the proxies to remove the bias induced by the hidden variables and hence achieve identifiability. Following this intuition, we propose a proxy-based causal discovery algorithm. Our algorithm is nonparametric and can achieve full causal identification. Theoretical advantages are reflected in synthetic and real-world experiments.

Submitted 24 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2303.17791 [pdf]

Analysis of the current status of tuberculosis transmission in China based on a heterogeneity model

Authors: Chuanqing Xu, Kedeng Cheng, Yu Wang, Songbai Guo, Maoxing Liu, Xiaojing Wang, Zhiguo Zhang

Abstract: Tuberculosis (TB) is an infectious disease transmitted through the respiratory system. China is one of the countries with a high burden of TB. Since 2004, an average of more than 800,000 cases of active TB have been reported each year in China. Analyzing the case data from 2004-2018, we find significant differences in TB incidence by age group. Therefore, the effect of age heterogeneous structure on TB transmission needs further study. We develop a model of TB to explore the role of age heterogeneity as a factor in TB transmission. The model is fitted numerically using the nonlinear least squares method to obtain the key parameters in the model, and the basic reproduction number Rv = 0.8017 is calculated and the sensitivity analysis of Rv to the parameters is given. The simulation results show that reducing the number of new infections in the elderly population and increasing the recovery rate of elderly patients with the disease could significantly reduce the transmission of tuberculosis. Furthermore, the feasibility of achieving the goals of the WHO End TB Strategy in China is assessed, and we obtain that with existing TB control measures it will take another 30 years for China to reach the WHO goal to reduce 90% of the number of new cases by year 2049. However, in theory it is feasible to reach the WHO strategic goal of ending tuberculosis by 2035 if the group contact rate in the elderly population can be reduced, though it is difficult to reduce the contact rate.

Submitted 30 March, 2023; originally announced March 2023.

Comments: We think this is a very interesting work that gives a good understanding of the current TB transmission in China and assesses the possibility of China achieving the 2035 TB control target and also explores possible ways for how to prevent and control the TB in China

arXiv:2302.04970 [pdf, other]

Efficient Modeling of Surrogates to Improve Multi-source High-dimensional Biobank Studies

Authors: Yue Liu, Molei Liu, Zijian Guo, Tianxi Cai

Abstract: Surrogate variables in electronic health records (EHR) and biobank data play an important role in biomedical studies due to the scarcity or absence of chart-reviewed gold standard labels. We develop a novel approach named SASH for Surrogate-Assisted and data-Shielding High-dimensional integrative regression. It is a semi-supervised approach that efficiently leverages sizable unlabeled samples with error-prone EHR surrogate outcomes from multiple local sites, to improve the learning accuracy of the small gold-labeled data. To facilitate stable and efficient knowledge extraction from the surrogates, our method first obtains a preliminary supervised estimator, and then uses it to assist training a regularized single index model (SIM) for the surrogates. Interestingly, through a chain of convex and properly penalized sparse regressions that approximate the SIM loss with bias-correction, our method avoids the local minima issue of the SIM training, and fully eliminates the impact of the preliminary estimator's large error. In addition, it protects individual-level information through summary-statistics-based data aggregation across the local sites, leveraging a similar idea of bias-corrected approximation for SIM. Through simulation studies, we demonstrate that our method outperforms existing approaches on finite samples. Finally, we apply our method to develop a high dimensional genetic risk model for type II diabetes using large-scale data sets from UK and Mass General Brigham biobanks, where only a small fraction of subjects in one site has been labeled via chart reviewing.

Submitted 1 September, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.02162 [pdf, ps, other]

Improve Efficiency of Doubly Robust Estimator when Propensity Score is Misspecified

Authors: Liangbo Lyu, Molei Liu

Abstract: Doubly robust (DR) estimation is a crucial technique in causal inference and missing data problems. We propose a novel Propensity score Augmented Doubly robust (PAD) estimator to enhance the commonly used DR estimator for average treatment effect on the treated (ATT), or equivalently, the mean of the outcome under covariate shift. Our proposed estimator attains a lower asymptotic variance than the conventional DR estimator when the propensity score (PS) model is misspecified and the outcome regression (OR) model is correct while maintaining the double robustness property that it is valid when either the PS or OR model is correct. These are realized by introducing some properly calibrated adjustment covariates to linearly augment the PS model and solving a restricted weighted least square (RWLS) problem to minimize the variance of the augmented estimator. Both the asymptotic analysis and simulation studies demonstrate that PAD can significantly reduce the estimation variance compared to the standard DR estimator when the PS model is wrong and the OR is correct, and maintain close performance to DR when the PS model is correct. We further applied our method to study the effects of eligibility for 401(k) plan on the improvement of net total financial assets using data from the Survey of Income and Program Participation of 1991.

Submitted 15 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.12767 [pdf, other]

Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning

Authors: Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang

Abstract: Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.

Submitted 24 December, 2022; originally announced December 2022.

arXiv:2212.12501 [pdf, other]

Learning Optimal Dynamic Treatment Regimens Subject to Stagewise Risk Controls

Authors: Mochuan Liu, Yuanjia Wang, Haoda Fu, Donglin Zeng

Abstract: Dynamic treatment regimens (DTRs) aim at tailoring individualized sequential treatment rules that maximize cumulative beneficial outcomes by accommodating patients' heterogeneity in decision-making. For many chronic diseases including type 2 diabetes mellitus (T2D), treatments are usually multifaceted in the sense that aggressive treatments with a higher expected reward are also likely to elevate the risk of acute adverse events. In this paper, we propose a new weighted learning framework, namely benefit-risk dynamic treatment regimens (BR-DTRs), to address the benefit-risk trade-off. The new framework relies on a backward learning procedure by restricting the induced risk of the treatment rule to be no larger than a pre-specified risk constraint at each treatment stage. Computationally, the estimated treatment rule solves a weighted support vector machine problem with a modified smooth constraint. Theoretically, we show that the proposed DTRs are Fisher consistent, and we further obtain the convergence rates for both the value and risk functions. Finally, the performance of the proposed method is demonstrated via extensive simulation studies and application to a real study for T2D patients.

Submitted 22 April, 2024; v1 submitted 23 December, 2022; originally announced December 2022.

arXiv:2212.06954 [pdf, other]

Ease and Equity of Point of Interest Accessibility via Public Transit in the U.S

Authors: Alexander Li, Mengyang Liu, Aurimas Racas, Tejas Santanam, Junaid Syed, Przemyslaw Zientala

Abstract: The tool developed as a result of this paper analyzes the ease and equity of access to major POI categories (e.g. vaccination centers, grocery stores, hospitals) using public transit in major U.S. cities. We built an interactive website that enables easy exploration of current access equity and allows performing scenario analysis by introducing/removing POIs. Accessibility indices were calculated using a 2SFCA (2-step floating catchment area) approach, and ML methods were utilized for exploratory statistical analysis of the results.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.05772 [pdf, other]

Multi-Dimensional Self Attention based Approach for Remaining Useful Life Estimation

Authors: Zhi Lai, Mengjuan Liu, Yunzhu Pan, Dajiang Chen

Abstract: Remaining Useful Life (RUL) estimation plays a critical role in Prognostics and Health Management (PHM). Traditional machine health maintenance systems are often costly, requiring sufficient prior expertise, and are difficult to fit into highly complex and changing industrial scenarios. With the widespread deployment of sensors on industrial equipment, building the Industrial Internet of Things (IIoT) to interconnect these devices has become an inexorable trend in the development of the digital factory. Using the device's real-time operational data collected by IIoT to get the estimated RUL through the RUL prediction algorithm, the PHM system can develop proactive maintenance measures for the device, thus reducing maintenance costs and decreasing failure times during operation. This paper carries out research into the remaining useful life prediction model for multi-sensor devices in the IIoT scenario. We investigated the mainstream RUL prediction models and summarized the basic steps of RUL prediction modeling in this scenario. On this basis, a data-driven approach for RUL estimation is proposed in this paper. It employs a Multi-Head Attention Mechanism to fuse the multi-dimensional time-series data output from multiple sensors, in which the attention on features is used to capture the interactions between features and attention on sequences is used to learn the weights of time steps. Then, the Long Short-Term Memory Network is applied to learn the features of time series. We evaluate the proposed model on two benchmark datasets (C-MAPSS and PHM08), and the results demonstrate that it outperforms the state-of-the-art models. Moreover, through the interpretability of the multi-head attention mechanism, the proposed model can provide a preliminary explanation of engine degradation. Therefore, this approach is promising for predictive maintenance in IIoT scenarios.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2212.05035 [pdf, other]

COVID-19 Activity Risk Calculator as a Gamified Public Health Intervention Tool

Authors: Shreyasvi Natraj, Malhar Bhide, Nathan Yap, Meng Liu, Agrima Seth, Jonathan Berman, Christin Glorioso

Abstract: The Coronavirus disease 2019 (COVID-19) pandemic, caused by the virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has impacted over 200 countries leading to hospitalizations and deaths of millions of people. Public health interventions, such as risk estimators, can reduce the spread of pandemics and epidemics through influencing behavior, which impacts risk of exposure and infection. Current publicly available COVID-19 risk estimation tools have had variable effectiveness during the pandemic due to their dependency on rapidly evolving factors such as community transmission levels and variants. There has also been confusion surrounding certain personal protective strategies such as risk reduction by mask-wearing and vaccination. In order to create a simple easy-to-use tool for estimating different individual risks associated with carrying out daily-life activity, we developed COVID-19 Activity Risk Calculator (CovARC). CovARC is a gamified public health intervention as users can "play with" how different risks associated with COVID-19 can change depending on several different factors when carrying out routine daily activities. Empowering the public to make informed, data-driven decisions about safely engaging in activities may help to reduce COVID-19 levels in the community. In this study, we demonstrate a streamlined, scalable and accurate COVID-19 risk calculation system. Our study also demonstrates the quantitative impact of vaccination and mask-wearing during periods of high case counts. Validation of this impact could inform and support policy decisions regarding case thresholds for mask mandates, and other public health interventions.

Submitted 24 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2210.12624 [pdf, other]

Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach

Authors: Heshan Fernando, Han Shen, Miao Liu, Subhajit Chaudhury, Keerthiram Murugesan, Tianyi Chen

Abstract: Machine learning problems with multiple objective functions appear either in learning with multiple criteria where learning has to make a trade-off between multiple performance metrics such as fairness, safety and accuracy; or, in multi-task learning where multiple tasks are optimized jointly, sharing inductive bias between them. These problems are often tackled by the multi-objective optimization framework. However, existing stochastic multi-objective gradient methods and their variants (e.g., MGDA, PCGrad, CAGrad, etc.) all adopt a biased noisy gradient direction, which leads to degraded empirical performance. To this end, we develop a stochastic Multi-objective gradient Correction (MoCo) method for multi-objective optimization. The unique feature of our method is that it can guarantee convergence without increasing the batch size even in the non-convex setting. Simulations on multi-task supervised and reinforcement learning demonstrate the effectiveness of our method relative to state-of-the-art methods.

Submitted 19 March, 2024; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: Changed hyper-parameter choice which affects some of the convergence rate results in the paper

arXiv:2210.02015 [pdf, other]

Conformalized Fairness via Quantile Regression

Authors: Meichen Liu, Lei Ding, Dengdeng Yu, Wulong Liu, Linglong Kong, Bei Jiang

Abstract: Algorithmic fairness has received increased attention in socially sensitive domains. While rich literature on mean fairness has been established, research on quantile fairness remains sparse but vital. To fulfill great needs and advocate the significance of quantile fairness, we propose a novel framework to learn a real-valued quantile function under the fairness requirement of Demographic Parity with respect to sensitive attributes, such as race or gender, and thereby derive a reliable fair prediction interval. Using optimal transport and functional synchronization techniques, we establish theoretical guarantees of distribution-free coverage and exact fairness for the induced prediction interval constructed by fair quantiles. A hands-on pipeline is provided to incorporate flexible quantile regressions with an efficient fairness adjustment post-processing algorithm. We demonstrate the superior empirical performance of this approach on several benchmark datasets. Our results show the model's ability to uncover the mechanism underlying the fairness-accuracy trade-off in a wide range of societal and medical applications.

Submitted 14 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 18 pages, 5 figures, 2 tables

arXiv:2209.06620 [pdf, other]

Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

Authors: Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

Abstract: Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a distributionally robust policy using historical data obtained from the source environment by optimizing against a worst-case perturbation thereof. In particular, we move beyond tabular settings and consider linear function approximation. More specifically, we consider two settings, one where the dataset is well-explored and the other where the dataset has sufficient coverage of the optimal policy. We propose two algorithms, one for each of the two settings, that achieve error bounds $\tilde{O}(d^{1/2}/N^{1/2})$ and $\tilde{O}(d^{3/2}/N^{1/2})$ respectively, where $d$ is the dimension in the linear function approximation and $N$ is the number of trajectories in the dataset. To the best of our knowledge, they provide the first non-asymptotic results of the sample complexity in this setting. Diverse experiments are conducted to demonstrate our theoretical findings, showing the superiority of our algorithm against the non-robust one.

Submitted 27 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: First two authors contribute equally

arXiv:2209.04977 [pdf, other]

Semi-supervised Triply Robust Inductive Transfer Learning

Authors: Tianxi Cai, Mengyan Li, Molei Liu

Abstract: In this work, we propose a semi-supervised triply robust inductive transfer learning (STRIFLE) approach, which integrates heterogeneous data from label rich source population and label scarce target population to improve the learning accuracy in the target population. Specifically, we consider a high dimensional covariate shift setting and employ two nuisance models, a density ratio model and an imputation model, to combine transfer learning and surrogate-assisted semi-supervised learning strategies organically and achieve triple robustness. While the STRIFLE approach requires the target and source populations to share the same conditional distribution of outcome Y given both the surrogate features S and predictors X, it allows the true underlying model of Y|X to differ between the two populations due to the potential covariate shift in S and X. Different from double robustness, even if both nuisance models are misspecified or the distribution of Y|S,X is not the same between the two populations, when the transferred source population and the target population share enough similarities, the triply robust STRIFLE estimator can still partially utilize the source population, and it is guaranteed to be no worse than the target-only surrogate-assisted semi-supervised estimator with negligible errors. These desirable properties of our estimator are established theoretically and verified in finite-sample via extensive simulation studies. We utilize the STRIFLE estimator to train a Type II diabetes polygenic risk prediction model for the African American target population by transferring knowledge from electronic health records linked genomic data observed in a larger European source population.

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2209.03902 [pdf]

BatMan: Mitigating Batch Effects via Stratification for Survival Outcome Prediction

Authors: Ai Ni, Mengling Liu, Li-Xuan Qin

Abstract: Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batches by including it as a covariate alongside sample groups in a linear regression. In survival prediction, however, ComBat is used without definable groups for survival outcome and is done sequentially with survival regression for a potentially confounded outcome. To address these issues, we propose a new method, called BatMan ("BATch MitigAtion via stratificatioN"). It adjusts batches as strata in survival regression and utilizes variable selection methods such as LASSO to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a re-sampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data, and (2) their performance can be worsened by the addition of data normalization. We further evaluate them using microRNA data for ovarian cancer from the Cancer Genome Atlas, and find that BatMan outperforms ComBat while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the naive use of data normalization in the context of developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available at https://github.com/LXQin/PRECISION.survival.

Submitted 8 September, 2022; originally announced September 2022.
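
The BatMan abstract above describes treating batches as strata in survival regression. As a rough illustration of what stratification means for a Cox model, the sketch below evaluates a log partial likelihood in which risk sets are formed only within each stratum (batch). It is not the authors' R implementation: the function name `stratified_cox_loglik`, the toy data, and the Breslow-style handling of ties are assumptions made for this example, and the LASSO variable selection step is omitted.

```python
import math

def stratified_cox_loglik(times, events, risk_scores, strata):
    """Cox log partial likelihood with risk sets formed within each stratum.

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : linear predictors x_i' beta (hypothetical inputs)
    strata      : stratum label per subject (here, the batch)
    """
    loglik = 0.0
    for s in set(strata):
        idx = [i for i in range(len(times)) if strata[i] == s]
        for i in idx:
            if events[i]:
                # Only subjects from the same stratum still at risk at times[i]
                at_risk = [j for j in idx if times[j] >= times[i]]
                denom = sum(math.exp(risk_scores[j]) for j in at_risk)
                loglik += risk_scores[i] - math.log(denom)
    return loglik

# Toy data: five subjects in two batches, all risk scores zero.
times = [1, 2, 3, 1, 2]
events = [1, 1, 0, 1, 1]
scores = [0.0] * 5
batches = ["A", "A", "A", "B", "B"]

ll_stratified = stratified_cox_loglik(times, events, scores, batches)
ll_pooled = stratified_cox_loglik(times, events, scores, ["all"] * 5)
```

With all risk scores at zero, each event contributes minus the log of its risk-set size, so the stratified value is -log(12) and the pooled value is -log(225). The gap shows how stratification shrinks the comparison sets to subjects from the same batch.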

arXiv:2208.05134 [pdf, other]

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Authors: Doudou Zhou, Molei Liu, Mengyan Li, Tianxi Cai

Abstract: Due to label scarcity and covariate shift happening frequently in real-world studies, transfer learning has become an essential technique to train models generalizable to some target populations using existing labeled source data. Most existing transfer learning research has been focused on model estimation, while there is a paucity of literature on transfer inference for model accuracy despite its importance. We propose a novel $\mathbf{D}$oubly $\mathbf{R}$obust $\mathbf{A}$ugmented $\mathbf{M}$odel $\mathbf{A}$ccuracy $\mathbf{T}$ransfer $\mathbf{I}$nferen$\mathbf{C}$e (DRAMATIC) method for point and interval estimation of commonly used classification performance measures in an unlabeled target population using labeled source data. Specifically, DRAMATIC derives and evaluates the risk model for a binary response $Y$ against some low dimensional predictors $\mathbf{A}$ on the target population, leveraging $Y$ from source data only and high dimensional adjustment features $\mathbf{X}$ from both the source and target data. The proposed estimators are doubly robust in the sense that they are $n^{1/2}$ consistent when at least one model is correctly specified and certain model sparsity assumptions hold. Simulation results demonstrate that the point estimates have negligible bias and the confidence intervals derived by DRAMATIC attain satisfactory empirical coverage levels. We further illustrate the utility of our method to transfer the genetic risk prediction model and its accuracy evaluation for type II diabetes across two patient cohorts in Mass General Brigham (MGB) collected using different sampling mechanisms and at different time points.

Submitted 8 November, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2207.08204 [pdf, ps, other]

Fast Composite Optimization and Statistical Recovery in Federated Learning

Authors: Yajie Bao, Michael Crawshaw, Shan Luo, Mingrui Liu

Abstract: As a prevalent distributed learning paradigm, Federated Learning (FL) trains a global model on a massive number of devices with infrequent communication. This paper investigates a class of composite optimization and statistical recovery problems in the FL setting, whose loss function consists of a data-dependent smooth loss and a non-smooth regularizer. Examples include sparse linear regression using Lasso, low-rank matrix recovery using nuclear norm regularization, etc. In the existing literature, federated composite optimization algorithms are designed only from an optimization perspective without any statistical guarantees. In addition, they do not consider commonly used (restricted) strong convexity in statistical recovery problems. We advance the frontiers of this problem from both optimization and statistical perspectives. On the optimization front, we propose a new algorithm named \textit{Fast Federated Dual Averaging} for strongly convex and smooth loss and establish state-of-the-art iteration and communication complexity in the composite setting. In particular, we prove that it enjoys a fast rate, linear speedup, and reduced communication rounds. On the statistical front, for restricted strongly convex and smooth loss, we design another algorithm, namely \textit{Multi-stage Federated Dual Averaging}, and prove a high probability complexity bound with linear speedup up to optimal statistical precision. Experiments on both synthetic and real data demonstrate that our methods perform better than other baselines. To the best of our knowledge, this is the first work providing fast optimization algorithms and statistical recovery guarantees for composite problems in FL.

Submitted 3 October, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: This is a revised version to fix the imprecise statements about linear speedup from the ICML proceedings. We use another averaging scheme for the returned solutions in Theorem 2.1 and 3.1 to guarantee linear speedup when the number of iterations is large

arXiv:2205.14224 [pdf, other]

Will Bilevel Optimizers Benefit from Loops

Authors: Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

Abstract: Bilevel optimization has arisen as a powerful tool for solving a variety of machine learning problems. Two currently popular bilevel optimizers, AID-BiO and ITD-BiO, naturally involve solving one or two sub-problems, and consequently, whether we solve these problems with loops (that take many iterations) or without loops (that take only a few iterations) can significantly affect the overall computational efficiency. Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations. In this paper, we first establish a unified convergence analysis for both AID-BiO and ITD-BiO that is applicable to all implementation choices of loops. We then specialize our results to characterize the computational complexity for all implementations, which enables an explicit comparison among them. Our result indicates that for AID-BiO, the loop for estimating the optimal point of the inner function is beneficial for overall efficiency, although it causes higher complexity for each update step, and the loop for approximating the outer-level Hessian-inverse-vector product reduces the gradient complexity. For ITD-BiO, the two loops always coexist, and our convergence upper and lower bounds show that such loops are necessary to guarantee a vanishing convergence error, whereas the no-loop scheme suffers from an unavoidable non-vanishing convergence error. Our numerical experiments further corroborate our theoretical results.

Submitted 31 May, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 32 pages, 2 figures, 3 tables

arXiv:2205.10732 [pdf, other]

Robust Flow-based Conformal Inference (FCI) with Statistical Guarantee

Authors: Youhui Ye, Meimei Liu, Xin Xing

Abstract: Conformal prediction aims to determine precise levels of confidence in predictions for new objects using past experience. However, the commonly used exchangeability assumption between the training data and testing data limits its usage in dealing with contaminated testing sets. In this paper, we develop a novel flow-based conformal inference (FCI) method to build predictive sets and infer outliers for complex and high-dimensional data. We leverage ideas from adversarial flow to transfer the input data to a random vector with known distributions. Our roundtrip transformation can map the input data to a low-dimensional space while preserving the conditional distribution of input data given each class label, which enables us to construct a non-conformity score for uncertainty quantification. Our approach is applicable and robust when the testing data is contaminated. We evaluate our method, robust flow-based conformal inference, on benchmark datasets. We find that it produces effective predictive sets and accurate outlier detection and is more powerful relative to competing approaches.

Submitted 15 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.06960 [pdf, other]

Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data

Authors: Xinzhou Guo, Waverly Wei, Molei Liu, Tianxi Cai, Chong Wu, Jingshen Wang

Abstract: There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with the increased risk of new-onset type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether and what kind of populations are indeed vulnerable to developing T2D after taking statins. In this case study, leveraging the biobank and electronic health record data in the Partner Health System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure. Our proposed approach delivers asymptotically sharp confidence intervals and a debiased estimate for the treatment effect of the most vulnerable subgroup in the presence of high-dimensional covariates. With our proposed approach, we find that females with high T2D genetic risk are at the highest risk of developing T2D due to statin usage.

Submitted 21 October, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: 25 pages, 2 figures, 5 tables

arXiv:2205.05040 [pdf, other]

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

Authors: Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

Abstract: In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single machine setting, but exploring this technique in the distributed setting is still in its infancy: it remains mysterious whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in dealing with a nonconvex loss function, a non-Lipschitz continuous gradient, and skipping communication rounds simultaneously. In this paper, we explore a relaxed-smoothness assumption of the loss landscape which LSTM was shown to satisfy in previous works, and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with other machines after multiple steps of gradient-based updates. Our algorithm is proved to have $O\left(\frac{1}{N\epsilon^4}\right)$ iteration complexity and $O\left(\frac{1}{\epsilon^3}\right)$ communication complexity for finding an $\epsilon$-stationary point in the homogeneous data setting, where $N$ is the number of machines. This indicates that our algorithm enjoys linear speedup and reduced communication rounds. Our proof relies on novel analysis techniques of estimating truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and various scenarios demonstrate that our algorithm indeed exhibits fast convergence speed in practice and thus validates our theory.

Submitted 13 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted by NeurIPS 2022
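
The gradient clipping abstract above describes machines that each run clipped gradient updates locally and communicate only after several steps. The toy sketch below illustrates that pattern on a one-dimensional objective, f(w) = cosh(w) - 1, whose gradient sinh(w) is not globally Lipschitz. Everything here is an assumption made for illustration rather than the paper's algorithm or settings: the names `clipped_step` and `local_clipped_sgd`, the clipping rule min(lr, gamma/|g|), the Gaussian gradient noise, and all hyperparameter values.

```python
import math
import random

def clipped_step(w, grad, lr, gamma):
    """One clipped update: the step length lr*|grad| is capped at gamma."""
    g_norm = abs(grad)
    step = lr if g_norm == 0.0 else min(lr, gamma / g_norm)
    return w - step * grad

def local_clipped_sgd(num_workers=4, rounds=200, local_steps=5,
                      lr=0.1, gamma=0.5, w0=5.0, noise_sd=0.1, seed=0):
    """Each worker takes several clipped steps, then the iterates are averaged."""
    rng = random.Random(seed)
    w_global = w0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(num_workers):
            w = w_global  # every worker starts from the shared iterate
            for _ in range(local_steps):
                # Noisy gradient of f(w) = cosh(w) - 1 (same objective on all workers)
                grad = math.sinh(w) + rng.gauss(0.0, noise_sd)
                w = clipped_step(w, grad, lr, gamma)
            local_iterates.append(w)
        # One communication round: average the local iterates
        w_global = sum(local_iterates) / num_workers
    return w_global

w_final = local_clipped_sgd()
```

Starting from w = 5, where the gradient is about 74, clipping limits each move to 0.5 until the gradient becomes small; after that the update behaves like ordinary SGD and the averaged iterate settles near the minimizer at 0. Communication happens once per round, not once per step, which is the saving the abstract refers to.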

arXiv:2203.07585 [pdf, ps, other]

Accelerating Stochastic Probabilistic Inference

Authors: Minta Liu, Suliang Bu

Abstract: Recently, Stochastic Variational Inference (SVI) has become increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models. It optimizes the variational objective with stochastic optimization, following noisy estimates of the natural gradient. However, almost all the state-of-the-art SVI algorithms are based on first-order optimization algorithms and often suffer from poor convergence rates. In this paper, we bridge the gap between second-order methods and stochastic variational inference by proposing a second-order based stochastic variational inference approach. In particular, we first derive the Hessian matrix of the variational objective. Then we devise two numerical schemes to implement second-order SVI efficiently. Thorough empirical evaluations on both synthetic and real datasets back up both the effectiveness and efficiency of the proposed approach.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.06496 [pdf, other]

Maxway CRT: Improving the Robustness of the Model-X Inference

Authors: Shuangning Li, Molei Liu

Abstract: The model-X conditional randomization test (CRT) is a flexible and powerful testing procedure for the conditional independence hypothesis: X is independent of Y conditioning on Z. Though having many attractive properties, the model-X CRT relies on the model-X assumption that we have perfect knowledge of the distribution of X | Z. If there is an error in modeling the distribution of X | Z, this approach may lose its validity. This problem is even more severe when the adjustment covariates Z are of high dimensionality, in which situation precise modeling of X against Z can be hard. In response to this, we propose the Maxway (Model and Adjust X With the Assistance of Y) CRT, which learns the distribution of Y | Z, and uses it to calibrate the resampling distribution of X to gain robustness to the error in modeling X. We prove that the type-I error inflation of the Maxway CRT can be controlled by the learning error for the low-dimensional adjusting model plus the product of learning errors for X | Z and Y | Z, which could be interpreted as an "almost doubly robust" property. Based on this, we develop implementing algorithms of the Maxway CRT in practical scenarios including (surrogate-assisted) semi-supervised learning and transfer learning where valid information about Y | Z can be potentially provided by some auxiliary or external data. Through extensive simulation studies under different scenarios, we demonstrate that the Maxway CRT achieves significantly better type-I error control than existing model-X inference approaches while preserving similar power. Finally, we apply our methodology to two real examples, including (1) studying the obesity paradox with electronic health record (EHR) data assisted by surrogate variables; (2) inferring the side effect of statins among the ethnic minority group via transferring knowledge from the majority group.

Submitted 1 May, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

arXiv:2202.12472 [pdf, ps, other]

Bidding Agent Design in the LinkedIn Ad Marketplace

Authors: Yuan Gao, Kaiyu Yang, Yuanlong Chen, Min Liu, Noureddine El Karoui

Abstract: We establish a general optimization framework for the design of automated bidding agents in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms each running its own auction format. A bidding strategy derived from this framework automatically guarantees the optimality of budget allocation across ad units and platforms. Common constraints such as budget delivery schedule, return on investment, and guaranteed results translate directly to additional parameters in the bidding formula. We share practical learnings of the deployed bidding system in the LinkedIn ad marketplace based on this framework.

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2201.00459 [pdf, ps, other]

A sampling scheme for estimating the prevalence of a pandemic

Authors: Ze Liu, Siyu Yi, Jianghu Dong, Min-Qian Liu, Yongdao Zhou

Abstract: The spread of COVID-19 makes it essential to investigate its prevalence. In such investigations, as far as we know, the widely used sampling methods do not make sufficient use of the information about the numbers of previously diagnosed cases, which provides a priori information about the true numbers of infections. This motivates us to develop a new, two-stage sampling method in this paper, which utilises the information about the distributions of both population and diagnosed cases, to investigate the prevalence more efficiently. The global likelihood sampling, a robust and efficient sampler to draw samples from any probability density function, is used in our sampling strategy, and thus, our new method can automatically adapt to the complicated distributions of population and cases. Moreover, the corresponding estimating method is simple, which facilitates the practical implementation. Some recommendations for practical implementation are given. Finally, several simulations and a practical example verify its efficiency.

Submitted 2 January, 2022; originally announced January 2022.

arXiv:2111.14486 [pdf, other]

Just Least Squares: Binary Compressive Sampling with Low Generative Intrinsic Dimension

Authors: Yuling Jiao, Dingwei Li, Min Liu, Xiangliang Lu, Yuanyuan Yang

Abstract: In this paper, we consider recovering $n$ dimensional signals from $m$ binary measurements corrupted by noise and sign flips under the assumption that the target signals have low generative intrinsic dimension, i.e., the target signals can be approximately generated via an $L$-Lipschitz generator $G: \mathbb{R}^k\rightarrow\mathbb{R}^{n}, k\ll n$. Although the binary measurements model is highly nonlinear, we propose a least squares decoder and prove that, up to a constant $c$, with high probability, the least squares decoder achieves a sharp estimation error $\mathcal{O} (\sqrt{\frac{k\log (Ln)}{m}})$ as long as $m\geq \mathcal{O}( k\log (Ln))$. Extensive numerical simulations and comparisons with state-of-the-art methods demonstrate that the least squares decoder is robust to noise and sign flips, as indicated by our theory. By constructing a ReLU network with properly chosen depth and width, we verify the (approximately) deep generative prior, which is of independent interest.

Submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.12526 [pdf]

Mining Meta-indicators of University Ranking: A Machine Learning Approach Based on SHAP

Authors: Shudong Yang, Miaomiao Liu

Abstract: University evaluation and ranking is an extremely complex activity. Major universities are struggling because of increasingly complex indicator systems of world university rankings. So can we find the meta-indicators of the index system by simplifying the complexity? This research discovered three meta-indicators based on interpretable machine learning. The first one is time: to be friends with time, believe in the power of time, and accumulate historical deposits. The second one is space: to be friends with the city, and grow together through co-development. The third one is relationships: to be friends with alumni, and strive for more alumni donations without ceiling.

Submitted 24 November, 2021; originally announced November 2021.

Comments: 4 pages, 1 figure

ACM Class: J.4

arXiv:2111.11801 [pdf, other]

A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems

Authors: Peili Li, Min Liu, Zhou Yu

Abstract: By the asymptotic oracle property, non-convex penalties represented by minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally challenging. Almost all existing algorithms converge locally, and the proper selection of initial values is crucial. Therefore, in actual operation, they often combine a warm-starting technique to meet the rigid requirement that the initial value must be sufficiently close to the optimal solution of the corresponding problem. In this paper, based on the DC (difference of convex functions) property of MCP and SCAD penalties, we aim to design a global two-stage algorithm for the high-dimensional least squares linear regression problems. A key idea for making the proposed algorithm efficient is to use the primal dual active set with continuation (PDASC) method, which is equivalent to the semi-smooth Newton (SSN) method, to solve the corresponding sub-problems. Theoretically, we not only prove the global convergence of the proposed algorithm, but also verify that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation and real-data experiments show that the algorithm in this paper is superior to the latest SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2111.02390 [pdf]

doi 10.1080/19466315.2022.2046150

Incorporating Surrogate Information for Adaptive Subgroup Enrichment Design with Sample Size Re-estimation

Authors: Liwen Wu, Qing Li, Mengya Liu, Jianchang Lin

Abstract: Adaptive subgroup enrichment design is an efficient design framework that allows accelerated development for investigational treatments while also having flexibility in population selection within the course of the trial. The adaptive decision at the interim analysis is commonly made based on the conditional probability of trial success. However, one of the critical challenges for such adaptive designs is immature data for interim decisions, particularly in the targeted subgroup with a limited sample size at the first stage of the trial. In this paper, we improve the interim decision making by incorporating information from surrogate endpoints when estimating conditional power at the interim analysis, by predicting the primary treatment effect based on the observed surrogate endpoint and prior knowledge or historical data about the relationship between endpoints. Modified conditional power is developed for both selecting the patient population to be enrolled after the interim analysis and sample size re-estimation. In the simulation study, our proposed design shows a higher chance to make desirable interim decisions and achieves higher overall power, while controlling the overall type I error. This performance is robust over drift of prior knowledge from the true relationship between two endpoints. We also demonstrate the application of our proposed design in two case studies in oncology and vaccine trials.

Submitted 3 November, 2021; originally announced November 2021.

arXiv:2110.03032 [pdf, other]

Learning Multi-Objective Curricula for Robotic Policy Learning

Authors: Jikun Kang, Miao Liu, Abhinav Gupta, Chris Pal, Xue Liu, Jie Fu

Abstract: Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment generation, or initial state generation. However, prior work only considers curriculum learning following one of the aforementioned predefined paradigms. It is unclear which of these paradigms are complementary, and how the combination of them can be learned from interactions with the environment. Therefore, in this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula that are generated by a set of parametric curriculum modules. Each curriculum module is instantiated as a neural network and is responsible for generating a particular curriculum. In order to coordinate those potentially conflicting modules in unified parameter space, we propose a multi-task hyper-net learning framework that uses a single hyper-net to parameterize all those curriculum modules. In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum, which may otherwise be difficult to design manually. We evaluate our method on a series of robotic manipulation tasks and demonstrate its superiority over other state-of-the-art ACL methods in terms of sample efficiency and final performance.

Submitted 19 October, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: CoRL 2022; Reinforcement Learning; Meta-Reinforcement Learning; Hyper-network

arXiv:2109.08850 [pdf, other]

Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly

Authors: Yuling Jiao, Dingwei Li, Min Liu, Xiliang Lu

Abstract: Recovering sparse signals from observed data is an important topic in signal/imaging processing, statistics and machine learning. Nonconvex penalized least squares have attracted a lot of attention since they enjoy nice statistical properties. Computationally, coordinate descent (CD) is a workhorse for minimizing the nonconvex penalized least squares criterion due to its simplicity and scalability. In this work, we prove the linear convergence rate of CD for solving MCP/SCAD penalized least squares problems.

Submitted 18 September, 2021; originally announced September 2021.
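
To illustrate the kind of iteration the abstract above refers to, here is a minimal sketch of cyclic coordinate descent for MCP penalized least squares. It is not taken from the paper. It assumes the objective (1/(2n))·||y − Xβ||² + Σ_j MCP(β_j; λ, γ) with γ > 1 and columns of X scaled so that each has mean square 1, in which case the one-dimensional update has the familiar closed form (a rescaled soft threshold for |z| ≤ γλ, the identity otherwise). The function names `soft_threshold`, `mcp_update`, and `cd_mcp` are hypothetical, and SCAD is not covered.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_update(z, lam, gamma):
    """Closed-form 1-D MCP minimizer; assumes gamma > 1 and a unit-scaled column."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z


def cd_mcp(X, y, lam, gamma=3.0, n_sweeps=100, tol=1e-10):
    """Cyclic coordinate descent; X is a list of rows whose columns satisfy sum(x_ij^2)/n == 1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date after every coordinate update
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Least-squares solution for coordinate j given the other coordinates.
            z = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = mcp_update(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep changes nothing noticeably
            break
    return beta


# Small orthogonal design: one strong coefficient and one weak one.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.1, 1.9, 2.1, 1.9]
beta_hat = cd_mcp(X, y, lam=0.5, gamma=3.0)
```

In this toy example the strong coefficient lies beyond γλ and is left unshrunk, while the weak one is thresholded to zero, which is the behaviour that distinguishes MCP from the lasso.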

arXiv:2108.05990 [pdf, other]

Statistical Learning using Sparse Deep Neural Networks in Empirical Risk Minimization

Authors: Shujie Ma, Mingming Liu

Abstract: We consider a sparse deep ReLU network (SDRN) estimator obtained from empirical risk minimization with a Lipschitz loss function in the presence of a large number of features. Our framework can be applied to a variety of regression and classification problems. The unknown target function to estimate is assumed to be in a Sobolev space with mixed derivatives. Functions in this space only need to satisfy a smoothness condition rather than having a compositional structure. We develop non-asymptotic excess risk bounds for our SDRN estimator. We further derive that the SDRN estimator can achieve the same minimax rate of estimation (up to logarithmic factors) as one-dimensional nonparametric regression when the dimension of the features is fixed, and the estimator has a suboptimal rate when the dimension grows with the sample size. We show that the depth and the total number of nodes and weights of the ReLU network need to grow as the sample size increases to ensure good performance, and also investigate how fast they should increase with the sample size. These results provide important theoretical guidance and a basis for empirical studies using deep neural networks.

Submitted 9 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

arXiv:2107.01876 [pdf, other]

Which Invariance Should We Transfer? A Causal Minimax Learning Approach

Authors: Mingzhou Liu, Xiangyu Zheng, Xinwei Sun, Fang Fang, Yizhou Wang

Abstract: A major barrier to deploying current machine learning models lies in their lack of reliability under dataset shifts. To resolve this problem, most existing studies attempted to transfer stable information to unseen environments. In particular, methods based on independent causal mechanisms proposed removing mutable causal mechanisms via the do-operator. Compared to previous methods, the obtained stable predictors are more effective in identifying stable information. However, a key question remains: which subset of this whole stable information should the model transfer, in order to achieve optimal generalization ability? To answer this question, we present a comprehensive minimax analysis from a causal perspective. Specifically, we first provide a graphical condition for the whole stable set to be optimal. When this condition fails, we surprisingly find with an example that this whole stable set, although it can fully exploit stable information, is not the optimal one to transfer. To identify the optimal subset in this case, we propose to estimate the worst-case risk with a novel optimization scheme over the intervention functions on mutable causal mechanisms. We then propose an efficient algorithm to search for the subset with minimal worst-case risk, based on a newly defined equivalence relation between stable subsets. Compared to the exponential cost of exhaustively searching over all subsets, our search strategy enjoys polynomial complexity. The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.

Submitted 30 May, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: Accepted version of ICML-23

Showing 1–50 of 142 results for author: Liu, M