Showing 1–50 of 142 results for author: Liu, M

Searching in archive stat.
  1. arXiv:2406.17002  [pdf, other]

    eess.SP cs.LG cs.NE stat.AP

    Benchmarking mortality risk prediction from electrocardiograms

    Authors: Platon Lukyanenko, Joshua Mayourian, Mingxuan Liu, John K. Triedman, Sunil J. Ghelani, William G. La Cava

    Abstract: Several recent high-impact studies leverage large hospital-owned electrocardiographic (ECG) databases to model and predict patient mortality. MIMIC-IV, released September 2023, is the first comparable public dataset and includes 800,000 ECGs from a U.S. hospital system. Previously, the largest public ECG dataset was Code-15, containing 345,000 ECGs collected during routine care in Brazil. These da…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 9 pages plus appendix, 2 figures

  2. arXiv:2406.10917  [pdf, other]

    cs.LG stat.ML

    Bayesian Intervention Optimization for Causal Discovery

    Authors: Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou Wang

    Abstract: Causal discovery is crucial for understanding complex systems and informing decisions. While observational data can uncover causal relationships under certain assumptions, it often falls short, making active interventions necessary. Current methods, such as Bayesian and graph-theoretical approaches, do not prioritize decision-making and often rely on ideal conditions or information gain, which is…

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2405.19231  [pdf, other]

    stat.ME

    Covariate Shift Corrected Conditional Randomization Test

    Authors: Bowen Xu, Yiwen Huang, Chuan Hong, Shuangning Li, Molei Liu

    Abstract: Conditional independence tests are crucial across various disciplines in determining the independence of an outcome variable $Y$ from a treatment variable $X$, conditioning on a set of confounders $Z$. The Conditional Randomization Test (CRT) offers a powerful framework for such testing by assuming known distributions of $X \mid Z$; it controls the Type-I error exactly, allowing for the use of fle…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2405.18722  [pdf, other]

    stat.ME

    Adaptive and Efficient Learning with Blockwise Missing and Semi-Supervised Data

    Authors: Yiming Li, Xuehan Yang, Ying Wei, Molei Liu

    Abstract: Data fusion is an important way to realize powerful and generalizable analyses across multiple sources. However, different capability of data collection across the sources has become a prominent issue in practice. This could result in the blockwise missingness (BM) of covariates troublesome for integration. Meanwhile, the high cost of obtaining gold-standard labels can cause the missingness of res…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2405.15920  [pdf, other]

    cs.LG stat.ML

    SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

    Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang

    Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specif…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.16173

  6. arXiv:2405.02881  [pdf, other]

    cs.LG cs.AI stat.ML

    FedConPE: Efficient Federated Conversational Bandits with Heterogeneous Clients

    Authors: Zhuohua Li, Maoli Liu, John C. S. Lui

    Abstract: Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences more efficiently. Nonetheless, most existing algorithms adopt a centralized approach. In this paper, we introduce FedConPE, a phase elimination-based…

    Submitted 20 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

  7. arXiv:2404.01191  [pdf, other]

    stat.ME

    A Semiparametric Approach for Robust and Efficient Learning with Biobank Data

    Authors: Molei Liu, Xinyi Wang, Chuan Hong

    Abstract: With the increasing availability of electronic health records (EHR) linked with biobank data for translational research, a critical step in realizing its potential is to accurately classify phenotypes for patients. Existing approaches to achieve this goal are based on error-prone EHR surrogate outcomes, assisted and validated by a small set of labels obtained via medical chart review, which may al…

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2403.05281  [pdf, other]

    stat.ML math.ST

    An Efficient Quasi-Random Sampling for Copulas

    Authors: Sumin Wang, Chenxian Huang, Yongdao Zhou, Min-Qian Liu

    Abstract: This paper examines an efficient method for quasi-random sampling of copulas in Monte Carlo computations. Traditional methods, like conditional distribution methods (CDM), have limitations when dealing with high-dimensional or implicit copulas, which refer to those that cannot be accurately represented by existing parametric copulas. Instead, this paper proposes the use of generative models, such…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 42 pages, 5 figures

  9. arXiv:2401.04900  [pdf, other]

    astro-ph.SR astro-ph.IM cs.LG stat.ML

    SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation

    Authors: Mengmeng Zhang, Fan Wu, Yude Bu, Shanshan Li, Zhenping Yi, Meng Liu, Xiaoming Kong

    Abstract: The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlapping isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel fr…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by A&A

  10. arXiv:2311.15031  [pdf, other]

    stat.ME

    Robust and Efficient Semi-supervised Learning for Ising Model

    Authors: Daiqing Wu, Molei Liu

    Abstract: In biomedical studies, it is often desirable to characterize the interactive mode of multiple disease outcomes beyond their marginal risk. Ising model is one of the most popular choices serving for this purpose. Nevertheless, learning efficiency of Ising models can be impeded by the scarcity of accurate disease labels, which is a prominent problem in contemporary studies driven by electronic healt…

    Submitted 25 November, 2023; originally announced November 2023.

  11. arXiv:2310.11724  [pdf, other]

    stat.ME math.ST

    Simultaneous Nonparametric Inference of M-regression under Complex Temporal Dynamics

    Authors: Miaoshiqi Liu, Zhou Zhou

    Abstract: The paper considers simultaneous nonparametric inference for a wide class of M-regression models with time-varying coefficients. The covariates and errors of the regression model are tackled as a general class of nonstationary time series and are allowed to be cross-dependent. We construct $\sqrt{n}$-consistent inference for the cumulative regression function, whose limiting properties are disclos…

    Submitted 26 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

  12. arXiv:2309.17283  [pdf, other]

    stat.ME stat.ML

    The Blessings of Multiple Treatments and Outcomes in Treatment Effect Estimation

    Authors: Yong Wu, Mingzhou Liu, Jing Yan, Yanwei Fu, Shouyan Wang, Yizhou Wang, Xinwei Sun

    Abstract: Assessing causal effects in the presence of unobserved confounding is a challenging problem. Existing studies leveraged proxy variables or multiple treatments to adjust for the confounding bias. In particular, the latter approach attributes the impact on a single outcome to multiple treatments, allowing estimating latent variables for confounding control. Nevertheless, these methods primarily focu…

    Submitted 14 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Preprint, under review

  13. arXiv:2309.08923  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Fast Approximation of the Shapley Values Based on Order-of-Addition Experimental Designs

    Authors: Liuqing Yang, Yongdao Zhou, Haoda Fu, Min-Qian Liu, Wei Zheng

    Abstract: Shapley value is originally a concept in econometrics to fairly distribute both gains and costs to players in a coalition game. In the recent decades, its application has been extended to other areas such as marketing, engineering and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation towards the interpretable machine learn…

    Submitted 16 September, 2023; originally announced September 2023.

  14. arXiv:2307.01389  [pdf, other]

    cs.LG stat.ME

    Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

    Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-bet…

    Submitted 3 July, 2023; originally announced July 2023.

  15. arXiv:2305.19802  [pdf, other]

    stat.ML cs.LG

    Neuro-Causal Factor Analysis

    Authors: Alex Markham, Mingyu Liu, Bryon Aragam, Liam Solus

    Abstract: Factor analysis (FA) is a statistical tool for studying how observed variables with some mutual dependences can be expressed as functions of mutually independent unobserved factors, and it is widely applied throughout the psychological, biological, and physical sciences. We revisit this classic method from the comparatively new perspective given by advancements in causal discovery and deep learnin…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 23 pages, 13 figures

  16. arXiv:2305.15759  [pdf, other]

    stat.ML cs.CR cs.LG

    Differentially Private Latent Diffusion Models

    Authors: Saiyue Lyu, Michael F. Liu, Margarita Vinaroz, Mijung Park

    Abstract: Diffusion models (DMs) are widely used for generating high-quality high-dimensional images in a non-differentially private manner. To address this challenge, recent papers suggest pre-training DMs with public data, then fine-tuning them with private data using DP-SGD for a relatively short period. In this paper, we further improve the current state of DMs with DP by adopting the Latent Diffusion M…

    Submitted 15 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  17. arXiv:2305.06584  [pdf, other]

    cs.LG math.OC stat.ML

    Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

    Authors: Mo Liu, Paul Grigas, Heyuan Liu, Zuo-Jun Max Shen

    Abstract: We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the de…

    Submitted 11 May, 2023; originally announced May 2023.

  18. arXiv:2305.05281  [pdf, other]

    stat.ME cs.LG

    Causal Discovery via Conditional Independence Testing with Proxy Variables

    Authors: Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang

    Abstract: Distinguishing causal connections from correlations is important in many scenarios. However, the presence of unobserved variables, such as the latent confounder, can introduce bias in conditional independence testing commonly employed in constraint-based causal discovery for identifying causal relations. To address this issue, existing methods introduced proxy variables to adjust for the bias caus…

    Submitted 1 May, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: ICML 2024

  19. arXiv:2305.05276  [pdf, other]

    cs.LG stat.ME

    Causal Discovery from Subsampled Time Series with Proxy Variables

    Authors: Mingzhou Liu, Xinwei Sun, Lingjing Hu, Yizhou Wang

    Abstract: Inferring causal structures from time series data is the central interest of many scientific inquiries. A major barrier to such inference is the problem of subsampling, i.e., the frequency of measurement is much lower than that of causal influence. To overcome this problem, numerous methods have been proposed, yet either was limited to the linear case or failed to achieve identifiability. In this…

    Submitted 24 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  20. arXiv:2303.17791  [pdf]

    stat.AP

    Analysis of the current status of tuberculosis transmission in China based on a heterogeneity model

    Authors: Chuanqing Xu, Kedeng Cheng, Yu Wang, Songbai Guo, Maoxing Liu, Xiaojing Wang, Zhiguo Zhang

    Abstract: Tuberculosis (TB) is an infectious disease transmitted through the respiratory system. China is one of the countries with a high burden of TB. Since 2004, an average of more than 800,000 cases of active TB have been reported each year in China. Analyzing the case data from 2004-2018, we find significant differences in TB incidence by age group. Therefore, the effect of age heterogeneous structure…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: We think this is a very interesting work that gives a good understanding of the current TB transmission in China and assesses the possibility of China achieving the 2035 TB control target and also explores possible ways for how to prevent and control the TB in China

  21. arXiv:2302.04970  [pdf, other]

    stat.ME

    Efficient Modeling of Surrogates to Improve Multi-source High-dimensional Biobank Studies

    Authors: Yue Liu, Molei Liu, Zijian Guo, Tianxi Cai

    Abstract: Surrogate variables in electronic health records (EHR) and biobank data play an important role in biomedical studies due to the scarcity or absence of chart-reviewed gold standard labels. We develop a novel approach named SASH for Surrogate-Assisted and data-Shielding High-dimensional integrative regression. It is a semi-supervised approach that efficiently leverages sizabl…

    Submitted 1 September, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  22. arXiv:2301.02162  [pdf, ps, other]

    stat.ME

    Improve Efficiency of Doubly Robust Estimator when Propensity Score is Misspecified

    Authors: Liangbo Lyu, Molei Liu

    Abstract: Doubly robust (DR) estimation is a crucial technique in causal inference and missing data problems. We propose a novel Propensity score Augmented Doubly robust (PAD) estimator to enhance the commonly used DR estimator for average treatment effect on the treated (ATT), or equivalently, the mean of the outcome under covariate shift. Our proposed estimator attains a lower asymptotic variance than th…

    Submitted 15 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  23. arXiv:2212.12767  [pdf, other]

    stat.ML cs.LG

    Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning

    Authors: Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang

    Abstract: Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end,…

    Submitted 24 December, 2022; originally announced December 2022.

  24. arXiv:2212.12501  [pdf, other]

    stat.ME

    Learning Optimal Dynamic Treatment Regimens Subject to Stagewise Risk Controls

    Authors: Mochuan Liu, Yuanjia Wang, Haoda Fu, Donglin Zeng

    Abstract: Dynamic treatment regimens (DTRs) aim at tailoring individualized sequential treatment rules that maximize cumulative beneficial outcomes by accommodating patients' heterogeneity in decision-making. For many chronic diseases including type 2 diabetes mellitus (T2D), treatments are usually multifaceted in the sense that aggressive treatments with a higher expected reward are also likely to elevate…

    Submitted 22 April, 2024; v1 submitted 23 December, 2022; originally announced December 2022.

  25. arXiv:2212.06954  [pdf, other]

    stat.AP

    Ease and Equity of Point of Interest Accessibility via Public Transit in the U.S

    Authors: Alexander Li, Mengyang Liu, Aurimas Racas, Tejas Santanam, Junaid Syed, Przemyslaw Zientala

    Abstract: The tool developed as a result of this paper analyzes the ease and equity of access to major POI categories (e.g. vaccination centers, grocery stores, hospitals) using public transit in major U.S. cities. We built an interactive website that enables easy exploration of current access equity and allows performing scenario analysis by introducing/removing POIs. Accessibility indices were calculated…

    Submitted 13 December, 2022; originally announced December 2022.

  26. arXiv:2212.05772  [pdf, other]

    cs.LG stat.ML

    Multi-Dimensional Self Attention based Approach for Remaining Useful Life Estimation

    Authors: Zhi Lai, Mengjuan Liu, Yunzhu Pan, Dajiang Chen

    Abstract: Remaining Useful Life (RUL) estimation plays a critical role in Prognostics and Health Management (PHM). Traditional machine health maintenance systems are often costly, requiring sufficient prior expertise, and are difficult to fit into highly complex and changing industrial scenarios. With the widespread deployment of sensors on industrial equipment, building the Industrial Internet of Things (I…

    Submitted 12 December, 2022; originally announced December 2022.

  27. arXiv:2212.05035  [pdf, other]

    cs.CY cs.IR math.NA stat.ME

    COVID-19 Activity Risk Calculator as a Gamified Public Health Intervention Tool

    Authors: Shreyasvi Natraj, Malhar Bhide, Nathan Yap, Meng Liu, Agrima Seth, Jonathan Berman, Christin Glorioso

    Abstract: The Coronavirus disease 2019 (COVID-19) pandemic, caused by the virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has impacted over 200 countries leading to hospitalizations and deaths of millions of people. Public health interventions, such as risk estimators, can reduce the spread of pandemics and epidemics through influencing behavior, which impacts risk of exposure and infect…

    Submitted 24 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  28. arXiv:2210.12624  [pdf, other]

    cs.LG math.OC stat.ML

    Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach

    Authors: Heshan Fernando, Han Shen, Miao Liu, Subhajit Chaudhury, Keerthiram Murugesan, Tianyi Chen

    Abstract: Machine learning problems with multiple objective functions appear either in learning with multiple criteria where learning has to make a trade-off between multiple performance metrics such as fairness, safety and accuracy; or, in multi-task learning where multiple tasks are optimized jointly, sharing inductive bias between them. These problems are often tackled by the multi-objective optimization…

    Submitted 19 March, 2024; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: Changed hyper-parameter choice which affects some of the convergence rate results in the paper

  29. arXiv:2210.02015  [pdf, other]

    stat.ML cs.CY cs.LG

    Conformalized Fairness via Quantile Regression

    Authors: Meichen Liu, Lei Ding, Dengdeng Yu, Wulong Liu, Linglong Kong, Bei Jiang

    Abstract: Algorithmic fairness has received increased attention in socially sensitive domains. While rich literature on mean fairness has been established, research on quantile fairness remains sparse but vital. To fulfill great needs and advocate the significance of quantile fairness, we propose a novel framework to learn a real-valued quantile function under the fairness requirement of Demographic Parity…

    Submitted 14 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: 18 pages, 5 figures, 2 tables

  30. arXiv:2209.06620  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

    Authors: Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou

    Abstract: Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a d…

    Submitted 27 January, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: First two authors contribute equally

  31. arXiv:2209.04977  [pdf, other]

    stat.ME

    Semi-supervised Triply Robust Inductive Transfer Learning

    Authors: Tianxi Cai, Mengyan Li, Molei Liu

    Abstract: In this work, we propose a semi-supervised triply robust inductive transfer learning (STRIFLE) approach, which integrates heterogeneous data from label rich source population and label scarce target population to improve the learning accuracy in the target population. Specifically, we consider a high dimensional covariate shift setting and employ two nuisance models, a density ratio model and an i…

    Submitted 11 September, 2022; originally announced September 2022.

  32. arXiv:2209.03902  [pdf]

    stat.ME

    BatMan: Mitigating Batch Effects via Stratification for Survival Outcome Prediction

    Authors: Ai Ni, Mengling Liu, Li-Xuan Qin

    Abstract: Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed in the setting of sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batches by including it as a covariate al…

    Submitted 8 September, 2022; originally announced September 2022.

  33. arXiv:2208.05134  [pdf, other]

    stat.ME

    Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

    Authors: Doudou Zhou, Molei Liu, Mengyan Li, Tianxi Cai

    Abstract: Due to label scarcity and covariate shift happening frequently in real-world studies, transfer learning has become an essential technique to train models generalizable to some target populations using existing labeled source data. Most existing transfer learning research has been focused on model estimation, while there is a paucity of literature on transfer inference for model accuracy despite it…

    Submitted 8 November, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

  34. arXiv:2207.08204  [pdf, ps, other]

    cs.LG stat.ML

    Fast Composite Optimization and Statistical Recovery in Federated Learning

    Authors: Yajie Bao, Michael Crawshaw, Shan Luo, Mingrui Liu

    Abstract: As a prevalent distributed learning paradigm, Federated Learning (FL) trains a global model on a massive amount of devices with infrequent communication. This paper investigates a class of composite optimization and statistical recovery problems in the FL setting, whose loss function consists of a data-dependent smooth loss and a non-smooth regularizer. Examples include sparse linear regression us…

    Submitted 3 October, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: This is a revised version to fix the imprecise statements about linear speedup from the ICML proceedings. We use another averaging scheme for the returned solutions in Theorem 2.1 and 3.1 to guarantee linear speedup when the number of iterations is large

  35. arXiv:2205.14224  [pdf, other]

    cs.LG math.OC stat.ML

    Will Bilevel Optimizers Benefit from Loops

    Authors: Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

    Abstract: Bilevel optimization has arisen as a powerful tool for solving a variety of machine learning problems. Two current popular bilevel optimizers AID-BiO and ITD-BiO naturally involve solving one or two sub-problems, and consequently, whether we solve these problems with loops (that take many iterations) or without loops (that take only a few iterations) can significantly affect the overall computatio…

    Submitted 31 May, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 32 pages, 2 figures, 3 tables

  36. arXiv:2205.10732  [pdf, other]

    stat.ML cs.LG

    Robust Flow-based Conformal Inference (FCI) with Statistical Guarantee

    Authors: Youhui Ye, Meimei Liu, Xin Xing

    Abstract: Conformal prediction aims to determine precise levels of confidence in predictions for new objects using past experience. However, the commonly used exchangeable assumptions between the training data and testing data limit its usage in dealing with contaminated testing sets. In this paper, we develop a novel flow-based conformal inference (FCI) method to build predictive sets and infer outliers fo…

    Submitted 15 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  37. arXiv:2205.06960  [pdf, other]

    stat.AP stat.ME

    Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data

    Authors: Xinzhou Guo, Waverly Wei, Molei Liu, Tianxi Cai, Chong Wu, Jingshen Wang

    Abstract: There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with the increased risk of new-onset type II diabetes (T2D). Nevertheless, to date, there is no robust evidence supporting as to whether and what kind of populations are indeed vulnerable for developing T2D after taking statins. In th…

    Submitted 21 October, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 25 pages, 2 figures, 5 tables

  38. arXiv:2205.05040  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

    Authors: Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

    Abstract: In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep neural networks (e.g., RNN, LSTM) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single machine se…

    Submitted 13 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted by NeurIPS 2022

  39. arXiv:2203.07585  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Accelerating Stochastic Probabilistic Inference

    Authors: Minta Liu, Suliang Bu

    Abstract: Recently, Stochastic Variational Inference (SVI) has been increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models. It optimizes the variational objective with stochastic optimization, following noisy estimates of the natural gradient. However, almost all the state-of-the-art SVI algorithms are based on first-order optimization algorithm and often…

    Submitted 14 March, 2022; originally announced March 2022.

  40. arXiv:2203.06496  [pdf, other]

    stat.ME

    Maxway CRT: Improving the Robustness of the Model-X Inference

    Authors: Shuangning Li, Molei Liu

    Abstract: The model-X conditional randomization test (CRT) is a flexible and powerful testing procedure for the conditional independence hypothesis: X is independent of Y conditioning on Z. Though having many attractive properties, the model-X CRT relies on the model-X assumption that we have perfect knowledge of the distribution of X | Z. If there is an error in modeling the distribution of X | Z, this app…

    Submitted 1 May, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

  41. arXiv:2202.12472  [pdf, ps, other]

    cs.GT cs.AI cs.IR cs.LG stat.ML

    Bidding Agent Design in the LinkedIn Ad Marketplace

    Authors: Yuan Gao, Kaiyu Yang, Yuanlong Chen, Min Liu, Noureddine El Karoui

    Abstract: We establish a general optimization framework for the design of automated bidding agent in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms each running its own auction format. Bidding strategy d…

    Submitted 24 February, 2022; originally announced February 2022.

  42. arXiv:2201.00459  [pdf, ps, other]

    stat.AP

    A sampling scheme for estimating the prevalence of a pandemic

    Authors: Ze Liu, Siyu Yi, Jianghu Dong, Min-Qian Liu, Yongdao Zhou

    Abstract: The spread of COVID-19 makes it essential to investigate its prevalence. In such investigation research, as far as we know, the widely-used sampling methods didn't use the information sufficiently about the numbers of the previously diagnosed cases, which provides a priori information about the true numbers of infections. This motivates us to develop a new, two-stage sampling method in this paper,…

    Submitted 2 January, 2022; originally announced January 2022.

  43. arXiv:2111.14486  [pdf, other]

    cs.LG eess.SP stat.ML

    Just Least Squares: Binary Compressive Sampling with Low Generative Intrinsic Dimension

    Authors: Yuling Jiao, Dingwei Li, Min Liu, Xiangliang Lu, Yuanyuan Yang

    Abstract: In this paper, we consider recovering $n$ dimensional signals from $m$ binary measurements corrupted by noises and sign flips under the assumption that the target signals have low generative intrinsic dimension, i.e., the target signals can be approximately generated via an $L$-Lipschitz generator $G: \mathbb{R}^k\rightarrow\mathbb{R}^{n}, k\ll n$. Although the binary measurements model is highly…

    Submitted 29 November, 2021; originally announced November 2021.

  44. arXiv:2111.12526  [pdf]

    stat.AP cs.LG stat.ML

    Mining Meta-indicators of University Ranking: A Machine Learning Approach Based on SHAP

    Authors: Shudong Yang, Miaomiao Liu

    Abstract: University evaluation and ranking is an extremely complex activity. Major universities are struggling because of increasingly complex indicator systems of world university rankings. So can we find the meta-indicators of the index system by simplifying the complexity? This research discovered three meta-indicators based on interpretable machine learning. The first one is time, to be friends with ti…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: 4 pages, 1 figure

    ACM Class: J.4

  45. arXiv:2111.11801  [pdf, other]

    stat.CO

    A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems

    Authors: Peili Li, Min Liu, Zhou Yu

    Abstract: By the asymptotic oracle property, non-convex penalties represented by minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) have attracted much attentions in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally c…

    Submitted 23 November, 2021; originally announced November 2021.

  46. Incorporating Surrogate Information for Adaptive Subgroup Enrichment Design with Sample Size Re-estimation

    Authors: Liwen Wu, Qing Li, Mengya Liu, Jianchang Lin

    Abstract: Adaptive subgroup enrichment design is an efficient design framework that allows accelerated development for investigational treatments while also having flexibility in population selection within the course of the trial. The adaptive decision at the interim analysis is commonly made based on the conditional probability of trial success. However, one of the critical challenges for such adaptive de…

    Submitted 3 November, 2021; originally announced November 2021.

  47. arXiv:2110.03032  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY stat.ML

    Learning Multi-Objective Curricula for Robotic Policy Learning

    Authors: Jikun Kang, Miao Liu, Abhinav Gupta, Chris Pal, Xue Liu, Jie Fu

    Abstract: Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, which is inspired by how humans gradually adapt their learning processes to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment…

    Submitted 19 October, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: CoRL 2022; Reinforcement Learning; Meta-Reinforcement Learning; Hyper-network

  48. arXiv:2109.08850  [pdf, other]

    stat.ML cs.LG stat.CO

    Coordinate Descent for MCP/SCAD Penalized Least Squares Converges Linearly

    Authors: Yuling Jiao, Dingwei Li, Min Liu, Xiliang Lu

    Abstract: Recovering sparse signals from observed data is an important topic in signal/imaging processing, statistics and machine learning. Nonconvex penalized least squares have been attracted a lot of attentions since they enjoy nice statistical properties. Computationally, coordinate descent (CD) is a workhorse for minimizing the nonconvex penalized least squares criterion due to its simplicity and scala…

    Submitted 18 September, 2021; originally announced September 2021.

  49. arXiv:2108.05990  [pdf, other]

    stat.ME

    Statistical Learning using Sparse Deep Neural Networks in Empirical Risk Minimization

    Authors: Shujie Ma, Mingming Liu

    Abstract: We consider a sparse deep ReLU network (SDRN) estimator obtained from empirical risk minimization with a Lipschitz loss function in the presence of a large number of features. Our framework can be applied to a variety of regression and classification problems. The unknown target function to estimate is assumed to be in a Sobolev space with mixed derivatives. Functions in this space only need to sa…

    Submitted 9 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

  50. arXiv:2107.01876  [pdf, other]

    stat.ML cs.LG

    Which Invariance Should We Transfer? A Causal Minimax Learning Approach

    Authors: Mingzhou Liu, Xiangyu Zheng, Xinwei Sun, Fang Fang, Yizhou Wang

    Abstract: A major barrier to deploying current machine learning models lies in their non-reliability to dataset shifts. To resolve this problem, most existing studies attempted to transfer stable information to unseen environments. Particularly, independent causal mechanisms-based methods proposed to remove mutable causal mechanisms via the do-operator. Compared to previous methods, the obtained stable pred…

    Submitted 30 May, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted version of ICML-23