Showing 1–50 of 147 results for author: Li, F

Searching in archive stat.
  1. arXiv:2406.13036

    stat.ML cs.LG math.PR math.ST stat.CO

    Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

    Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

    Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Ga…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.02834

    stat.ME

    Asymptotic inference with flexible covariate adjustment under rerandomization and stratified rerandomization

    Authors: Bingkai Wang, Fan Li

    Abstract: Rerandomization is an effective treatment allocation procedure to control for baseline covariate imbalance. For estimating the average treatment effect, rerandomization has been previously shown to improve the precision of the unadjusted and the linearly-adjusted estimators over simple randomization without compromising consistency. However, it remains unclear whether such results apply more gener…

    Submitted 4 June, 2024; originally announced June 2024.
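
    As a rough illustration of the allocation procedure this abstract refers to, the Python sketch below implements rerandomization with the Mahalanobis-distance acceptance rule that is common in this literature: complete randomizations are redrawn until the difference in treated and control covariate means is small enough. The function names, the `threshold` value, and the simulated covariates are assumptions for illustration; this is not the authors' code, and it covers neither the stratified variant nor any covariate-adjusted analysis.

```python
import numpy as np


def mahalanobis_imbalance(X, z):
    """Mahalanobis distance between treated and control covariate means."""
    n1 = int(z.sum())
    n0 = len(z) - n1
    diff = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    # Covariance of the mean difference under complete randomization,
    # computed from the sample covariance of X (n - 1 denominator).
    cov = np.atleast_2d(np.cov(X, rowvar=False)) * (1.0 / n1 + 1.0 / n0)
    return float(diff @ np.linalg.solve(cov, diff))


def rerandomize(X, n_treated, threshold, rng, max_draws=10_000):
    """Redraw complete randomizations until the imbalance is <= threshold."""
    n = X.shape[0]
    for _ in range(max_draws):
        z = np.zeros(n, dtype=int)
        z[rng.choice(n, size=n_treated, replace=False)] = 1
        if mahalanobis_imbalance(X, z) <= threshold:
            return z
    raise RuntimeError("no acceptable allocation found; try a larger threshold")


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical baseline covariates
z = rerandomize(X, n_treated=50, threshold=1.0, rng=rng)
```

    Under complete randomization the imbalance statistic is approximately chi-squared with as many degrees of freedom as covariates, so the threshold is often chosen as a quantile of that distribution to fix the acceptance probability; the value 1.0 above is only a placeholder.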

  3. arXiv:2406.02028

    stat.ME

    How should parallel cluster randomized trials with a baseline period be analyzed? A survey of estimands and common estimators

    Authors: Kenneth Menglin Lee, Fan Li

    Abstract: The parallel cluster randomized trial with baseline (PB-CRT) is a common variant of the standard parallel cluster randomized trial (P-CRT) that maintains parallel randomization but additionally allows for both within and between-cluster comparisons. We define two estimands of interest in the context of PB-CRTs, the participant-average treatment effect (pATE) and cluster-average treatment effect (c…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 77 pages, 16 figures

  4. arXiv:2404.18256

    stat.ME

    Semiparametric causal mediation analysis in cluster-randomized experiments

    Authors: Chao Cheng, Fan Li

    Abstract: In cluster-randomized experiments, there is emerging interest in exploring the causal mechanism in which a cluster-level treatment affects the outcome through an intermediate outcome. Despite an extensive development of causal mediation methods in the past decade, only a few exceptions have been considered in assessing causal mediation in cluster-randomized studies, all of which depend on parametr…

    Submitted 28 April, 2024; originally announced April 2024.

  5. arXiv:2404.14840

    stat.ME

    Analysis of cohort stepped wedge cluster-randomized trials with non-ignorable dropout via joint modeling

    Authors: Alessandro Gasparini, Michael J. Crowther, Emiel O. Hoogendijk, Fan Li, Michael O. Harhay

    Abstract: Stepped wedge cluster-randomized trial (CRT) designs randomize clusters of individuals to intervention sequences, ensuring that every cluster eventually transitions from a control period to receive the intervention under study by the end of the study period. The analysis of stepped wedge CRTs is usually more complex than parallel-arm CRTs due to potential secular trends that result in changing in…

    Submitted 23 April, 2024; originally announced April 2024.
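
    The design described in this abstract can be made concrete with a small Python sketch that builds a stepped wedge treatment schedule: clusters are randomly assigned to sequences, every cluster starts in the control condition, and clusters on sequence s switch to the intervention at period s and stay there. Equal numbers of clusters per sequence, a single all-control baseline period, and the function name are assumptions of this illustration; it does not implement the joint-modeling analysis the paper proposes.

```python
import numpy as np


def stepped_wedge_schedule(n_clusters, n_sequences, rng):
    """Randomize clusters to stepped wedge sequences.

    Returns (schedule, sequence): schedule is an (n_clusters, n_sequences + 1)
    0/1 matrix whose column t is the treatment status in period t. Period 0 is
    an all-control baseline, and a cluster on sequence s is treated from
    period s onward, so every cluster is treated in the final period.
    """
    if n_clusters % n_sequences != 0:
        raise ValueError("this sketch assumes equally sized sequences")
    per_sequence = n_clusters // n_sequences
    # Random permutation of the cluster-to-sequence labels 1..n_sequences.
    sequence = rng.permutation(np.repeat(np.arange(1, n_sequences + 1), per_sequence))
    periods = np.arange(n_sequences + 1)
    # Unidirectional crossover: a cluster never switches back to control.
    schedule = (periods[None, :] >= sequence[:, None]).astype(int)
    return schedule, sequence


schedule, sequence = stepped_wedge_schedule(8, 4, np.random.default_rng(1))
print(schedule)
```

    Only the row order depends on the random seed; the number of treated clusters in period t is always t times the number of clusters per sequence.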

  6. arXiv:2404.10629

    stat.ME stat.AP

    Weighting methods for truncation by death in cluster-randomized trials

    Authors: Dane Isenberg, Michael Harhay, Nandita Mitra, Fan Li

    Abstract: Patient-centered outcomes, such as quality of life and length of hospital stay, are the focus in a wide array of clinical studies. However, participants in randomized trials for elderly or critically and severely ill patient populations may have truncated or undefined non-mortality outcomes if they do not survive through the measurement time point. To address truncation by death, the survivor aver…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Code for simulations and R package is available on https://github.com/abcdane1/PtSaceCrts

  7. arXiv:2403.08927

    stat.ME

    Principal stratification with U-statistics under principal ignorability

    Authors: Xinyuan Chen, Fan Li

    Abstract: Principal stratification is a popular framework for causal inference in the presence of an intermediate outcome. While the principal average treatment effects have traditionally been the default target of inference, it may not be sufficient when the interest lies in the relative favorability of one potential outcome over the other within the principal stratum. We thus introduce the principal gener…

    Submitted 2 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  8. arXiv:2402.17096

    stat.CO

    Simple rejection Monte Carlo algorithm and its application to multivariate statistical inference

    Authors: Fengyu Li, Huijiao Yu, Jun Yan, Xianyong Meng

    Abstract: The Monte Carlo algorithm is increasingly utilized, with its central step involving computer-based random sampling from stochastic models. While both Markov Chain Monte Carlo (MCMC) and Rejection Monte Carlo serve as sampling methods, the latter finds fewer applications compared to the former. Hence, this paper initially provides a concise introduction to the theory of the Rejection Monte Carlo algorith…

    Submitted 26 February, 2024; originally announced February 2024.
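
    For readers unfamiliar with the method, here is a minimal Python sketch of textbook accept-reject sampling, the basic idea behind the rejection Monte Carlo algorithm named in this abstract. It assumes a proposal density g and a constant M with f(x) <= M g(x) everywhere; a draw x from g is kept with probability f(x) / (M g(x)), so on average about 1/M of the proposals are accepted. The Beta(2, 2) target and uniform proposal are illustrative choices, not taken from the paper, and the paper's multivariate applications are not reproduced here.

```python
import numpy as np


def rejection_sample(target_pdf, proposal_draw, proposal_pdf, M, n, rng):
    """Draw n samples from target_pdf by accept-reject.

    Assumes target_pdf(x) <= M * proposal_pdf(x) for every x.
    Returns the samples and the observed acceptance rate.
    """
    samples = []
    n_proposed = 0
    while len(samples) < n:
        x = proposal_draw(rng)
        n_proposed += 1
        # Accept x with probability target_pdf(x) / (M * proposal_pdf(x)).
        if rng.uniform() * M * proposal_pdf(x) <= target_pdf(x):
            samples.append(x)
    return np.array(samples), n / n_proposed


# Illustrative target: Beta(2, 2) with density 6x(1 - x) on [0, 1],
# bounded by M = 1.5 times the Uniform(0, 1) density.
rng = np.random.default_rng(42)
draws, acceptance = rejection_sample(
    target_pdf=lambda x: 6.0 * x * (1.0 - x),
    proposal_draw=lambda r: r.uniform(),
    proposal_pdf=lambda x: 1.0,
    M=1.5,
    n=20_000,
    rng=rng,
)
```

    The bound M controls efficiency: the tighter M is, the closer the acceptance rate gets to 1, which is why choosing a good proposal matters more than the sampling loop itself.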

  9. arXiv:2402.15053

    stat.ML cs.LG stat.ME

    Nonlinear Bayesian optimal experimental design using logarithmic Sobolev inequalities

    Authors: Fengyi Li, Ayoub Belhadji, Youssef Marzouk

    Abstract: We study the problem of selecting $k$ experiments from a larger candidate pool, where the goal is to maximize mutual information (MI) between the selected subset and the underlying parameters. Finding the exact solution to this combinatorial optimization problem is computationally costly, not only due to the complexity of the combinatorial search but also the difficulty of evaluating MI in nonl…

    Submitted 22 February, 2024; originally announced February 2024.

  10. arXiv:2402.14840

    cs.CL cs.AI stat.AP

    RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

    Authors: Congyun **, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, **jie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

    Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-Me…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 13 figures

  11. arXiv:2402.02306

    stat.ME stat.CO stat.ML

    A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding

    Authors: Xinyuan Chen, Liangyuan Hu, Fan Li

    Abstract: In longitudinal observational studies with a time-to-event outcome, a common objective in causal analysis is to estimate the causal survival curve under hypothetical intervention scenarios within the study cohort. The g-formula is a particularly useful tool for this analysis. To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator, which…

    Submitted 28 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  12. arXiv:2401.15680

    stat.ME

    How to achieve model-robust inference in stepped wedge trials with model-based methods?

    Authors: Bingkai Wang, Xueqi Wang, Fan Li

    Abstract: A stepped wedge design is a unidirectional crossover design where clusters are randomized to distinct treatment sequences. While model-based analysis of stepped wedge designs -- via linear mixed models or generalized estimating equations -- is standard practice to evaluate treatment effects accounting for clustering and adjusting for baseline covariates, their properties under misspecification hav…

    Submitted 27 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  13. arXiv:2401.11278

    stat.ME

    Handling incomplete outcomes and covariates in cluster-randomized trials: doubly-robust estimation, efficiency considerations, and sensitivity analysis

    Authors: Bingkai Wang, Fan Li, Rui Wang

    Abstract: In cluster-randomized trials (CRTs), missing data can occur in various ways, including missing values in outcomes and baseline covariates at the individual or cluster level, or completely missing information for non-participants. Among the various types of missing data in CRTs, missing outcomes have attracted the most attention. However, no existing methods can simultaneously address all aforement…

    Submitted 24 March, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  14. arXiv:2401.04372

    stat.ML cs.LG math.NA stat.CO

    Stable generative modeling using diffusion maps

    Authors: Georg Gottwald, Fengyi Li, Youssef Marzouk, Sebastian Reich

    Abstract: We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the avail…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 23 pages, 25 figures

  15. arXiv:2401.01977

    stat.ME

    Conformal causal inference for cluster randomized trials: model-robust inference without asymptotic approximations

    Authors: Bingkai Wang, Fan Li, Mengxin Yu

    Abstract: In the analysis of cluster randomized trials, two typical features are that individuals within a cluster are correlated and that the total number of clusters can sometimes be limited. While model-robust treatment effect estimators have been recently developed, their asymptotic theory requires the number of clusters to approach infinity, and one often has to empirically assess the applicability of…

    Submitted 3 January, 2024; originally announced January 2024.

  16. arXiv:2401.00987

    stat.ME math.ST stat.ML

    Inverting estimating equations for causal inference on quantiles

    Authors: Chao Cheng, Fan Li

    Abstract: The causal inference literature frequently focuses on estimating the mean of the potential outcome, whereas the quantiles of the potential outcome may carry important additional information. We propose a universal approach, based on the inverse estimating equations, to generalize a wide class of causal inference solutions from estimating the mean of the potential outcome to its quantiles. We assum…

    Submitted 1 January, 2024; originally announced January 2024.
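
    To make the move from means to quantiles concrete, the Python sketch below solves the simplest inverse probability weighted estimating equation for the tau-quantile q of the treated potential outcome, sum_i (Z_i / e_i) * (1{Y_i <= q} - tau) = 0, with known propensity scores e_i. Because the left side is a step function in q, the sketch returns the smallest observed treated outcome at which the normalized weighted CDF reaches tau. This is one familiar special case chosen for illustration, not the authors' general inversion approach; the function name and inputs are hypothetical.

```python
import numpy as np


def ipw_quantile(y, z, e, tau):
    """IPW estimate of the tau-quantile of the treated potential outcome Y(1).

    Approximately solves sum_i z_i / e_i * (1{y_i <= q} - tau) = 0 by returning
    the smallest treated outcome whose normalized weighted CDF is >= tau.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    e = np.asarray(e, dtype=float)
    treated = z == 1
    weights = 1.0 / e[treated]  # inverse probability of treatment
    order = np.argsort(y[treated])
    y_sorted = y[treated][order]
    cdf = np.cumsum(weights[order]) / weights.sum()
    # First index where the weighted CDF reaches tau; the min() guards
    # against floating-point rounding when tau is 1.
    idx = min(int(np.searchsorted(cdf, tau)), len(y_sorted) - 1)
    return y_sorted[idx]


# Hypothetical data: the treated unit with y = 2 has a low propensity score,
# so it receives twice the weight of the other treated units.
median_y1 = ipw_quantile(y=[1, 2, 3, 10], z=[1, 1, 1, 0], e=[0.5, 0.25, 0.5, 0.5], tau=0.5)
```

    Replacing the indicator 1{Y_i <= q} - tau with Y_i - mu recovers the familiar IPW estimating equation for the mean, which is the sense in which mean-based weighting estimators can be carried over to quantiles.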

  17. arXiv:2312.13097

    stat.ME

    Power calculation for cross-sectional stepped wedge cluster randomized trials with a time-to-event endpoint

    Authors: Mary M. Ryan, Denise Esserman, Monica Taljaard, Fan Li

    Abstract: A popular design choice in public health and implementation science research, stepped wedge cluster randomized trials (SW-CRTs) are a form of randomized trial whereby clusters are progressively transitioned from control to intervention, and the timing of transition is randomized for each cluster. An important task at the design stage is to ensure that the planned trial has sufficient power to obse…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Manuscript under review; 45 pages total (main text 22 pages, supporting information 23 pages); 18 figures total (main text 4 figures, supporting information 14 figures); 2 tables total (main text 2 tables, supporting information 0 tables); 5 appendices

  18. arXiv:2311.12379

    cs.LG cs.AI stat.ML

    Infinite forecast combinations based on Dirichlet process

    Authors: Yinuo Ren, Feng Li, Yanfei Kang, Jue Wang

    Abstract: Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert…

    Submitted 24 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  19. arXiv:2311.10877

    stat.ME

    Covariate adjustment in randomized experiments with missing outcomes and covariates

    Authors: Anqi Zhao, Peng Ding, Fan Li

    Abstract: Covariate adjustment can improve precision in analyzing randomized experiments. With fully observed data, regression adjustment and propensity score weighting are asymptotically equivalent in improving efficiency over unadjusted analysis. When some outcomes are missing, we consider combining these two adjustment methods with inverse probability of observation weighting for handling missing outcome…

    Submitted 4 March, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  20. arXiv:2310.11603

    stat.ME stat.OT

    Group sequential two-stage preference designs

    Authors: Ruyi Liu, Fan Li, Denise Esserman, Mary M. Ryan

    Abstract: The two-stage preference design (TSPD) enables the inference for treatment efficacy while allowing for incorporation of patient preference to treatment. It can provide unbiased estimates for selection and preference effects, where a selection effect occurs when patients who prefer one treatment respond differently than those who prefer another, and a preference effect is the difference in response…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 27 pages, 7 tables, 5 figures, 4 appendices; under review at Statistics in Medicine

    Journal ref: Statistics in Medicine. (2023) 1-27

  21. arXiv:2309.15316

    stat.ME

    Leveraging Neural Networks to Profile Health Care Providers with Application to Medicare Claims

    Authors: Wenbo Wu, Fan Li, Richard Liu, Yiting Li, Mara McAdams-DeMarco, Krzysztof J. Geras, Douglas E. Schaubel, Iván Díaz

    Abstract: Encompassing numerous nationwide, statewide, and institutional initiatives in the United States, provider profiling has evolved into a major health care undertaking with ubiquitous applications, profound implications, and high-stakes consequences. In line with such a significant profile, the literature has accumulated a number of developments dedicated to enhancing the statistical paradigm of prov…

    Submitted 20 January, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 figures, 6 tables

  22. arXiv:2309.13677

    stat.ME

    Bayesian pathway analysis over brain network mediators for survival data

    Authors: Xinyuan Tian, Fan Li, Li Shen, Denise Esserman, Yize Zhao

    Abstract: Technological advancements in noninvasive imaging facilitate the construction of whole brain interconnected networks, known as brain connectivity. Existing approaches to analyze brain connectivity frequently disaggregate the entire network into a vector of unique edges or summary measures, leading to a substantial loss of information. Motivated by the need to explore the effect mechanism among gen…

    Submitted 24 September, 2023; originally announced September 2023.

  23. arXiv:2309.07365

    stat.ME

    Addressing selection bias in cluster randomized experiments via weighting

    Authors: Georgia Papadogeorgou, Bo Liu, Fan Li, Fan Li

    Abstract: In cluster randomized experiments, units are often recruited after the random cluster assignment, and data are only available for the recruited sample. Post-randomization recruitment can lead to selection bias, inducing systematic differences between the overall and the recruited populations, and between the recruited intervention and control arms. In this setting, we define causal estimands for t…

    Submitted 13 September, 2023; originally announced September 2023.

  24. arXiv:2308.07248

    stat.ME stat.AP

    Maintaining the validity of inference from linear mixed models in stepped-wedge cluster randomized trials under misspecified random-effects structures

    Authors: Yongdong Ouyang, Monica Taljaard, Andrew B Forbes, Fan Li

    Abstract: Linear mixed models are commonly used in analyzing stepped-wedge cluster randomized trials (SW-CRTs). A key consideration for analyzing a SW-CRT is accounting for the potentially complex correlation structure, which can be achieved by specifying a random effects structure. Common random effects structures for a SW-CRT include random intercept, random cluster-by-period, and discrete-time decay. Rec…

    Submitted 14 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  25. arXiv:2308.05451

    stat.ME cs.LG stat.AP

    A Forecaster's Review of Judea Pearl's Causality: Models, Reasoning and Inference, Second Edition, 2009

    Authors: Feng Li

    Abstract: Given the great popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the second edition in 2009 and illustrates an easy-to-follow causal inference strategy in a forecast scenario. It further discusses some potential benefits and challenges for causal inference with time series forecasting when modeling the counterfactuals, estimating the uncer…

    Submitted 10 August, 2023; originally announced August 2023.

  26. arXiv:2306.11267

    stat.ME stat.AP

    Model-assisted analysis of covariance estimators for stepped wedge cluster randomized experiments

    Authors: Xinyuan Chen, Fan Li

    Abstract: Stepped wedge cluster randomized experiments (SW-CREs) represent a class of unidirectional crossover designs. Although SW-CREs have become popular, definitions of estimands and robust methods to target estimands under the potential outcomes framework remain insufficient. To address this gap, we describe a class of estimands that explicitly acknowledge the multilevel data structure in SW-CREs and h…

    Submitted 25 June, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  27. arXiv:2306.03266

    cs.LG stat.ML

    Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

    Authors: Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen

    Abstract: Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this li…

    Submitted 14 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  28. arXiv:2305.18412

    stat.AP cs.LG

    Short-term Temporal Dependency Detection under Heterogeneous Event Dynamic with Hawkes Processes

    Authors: Yu Chen, Fengpei Li, Anderson Schneider, Yuriy Nevmyvaka, Asohan Amarasingham, Henry Lam

    Abstract: Many event sequence data exhibit mutually exciting or inhibiting patterns. Reliable detection of such temporal dependency is crucial for scientific investigation. The de facto model is the Multivariate Hawkes Process (MHP), whose impact function naturally encodes a causal structure in Granger causality. However, the vast majority of existing methods use direct or nonlinear transform of standard MH…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Conference on Uncertainty in Artificial Intelligence 2023
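
    As background for the model named in this abstract, the Python sketch below evaluates the conditional intensity of a linear multivariate Hawkes process with an exponential kernel, lambda_i(t) = mu_i + sum_j alpha_ij * sum_{t_k^j < t} beta * exp(-beta * (t - t_k^j)). A zero entry alpha_ij means past type-j events never raise the type-i intensity, which is how the impact functions encode the Granger-causal structure the abstract mentions. The exponential kernel, the shared decay rate beta, and the function name are assumptions for illustration; this linear form cannot represent inhibition without a nonlinear link, and the paper's own model for heterogeneous dynamics is not reproduced here.

```python
import numpy as np


def mhp_intensity(t, events, mu, alpha, beta):
    """Conditional intensity vector of a linear multivariate Hawkes process at time t.

    events: list of 1-D arrays; events[j] holds the event times of dimension j.
    mu:     (d,) baseline rates.
    alpha:  (d, d) excitation matrix; alpha[i, j] scales how much a type-j
            event raises the type-i intensity.
    beta:   decay rate of the exponential kernel beta * exp(-beta * s).
    """
    alpha = np.asarray(alpha, dtype=float)
    lam = np.array(mu, dtype=float)
    for j, times in enumerate(events):
        times = np.asarray(times, dtype=float)
        past = times[times < t]  # only strictly earlier events contribute
        kernel_sum = np.sum(beta * np.exp(-beta * (t - past)))
        lam += alpha[:, j] * kernel_sum
    return lam


# Two dimensions: type 0 excites both types, type 1 excites neither.
events = [np.array([1.0]), np.array([0.5])]
lam = mhp_intensity(2.0, events, mu=[0.5, 0.2], alpha=[[0.3, 0.0], [0.4, 0.0]], beta=2.0)
```

    In this toy example the column alpha[:, 1] is zero, so the type-1 event at time 0.5 leaves both intensities unchanged; detecting which entries of alpha are nonzero is the dependency-detection problem in miniature.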

  29. arXiv:2305.13443

    stat.ME stat.AP

    Multiply robust estimation for causal survival analysis with treatment noncompliance

    Authors: Chao Cheng, Yueqi Guo, Bo Liu, Lisa Wruck, Fan Li, Fan Li

    Abstract: Comparative effectiveness research frequently addresses a time-to-event outcome and can require unique considerations in the presence of treatment noncompliance. Motivated by the challenges in addressing noncompliance in the ADAPTABLE pragmatic trial, we develop a multiply robust estimator to estimate the principal survival causal effects under the principal ignorability and monotonicity assumptio…

    Submitted 27 July, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  30. arXiv:2304.10025

    stat.ME stat.ML

    Identification and multiply robust estimation in causal mediation analysis across principal strata

    Authors: Chao Cheng, Fan Li

    Abstract: We consider assessing causal mediation in the presence of a post-treatment event (examples include noncompliance, a clinical event, or a terminal event). We identify natural mediation effects for the entire study population and for each principal stratum characterized by the joint potential values of the post-treatment event. We derive efficient influence functions for each mediation estimand, whi…

    Submitted 25 March, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

  31. arXiv:2304.04868

    stat.ME stat.AP

    Correcting for bias due to mismeasured exposure in mediation analysis with a survival outcome

    Authors: Chao Cheng, Donna Spiegelman, Fan Li

    Abstract: Mediation analysis is widely used in health science research to evaluate the extent to which an intermediate variable explains an observed exposure-outcome relationship. However, the validity of analysis can be compromised when the exposure is measured with error. Motivated by the Health Professionals Follow-up Study (HPFS), we investigate the impact of exposure measurement error on assessing medi…

    Submitted 15 September, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  32. arXiv:2304.03928

    cs.LG stat.AP

    Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation

    Authors: Hengjie Yu, Dan Luo, Sam F. Y. Li, Maozhen Qu, Da Liu, Yingchao He, Fang Cheng

    Abstract: Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI)…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 30 pages, 6 figures

  33. arXiv:2304.02740

    stat.CO stat.ME

    PStrata: An R Package for Principal Stratification

    Authors: Bo Liu, Fan Li

    Abstract: Post-treatment confounding is a common problem in causal inference, including special cases of noncompliance, truncation by death, surrogate endpoint, etc. Principal stratification (Frangakis and Rubin 2002) is a general framework for defining and estimating causal effects in the presence of post-treatment confounding. A prominent special case is the instrumental variable approach to noncompliance…

    Submitted 5 April, 2023; originally announced April 2023.

  34. arXiv:2304.01506

    cs.LG cs.AI cs.DB stat.ML

    OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting

    Authors: Xiao He, Ye Li, Jian Tan, Bin Wu, Feifei Li

    Abstract: Seasonal-trend decomposition is one of the most fundamental concepts in time series analysis that supports various downstream tasks, including time series anomaly detection and forecasting. However, existing decomposition methods rely on batch processing with a time complexity of O(W), where W is the number of data points within a time window. Therefore, they cannot always efficiently support real…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: PVLDB 2023

    Report number: 1399-1412

  35. arXiv:2304.00231

    stat.ME

    Using Overlap Weights to Address Extreme Propensity Scores in Estimating Restricted Mean Counterfactual Survival Times

    Authors: Zhiqiang Cao, Lama Ghazi, Claudia Mastrogiacomo, Laura Forastiere, F. Perry Wilson, Fan Li

    Abstract: While the inverse probability of treatment weighting (IPTW) is a commonly used approach for treatment comparisons in observational data, the resulting estimates may be subject to bias and excessively large variance when there is lack of overlap in the propensity score distributions. By smoothly down-weighting the units with extreme propensity scores, overlap weighting (OW) can help mitigate the bi…

    Submitted 10 February, 2024; v1 submitted 1 April, 2023; originally announced April 2023.
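
    The contrast between inverse probability of treatment weighting and overlap weighting described in this abstract can be seen in a few lines of Python. Using the standard definitions, IPTW gives a treated unit weight 1/e(x) and a control unit 1/(1 - e(x)), while overlap weighting gives a treated unit 1 - e(x) and a control unit e(x). Each overlap weight is therefore the IPTW weight multiplied by e(x)(1 - e(x)), which shrinks the relative influence of units with propensity scores near 0 or 1 and keeps every weight in [0, 1]. This sketch only compares the weights and a weighted difference in means for a generic outcome; it does not estimate restricted mean survival times as the paper does, and the function names and data are hypothetical.

```python
import numpy as np


def balancing_weights(e, z, kind="overlap"):
    """Per-unit weights from propensity scores e and treatment indicators z."""
    e = np.asarray(e, dtype=float)
    z = np.asarray(z, dtype=int)
    if kind == "iptw":
        return np.where(z == 1, 1.0 / e, 1.0 / (1.0 - e))
    if kind == "overlap":
        return np.where(z == 1, 1.0 - e, e)
    raise ValueError(f"unknown weighting scheme: {kind}")


def weighted_difference(y, z, w):
    """Weighted treated mean minus weighted control mean (normalized weights)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(z) == 1
    return np.average(y[treated], weights=w[treated]) - np.average(y[~treated], weights=w[~treated])


e = np.array([0.01, 0.5, 0.99, 0.5])  # two units with extreme propensity scores
z = np.array([1, 1, 0, 0])
w_iptw = balancing_weights(e, z, "iptw")    # the two extreme units get weights near 100
w_ow = balancing_weights(e, z, "overlap")   # all weights are bounded by 1
```

    With IPTW, the two extreme units carry about 50 times the weight of a unit with e(x) = 0.5; under overlap weighting the ratio drops to about 2, which is the variance-stabilizing effect the abstract describes.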

  36. arXiv:2304.00200

    stat.ML cs.LG stat.CO

    Diffusion map particle systems for generative modeling

    Authors: Fengyi Li, Youssef Marzouk

    Abstract: We propose a novel diffusion map particle system (DMPS) for generative modeling, based on diffusion maps and Laplacian-adjusted Wasserstein gradient descent (LAWGD). Diffusion maps are used to approximate the generator of the corresponding Langevin diffusion process from samples, and hence to learn the underlying data-generating manifold. On the other hand, LAWGD enables efficient sampling from th…

    Submitted 30 October, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

  37. arXiv:2303.13960

    stat.ME

    Demystifying estimands in cluster-randomised trials

    Authors: Brennan C Kahan, Bryan Blette, Michael Harhay, Scott Halpern, Vipul Jairath, Andrew Copas, Fan Li

    Abstract: Estimands can help clarify the interpretation of treatment effects and ensure that estimators are aligned to the study's objectives. Cluster randomised trials require additional attributes to be defined within the estimand compared to individually randomised trials, including whether treatment effects are marginal or cluster specific, and whether they are participant or cluster average. In this pa…

    Submitted 22 February, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

  38. Maximin optimal cluster randomized designs for assessing treatment effect heterogeneity

    Authors: Mary M. Ryan, Denise Esserman, Fan Li

    Abstract: Cluster randomized trials (CRTs) are studies where treatment is randomized at the cluster level but outcomes are typically collected at the individual level. When CRTs are employed in pragmatic settings, baseline population characteristics may moderate treatment effects, leading to what is known as heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs can en…

    Submitted 30 May, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 25 pages, 6 figures, 5 tables, 3 appendices; clarified phrasing, typos corrected

    Journal ref: Statistics in Medicine. (2023) 1-22

  39. arXiv:2301.07672

    stat.ME stat.AP

    Principal Stratification with Time-to-Event Outcomes

    Authors: Bo Liu, Lisa Wruck, Fan Li

    Abstract: Post-randomization events, also known as intercurrent events, such as treatment noncompliance and censoring due to a terminal event, are common in clinical trials. Principal stratification is a framework for causal inference in the presence of intercurrent events. Despite the extensive existing literature, there is a lack of generally applicable and accessible methods for principal stratification analysi…

    Submitted 18 January, 2023; originally announced January 2023.

  40. arXiv:2212.13892

    cs.IR cs.LG stat.ME

    Cross-Dataset Propensity Estimation for Debiasing Recommender Systems

    Authors: Fengyu Li, Sarah Dean

    Abstract: Datasets for training recommender systems are often subject to distribution shift induced by users' and recommenders' selection biases. In this paper, we study the impact of selection bias on datasets with different quantization. We then leverage two differently quantized datasets from different source distributions to mitigate distribution shift by applying the inverse probability scoring method…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: In Workshop on Distribution Shifts, 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  41. Model-robust and efficient covariate adjustment for cluster-randomized experiments

    Authors: Bingkai Wang, Chan Park, Dylan S. Small, Fan Li

    Abstract: Cluster-randomized experiments are increasingly used to evaluate interventions in routine practice conditions, and researchers often adopt model-based methods with covariate adjustment in the statistical analyses. However, the validity of model-based covariate adjustment is unclear when the working models are misspecified, leading to ambiguity of estimands and risk of bias. In this article, we fir…

    Submitted 18 July, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

  42. arXiv:2210.04100

    stat.ME

    Doubly robust estimation and sensitivity analysis for marginal structural quantile models

    Authors: Chao Cheng, Liangyuan Hu, Fan Li

    Abstract: The marginal structural quantile model (MSQM) provides a unique lens to understand the causal effect of a time-varying treatment on the full distribution of potential outcomes. Under the semiparametric framework, we derive the efficient influence function for the MSQM, from which a new doubly robust estimator is proposed for point estimation and inference. We show that the doubly robust estimator…

    Submitted 10 February, 2024; v1 submitted 8 October, 2022; originally announced October 2022.

  43. arXiv:2209.03533

    stat.ME stat.AP

    Using propensity scores for racial disparities analysis

    Authors: Fan Li, Fan Li

    Abstract: Propensity score plays a central role in causal inference, but its use is not limited to causal comparisons. As a covariate balancing tool, propensity score can be used for controlled descriptive comparisons between groups whose memberships are not manipulable. A prominent example is racial disparities in health care. However, conceptual confusion and hesitation persist for using propensity score…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: This is an invited commentary. This commentary includes 10 pages, 2 figures and 1 table

  44. arXiv:2209.01297  [pdf, other]

    stat.ME

    Assessing treatment effect heterogeneity in the presence of missing effect modifier data in cluster-randomized trials

    Authors: Bryan S. Blette, Scott D. Halpern, Fan Li, Michael O. Harhay

    Abstract: Understanding whether and how treatment effects vary across subgroups is crucial to inform clinical practice and recommendations. Accordingly, the assessment of heterogeneous treatment effects (HTE) based on pre-specified potential effect modifiers has become a common goal in modern randomized trials. However, when one or more potential effect modifiers are missing, complete-case analysis may lead…

    Submitted 1 December, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

  45. arXiv:2209.00170  [pdf, other]

    cs.CR cs.DC cs.LG stat.ML

    CPS Attack Detection under Limited Local Information in Cyber Security: A Multi-node Multi-class Classification Ensemble Approach

    Authors: Junyi Liu, Yifu Tang, Haimeng Zhao, Xieheng Wang, Fangyu Li, **gyi Zhang

    Abstract: Cybersecurity breaches are common anomalies in distributed cyber-physical systems (CPS). However, cyber security breach classification is still a difficult problem, even using cutting-edge artificial intelligence (AI) approaches. In this paper, we study the multi-class classification problem in cyber security for attack detection. A challenging multi-node data-censoring case is considered…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: 22 pages. Submitted to ACM Transactions on Sensor Networks (TOSN)

  46. arXiv:2208.00139  [pdf, other]

    stat.ME stat.AP stat.CO

    Another look at forecast trimming for combinations: robustness, accuracy and diversity

    Authors: Xiaoqian Wang, Yanfei Kang, Feng Li

    Abstract: Forecast combination is widely recognized as a preferred strategy over forecast selection due to its ability to mitigate the uncertainty associated with identifying a single "best" forecast. Nonetheless, sophisticated combinations are often empirically dominated by simple averaging, which is commonly attributed to the weight estimation error. The issue becomes more problematic when dealing with a…

    Submitted 14 June, 2024; v1 submitted 30 July, 2022; originally announced August 2022.

  47. Covariate Adjustment in Randomized Clinical Trials with Missing Covariate and Outcome Data

    Authors: Chia-Rui Chang, Yue Song, Fan Li, Rui Wang

    Abstract: When analyzing data from randomized clinical trials, covariate adjustment can be used to account for chance imbalance in baseline covariates and to increase precision of the treatment effect estimate. A practical barrier to covariate adjustment is the presence of missing data. In this paper, in the light of recent theoretical advancement, we first review several covariate adjustment methods with i…

    Submitted 16 May, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

  48. arXiv:2206.15460  [pdf, other]

    stat.ME stat.AP

    Bayesian Causal Inference: A Critical Review

    Authors: Fan Li, Peng Ding, Fabrizia Mealli

    Abstract: This paper provides a critical review of the Bayesian perspective of causal inference based on the potential outcomes framework. We review the causal estimands, identification assumptions, the general structure of Bayesian inference of causal effects, and sensitivity analysis. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, definition o…

    Submitted 23 October, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

  49. arXiv:2206.11978  [pdf, other]

    stat.ME

    Power analyses for stepped wedge designs with multivariate continuous outcomes

    Authors: Kendra Davis-Plourde, Monica Taljaard, Fan Li

    Abstract: Multivariate outcomes are common in pragmatic cluster randomized trials. While sample size calculation procedures for multivariate outcomes exist under parallel assignment, none have been developed for a stepped wedge design. In this article, we present computationally efficient power and sample size procedures for stepped wedge cluster randomized trials (SW-CRTs) with multivariate outcomes that d…

    Submitted 2 December, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

  50. arXiv:2206.11343  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci stat.AP stat.CO stat.ML

    Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport

    Authors: Ricardo Baptista, Lianghao Cao, Joshua Chen, Omar Ghattas, Fengyi Li, Youssef M. Marzouk, J. Tinsley Oden

    Abstract: We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for h…

    Submitted 22 June, 2022; originally announced June 2022.